Process Source
Reprocess uploaded documents with different parsing methods via the GraphorLM REST API
The Process Source endpoint allows you to reprocess previously uploaded documents using different parsing and classification methods. This enables you to optimize document processing for better text extraction, structure recognition, and retrieval performance without re-uploading the file.
Endpoint Overview
HTTP Method
POST
Endpoint URL
Authentication
This endpoint requires authentication using an API token. You must include your API token as a Bearer token in the Authorization header.
Learn how to create and manage API tokens in the API Tokens guide.
Request Format
Headers
Header | Value | Required |
---|---|---|
Authorization | Bearer YOUR_API_TOKEN | ✅ Yes |
Content-Type | application/json | ✅ Yes |
Request Body
The request must be sent as JSON with the following fields:
Field | Type | Description | Required |
---|---|---|---|
file_name | string | Name of the previously uploaded file to reprocess | ✅ Yes |
partition_method | string | Processing method to use (see available methods below) | ✅ Yes |
Available Processing Methods
Basic
Best for: Simple text documents, quick processing
- Fast processing with heuristic classification
- No OCR processing
- Suitable for plain text files and well-structured documents
- Recommended for testing and development
OCR
Best for: Scanned documents, images with text
- Uses OCR for text extraction and parsing
- Heuristic-based document element classification
- Ideal for scanned PDFs and image files
- Balances processing speed and accuracy
YOLOX
Best for: Complex documents with varied layouts
- OCR-based text extraction
- AI-powered document structure classification using YOLOX model
- Better recognition of tables, figures, and document elements
- Enhanced accuracy for complex layouts
Advanced
Best for: Premium accuracy, specialized documents
- OCR-based text extraction
- Fine-tuned AI model for document classification
- Highest accuracy for document structure recognition
- Optimized for specialized and complex document types
- Note: Premium feature
GraphorLM
Best for: Custom processing workflows
- Specialized processing method
- Custom document analysis pipeline
- Advanced document understanding capabilities
Request Example
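As a minimal sketch, the JSON body described in the Request Body table could be built as follows. The helper name `build_request_body` and the file name `report.pdf` are illustrative placeholders, not part of any GraphorLM SDK.

```python
import json


def build_request_body(file_name: str, partition_method: str) -> str:
    """Serialize the two required fields into a JSON request body."""
    return json.dumps({
        "file_name": file_name,
        "partition_method": partition_method,
    })


# "report.pdf" is a placeholder file name used only for illustration.
body = build_request_body("report.pdf", "ocr")
print(body)
```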
Response Format
Success Response (200 OK)
Response Fields
Field | Type | Description |
---|---|---|
status | string | Processing result (typically “success”) |
message | string | Human-readable success message |
file_name | string | Name of the processed file |
file_size | integer | Size of the file in bytes |
file_type | string | File extension/type |
file_source | string | Source type of the original file |
project_id | string | UUID of the project containing the file |
project_name | string | Name of the project |
partition_method | string | Processing method that was applied |
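One way to work with these fields is to load the decoded JSON into a small typed container. The sketch below assumes the response is a flat JSON object whose keys match the table exactly; `ProcessResult` and `parse_process_response` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, fields


@dataclass
class ProcessResult:
    # One attribute per documented response field.
    status: str
    message: str
    file_name: str
    file_size: int
    file_type: str
    file_source: str
    project_id: str
    project_name: str
    partition_method: str


def parse_process_response(payload: dict) -> ProcessResult:
    """Build a ProcessResult from a decoded JSON response.

    Raises KeyError if a documented field is missing; extra keys are ignored.
    """
    return ProcessResult(**{f.name: payload[f.name] for f in fields(ProcessResult)})
```

Ignoring unknown keys keeps the parser tolerant of fields added to the response later.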
Code Examples
JavaScript/Node.js
Python
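The following is a sketch of a Python client that uses only the standard library. The endpoint URL is not given in this section, so `API_URL` is a placeholder you must replace; the function names are illustrative. The default timeout of 300 seconds follows the "5+ minutes" recommendation in Troubleshooting.

```python
import json
import urllib.request

# Placeholder only: the real endpoint URL is not given in this section.
API_URL = "https://example.invalid/process-source"


def build_process_request(api_url, api_token, file_name, partition_method):
    """Create a POST request with the documented headers and JSON body."""
    body = json.dumps({
        "file_name": file_name,
        "partition_method": partition_method,
    }).encode("utf-8")
    return urllib.request.Request(
        api_url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
    )


def process_source(api_url, api_token, file_name, partition_method, timeout=300):
    """Send the request and return the decoded JSON response.

    urllib raises urllib.error.HTTPError for 4xx and 5xx responses.
    """
    request = build_process_request(api_url, api_token, file_name, partition_method)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A call would look like `process_source(API_URL, "YOUR_API_TOKEN", "report.pdf", "yolox")`, where the file name is again a placeholder.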
cURL
PHP
Error Responses
Common Error Codes
Status Code | Error Type | Description |
---|---|---|
400 | Bad Request | Invalid request format or missing required fields |
401 | Unauthorized | Invalid or missing API token |
403 | Forbidden | Access denied to the specified project |
404 | Not Found | File not found in the project |
500 | Internal Server Error | Processing failure or server error |
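Client code can turn this table into a lookup. In the sketch below, the error types come from the table, while the `retry_may_help` flag is an assumption: only 500 is treated as possibly transient, since the other codes indicate a problem with the request itself.

```python
# (error_type, retry_may_help) keyed by HTTP status code.
# The retry flag is an assumption, not documented API behavior.
ERROR_TABLE = {
    400: ("Bad Request", False),
    401: ("Unauthorized", False),
    403: ("Forbidden", False),
    404: ("Not Found", False),
    500: ("Internal Server Error", True),
}


def classify_status(status_code: int):
    """Return (error_type, retry_may_help) for a status code.

    Codes outside the table fall back to ("Unknown", False).
    """
    return ERROR_TABLE.get(status_code, ("Unknown", False))
```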
Error Response Format
Error Examples
File Not Found (404)
Cause: The specified file name doesn’t exist in your project
Solution: Verify the file name and ensure it was previously uploaded
Invalid API Token (401)
Cause: API token is invalid, expired, or malformed
Solution: Check your API token and ensure it hasn’t been revoked
Processing Failed (500)
Cause: Internal processing error with the specified method
Solution: Try a different processing method or check file integrity
Invalid Method (400)
Cause: Unsupported or invalid partition method
Solution: Use one of: basic, ocr, yolox, advanced, graphorlm
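A client-side check can catch an invalid method before a request is sent. This sketch validates against the five values listed above; trimming whitespace and lowercasing the input is a convenience assumed here, not documented API behavior.

```python
VALID_METHODS = ("basic", "ocr", "yolox", "advanced", "graphorlm")


def normalize_partition_method(method: str) -> str:
    """Return the method in lowercase, or raise ValueError if unsupported."""
    candidate = method.strip().lower()
    if candidate not in VALID_METHODS:
        allowed = ", ".join(VALID_METHODS)
        raise ValueError(f"Unsupported partition method {method!r}; use one of: {allowed}")
    return candidate
```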
Processing Method Selection Guide
Method Comparison
Method | Speed | Accuracy | Best Use Cases | OCR | AI Classification |
---|---|---|---|---|---|
Basic | ⚡⚡⚡ | ⭐⭐ | Simple text files, testing | ❌ | ❌ |
OCR | ⚡⚡ | ⭐⭐⭐ | Scanned documents, images | ✅ | ❌ |
YOLOX | ⚡ | ⭐⭐⭐⭐ | Complex layouts, mixed content | ✅ | ✅ |
Advanced | ⚡ | ⭐⭐⭐⭐⭐ | Premium accuracy needed | ✅ | ✅ Premium |
GraphorLM | ⚡ | ⭐⭐⭐⭐ | Custom workflows | ✅ | ✅ Custom |
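The comparison table can also be encoded as data to pick a method programmatically. The sketch below counts the speed and accuracy symbols from the table and returns the fastest method that meets a minimum accuracy; breaking ties by table order is an arbitrary choice made for this example.

```python
from typing import NamedTuple, Optional


class Method(NamedTuple):
    name: str
    speed: int     # number of speed symbols in the table
    accuracy: int  # number of accuracy stars in the table


METHODS = [
    Method("basic", 3, 2),
    Method("ocr", 2, 3),
    Method("yolox", 1, 4),
    Method("advanced", 1, 5),
    Method("graphorlm", 1, 4),
]


def fastest_method(min_accuracy: int) -> Optional[str]:
    """Return the fastest method with at least min_accuracy stars, or None."""
    candidates = [m for m in METHODS if m.accuracy >= min_accuracy]
    if not candidates:
        return None
    # max() keeps the first candidate on ties, so table order decides.
    return max(candidates, key=lambda m: m.speed).name
```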
When to Reprocess
Poor text extraction
Symptoms: Missing text, garbled characters, incomplete content
Recommended methods:
- OCR for scanned documents
- YOLOX or Advanced for complex layouts
Table detection issues
Symptoms: Tables not properly recognized, merged cells, structure lost
Recommended methods:
- YOLOX for better table detection
- Advanced for complex table structures
Image and figure handling
Symptoms: Missing captions, poor figure recognition
Recommended methods:
- YOLOX for figure detection
- Advanced for comprehensive image analysis
Document structure problems
Symptoms: Headers/footers mixed with content, poor section detection
Recommended methods:
- YOLOX for structure recognition
- Advanced for complex document hierarchies
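When a document shows several of these symptoms at once, one option is to pick a method recommended for all of them. The sketch below encodes the four lists above under hypothetical symptom keys and intersects the recommendations.

```python
# Symptom keys are illustrative names for the four cases above.
RECOMMENDED = {
    "poor_text_extraction": ["ocr", "yolox", "advanced"],
    "table_detection": ["yolox", "advanced"],
    "image_and_figure_handling": ["yolox", "advanced"],
    "document_structure": ["yolox", "advanced"],
}


def recommend(symptoms):
    """Return the methods recommended for every symptom given, in list order."""
    lists = [RECOMMENDED[s] for s in symptoms]
    if not lists:
        return []
    first, rest = lists[0], lists[1:]
    return [m for m in first if all(m in other for other in rest)]
```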
Best Practices
Processing Strategy
- Start with Basic: For testing and simple documents
- Upgrade gradually: Move to OCR → YOLOX → Advanced based on needs
- Monitor results: Use document preview to evaluate processing quality
- Weigh speed against quality: YOLOX and Advanced take longer than Basic or OCR but recognize document structure more accurately
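The gradual upgrade path above can be sketched as a loop that stops at the first method whose output is acceptable. Both `process` (which performs the API call) and `is_good_enough` (which judges the result) are callables you supply; they are hypothetical hooks, since this endpoint does not itself report quality.

```python
LADDER = ("basic", "ocr", "yolox", "advanced")


def escalate(file_name, process, is_good_enough, ladder=LADDER):
    """Try each method in order and return the first that passes the check.

    Returns None if no method in the ladder produces an acceptable result.
    """
    for method in ladder:
        result = process(file_name, method)
        if is_good_enough(result):
            return method
    return None
```

Passing the two callables in keeps the escalation logic independent of how requests are sent and how quality is measured.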
Performance Optimization
- Batch processing: Process multiple files sequentially rather than simultaneously
- Method selection: Choose the appropriate method for your document types
- Timeout handling: Allow sufficient time for complex processing methods
- Error recovery: Implement retry logic for transient failures
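Retry logic for transient failures might look like the sketch below, which uses exponential backoff. `TransientError` is a hypothetical exception that your request wrapper would raise for failures worth retrying (for example, a 500 response or a timeout); the attempt count and delays are arbitrary defaults.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for failures that are worth retrying."""


def call_with_retry(call, max_attempts=3, base_delay=2.0, sleep=time.sleep):
    """Run call(), retrying on TransientError with exponential backoff.

    Waits base_delay, then 2 * base_delay, and so on between attempts.
    The last failure is re-raised once max_attempts is reached.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except TransientError:
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
```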
Quality Assessment
After processing, evaluate the results by:
- Checking text extraction completeness
- Verifying table and figure recognition
- Reviewing document structure classification
- Testing retrieval quality in your RAG pipeline
Integration Examples
Automatic Quality Improvement
Batch Reprocessing
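A minimal sketch of sequential batch reprocessing, in line with the recommendation to process files one at a time. `process` is a callable you supply that performs the API call for one file, and `on_progress` is an optional hypothetical callback for reporting progress; neither is part of the API.

```python
def reprocess_batch(file_names, method, process, on_progress=None):
    """Reprocess files one after another and collect per-file outcomes.

    Returns a dict mapping each file name to ("ok", result) or
    ("error", message). A failure on one file does not stop the batch.
    """
    results = {}
    total = len(file_names)
    for index, name in enumerate(file_names, start=1):
        try:
            results[name] = ("ok", process(name, method))
        except Exception as exc:
            results[name] = ("error", str(exc))
        if on_progress is not None:
            on_progress(index, total, name)
    return results
```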
Processing with Progress Tracking
Troubleshooting
Processing timeouts
Causes: Large files, complex documents, or heavy server load
Solutions:
- Increase request timeout (5+ minutes recommended)
- Try a simpler processing method first
- Process during off-peak hours
- Contact support for very large documents
File not found errors
Causes: Incorrect file name, file deleted, or wrong project
Solutions:
- Verify exact file name (case-sensitive)
- Use the List Sources endpoint to check available files
- Ensure you’re using the correct API token for the project
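Because file names are case-sensitive, a quick local check can tell a case mismatch apart from a file that is missing entirely. The sketch below assumes you already have the list of available file names (for example, from the List Sources endpoint, whose response format is not covered here); `check_file_name` is a hypothetical helper.

```python
def check_file_name(requested, available):
    """Compare a requested file name against the available names.

    Returns ("exact", [name]) on an exact match, ("case_mismatch", names)
    when only the letter case differs, and ("missing", []) otherwise.
    """
    if requested in available:
        return ("exact", [requested])
    lowered = requested.casefold()
    near = [name for name in available if name.casefold() == lowered]
    if near:
        return ("case_mismatch", near)
    return ("missing", [])
```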
Processing failures
Causes: Corrupted files, unsupported content, or method incompatibility
Solutions:
- Try a different processing method
- Check file integrity
- Re-upload the file if necessary
- Contact support for persistent issues
Poor processing quality
Causes: Method not suitable for document type, or complex layout
Solutions:
- Upgrade to YOLOX or Advanced method
- Ensure document quality is good
- Consider pre-processing the document
- Review processing results in the dashboard
Next Steps
After successfully processing your documents: