Endpoint Overview
HTTP Method
POST
Endpoint URL
Authentication
This endpoint requires authentication using an API token. You must include your API token as a Bearer token in the Authorization header. Learn how to create and manage API tokens in the API Tokens guide.
Request Format
Headers
Header | Value | Required |
---|---|---|
Authorization | Bearer YOUR_API_TOKEN | ✅ Yes |
Content-Type | application/json | ✅ Yes |
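A minimal Python sketch of the two required headers. The helper name `build_headers` is hypothetical; only the header names and values come from the table above.

```python
from typing import Dict


def build_headers(api_token: str) -> Dict[str, str]:
    """Return the two headers this endpoint requires (illustrative helper)."""
    if not api_token:
        raise ValueError("an API token is required")
    return {
        # Bearer authentication, as described under Authentication
        "Authorization": f"Bearer {api_token}",
        # The request body is sent as JSON
        "Content-Type": "application/json",
    }
```

Keep the token out of source code, for example by reading it from an environment variable at runtime.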
Request Body
The request must be sent as JSON with the following fields:
Field | Type | Description | Required |
---|---|---|---|
file_name | string | Name of the previously uploaded file to reprocess | ✅ Yes |
partition_method | string | Processing method to use (see available methods below) | ✅ Yes |
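The body can be assembled and checked before sending. This sketch assumes the two fields above are the only ones needed; `build_body` and `REQUIRED_FIELDS` are illustrative names, not part of the API.

```python
import json

# Both fields are marked as required in the table above.
REQUIRED_FIELDS = ("file_name", "partition_method")


def build_body(file_name: str, partition_method: str) -> str:
    """Serialize the required fields as a JSON request body."""
    body = {"file_name": file_name, "partition_method": partition_method}
    # Reject empty or whitespace-only values before making a network call.
    missing = [key for key in REQUIRED_FIELDS if not str(body[key]).strip()]
    if missing:
        raise ValueError(f"missing required field(s): {', '.join(missing)}")
    return json.dumps(body)
```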
Available Processing Methods
Basic
Best for: Simple text documents, quick processing
- Fast processing with heuristic classification
- No OCR processing
- Suitable for plain text files and well-structured documents
- Recommended for testing and development
OCR Only
Best for: Scanned documents, images with text
- Utilizes OCR for text extraction and parsing
- Heuristic-based document element classification
- Ideal for scanned PDFs and image files
- Balances processing speed and accuracy
Hi-Res
Best for: Complex documents with varied layouts
- OCR-based text extraction
- AI-powered document structure classification using the Hi-Res model
- Better recognition of tables, figures, and document elements
- Enhanced accuracy for complex layouts
Hi-Res (fine-tuned)
Best for: Premium accuracy, specialized documents
- OCR-based text extraction
- Fine-tuned AI model for document classification
- Highest accuracy for document structure recognition
- Optimized for specialized and complex document types
- Note: Premium feature
GraphorLM
Best for: Custom processing workflows
- Specialized processing method
- Custom document analysis pipeline
- Advanced document understanding capabilities
MAI
Best for: Fast text-focused processing without layout metadata
- Model-assisted partitioning focused on textual content
- Does not output bounding boxes or page layout (no bbox)
- Lightweight and faster when you only need clean text and element types
- Performs page annotation (page-level labels and context)
- Performs document annotation (document-level labels and summaries)
- Performs image annotation when images are present in the document
- Best-in-class text parsing quality; element classification is limited
partition_method values
Use these values for the partition_method field when calling the endpoint:
Method | partition_method |
---|---|
Basic | basic |
OCR Only | ocr |
Hi-Res | yolox |
Hi-Res (fine-tuned) | advanced |
GraphorLM | graphorlm |
MAI | mai |
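A client-side lookup can catch typos before the server answers with a 400 Invalid Method error. The mapping mirrors the table above; the helper name `to_partition_method` is hypothetical, and it always returns the lowercase API value because this page does not say whether display names are accepted.

```python
# Display names from the table above mapped to their partition_method values.
PARTITION_METHODS = {
    "Basic": "basic",
    "OCR Only": "ocr",
    "Hi-Res": "yolox",
    "Hi-Res (fine-tuned)": "advanced",
    "GraphorLM": "graphorlm",
    "MAI": "mai",
}


def to_partition_method(name: str) -> str:
    """Accept a display name or a raw value and return the API value."""
    if name in PARTITION_METHODS.values():
        return name
    try:
        return PARTITION_METHODS[name]
    except KeyError:
        valid = ", ".join(sorted(PARTITION_METHODS.values()))
        raise ValueError(f"unknown method {name!r}; expected one of: {valid}") from None
```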
Processing Method Selection Guide
Method Comparison
Method | Speed | Text Parsing | Element Classification | Bounding Boxes | Best Use Cases | OCR |
---|---|---|---|---|---|---|
Basic | ⚡⚡⚡ | ⭐⭐ | ⭐⭐ | ✅ (limited) | Simple text files, testing | ❌ |
OCR Only | ⚡⚡ | ⭐⭐⭐ | ⭐⭐ | ✅ (images) | Scanned documents, images | ✅ |
Hi-Res | ⚡ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ✅ | Complex layouts, mixed content | ✅ |
Hi-Res (fine-tuned) | ⚡ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ✅ | Premium accuracy needed | ✅ |
GraphorLM | ⚡ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ✅ | Custom workflows | ✅ |
MAI | ⚡⚡⚡ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ | ❌ | Text precision without layout metadata | ✅ |
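One way to turn the comparison table into code is a small decision function. This is a heuristic reading of the table and of the When to Reprocess guidance, not an official rule; the function and flag names are assumptions.

```python
def suggest_method(*, scanned: bool, complex_layout: bool,
                   need_bounding_boxes: bool, premium: bool = False) -> str:
    """Pick a partition_method value from a few document traits (heuristic)."""
    if not need_bounding_boxes and not complex_layout:
        # MAI: fast, top-rated text parsing, but no bounding boxes
        return "mai"
    if complex_layout:
        # Hi-Res (fine-tuned) is a premium feature; plain Hi-Res otherwise
        return "advanced" if premium else "yolox"
    if scanned:
        # OCR Only for scanned pages with a simple layout
        return "ocr"
    return "basic"
```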
Request Example
Processing can take several minutes depending on document size, complexity, and the selected processing method. Advanced methods such as Hi-Res, Hi-Res (fine-tuned), and GraphorLM typically require more time for analysis.
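The Endpoint URL is not reproduced in this section, so the sketch below uses a placeholder (`ENDPOINT_URL`) that you must replace. It relies only on the Python standard library (`urllib.request`) and sets a long timeout because processing can take several minutes; the function names are hypothetical.

```python
import json
import urllib.request

# Placeholder only: substitute the Endpoint URL for this endpoint.
ENDPOINT_URL = "https://api.example.invalid/reprocess"  # hypothetical


def build_request(api_token: str, file_name: str, partition_method: str,
                  url: str = ENDPOINT_URL) -> urllib.request.Request:
    """Build the POST request described under Request Format."""
    body = json.dumps({"file_name": file_name, "partition_method": partition_method})
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def reprocess(api_token: str, file_name: str, partition_method: str,
              url: str = ENDPOINT_URL, timeout_seconds: float = 600.0) -> dict:
    """Send the request and return the decoded JSON response.

    The 10-minute default follows the "5+ minutes" timeout advice.
    """
    request = build_request(api_token, file_name, partition_method, url)
    with urllib.request.urlopen(request, timeout=timeout_seconds) as response:
        return json.loads(response.read().decode("utf-8"))
```

`urlopen` raises `urllib.error.HTTPError` for 4xx and 5xx responses; its `code` attribute holds the status listed under Error Responses.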
Response Format
Success Response (200 OK)
Response Fields
Field | Type | Description |
---|---|---|
status | string | Processing result (typically “success”) |
message | string | Human-readable success message |
file_name | string | Name of the processed file |
file_size | integer | Size of the file in bytes |
file_type | string | File extension/type |
file_source | string | Source type of the original file |
project_id | string | UUID of the project containing the file |
project_name | string | Name of the project |
partition_method | string | Processing method that was applied |
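A typed wrapper makes the documented response fields explicit. This sketch assumes the body is a flat JSON object containing the fields in the table; undocumented extra keys are ignored and a missing documented key raises `TypeError`. `ReprocessResult` and `parse_result` are illustrative names.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class ReprocessResult:
    status: str
    message: str
    file_name: str
    file_size: int  # bytes
    file_type: str
    file_source: str
    project_id: str  # UUID
    project_name: str
    partition_method: str


def parse_result(raw: str) -> ReprocessResult:
    """Decode a 200 OK body, keeping only the documented fields."""
    data = json.loads(raw)
    known = {f.name for f in fields(ReprocessResult)}
    # Extra keys are dropped; a missing documented key makes the constructor raise TypeError.
    return ReprocessResult(**{k: v for k, v in data.items() if k in known})
```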
Code Examples
JavaScript/Node.js
Python
cURL
PHP
Error Responses
Common Error Codes
Status Code | Error Type | Description |
---|---|---|
400 | Bad Request | Invalid request format or missing required fields |
401 | Unauthorized | Invalid or missing API token |
403 | Forbidden | Access denied to the specified project |
404 | Not Found | File not found in the project |
500 | Internal Server Error | Processing failure or server error |
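Client code usually needs to decide which errors to retry. The sketch treats only 5xx responses as possibly transient, which is an assumption: a 500 caused by a corrupted file will fail again, so cap the number of retries. The helper names are hypothetical.

```python
# Status codes and error types from the table above.
ERROR_TYPES = {
    400: "Bad Request",
    401: "Unauthorized",
    403: "Forbidden",
    404: "Not Found",
    500: "Internal Server Error",
}


def describe_error(status: int) -> str:
    """Return the documented error type, or a generic label."""
    return ERROR_TYPES.get(status, f"Unexpected status {status}")


def is_retryable(status: int) -> bool:
    """Assume only server-side (5xx) errors may be transient; 4xx need a request fix."""
    return status >= 500
```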
Error Response Format
Error Examples
File Not Found (404)
Invalid API Token (401)
Processing Failed (500)
Invalid Method (400)
When to Reprocess
Poor text extraction
Symptoms: Missing text, garbled characters, incomplete content
Recommended methods:
- OCR Only for scanned documents
- Hi-Res or Hi-Res (fine-tuned) for complex layouts
- MAI for text-only documents when bounding boxes are not required
Table detection issues
Symptoms: Tables not properly recognized, merged cells, lost structure
Recommended methods:
- Hi-Res for better table detection
- Hi-Res (fine-tuned) for complex table structures
Image and figure handling
Symptoms: Missing captions, poor figure recognition
Recommended methods:
- Hi-Res for figure detection
- Hi-Res (fine-tuned) for comprehensive image analysis
Document structure problems
Symptoms: Headers/footers mixed with content, poor section detection
Recommended methods:
- Hi-Res for structure recognition
- Hi-Res (fine-tuned) for complex document hierarchies
- GraphorLM for enhanced semantic structure and relationships
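The recommendations above can be kept as data so a script can pick the next method to try. The symptom keys are made up for this sketch, and the lists flatten the per-method conditions (scanned pages, complex layouts, bounding boxes not required), so treat the order as a starting point.

```python
from typing import List

# Hypothetical symptom keys mapped to recommended partition_method values,
# in the order the list above gives them.
RECOMMENDATIONS = {
    "poor_text_extraction": ["ocr", "yolox", "advanced", "mai"],
    "table_detection": ["yolox", "advanced"],
    "image_and_figure": ["yolox", "advanced"],
    "document_structure": ["yolox", "advanced", "graphorlm"],
}


def methods_to_try(symptom: str, current_method: str) -> List[str]:
    """Return recommended methods for a symptom, skipping the one already used."""
    if symptom not in RECOMMENDATIONS:
        raise KeyError(f"unknown symptom {symptom!r}")
    return [m for m in RECOMMENDATIONS[symptom] if m != current_method]
```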
Best Practices
Processing Strategy
- Start with Basic: For testing and simple documents
- Upgrade gradually: Move to OCR Only → Hi-Res → Hi-Res (fine-tuned) → GraphorLM → MAI based on your needs
- Monitor results: Use document preview to evaluate processing quality
- Consider efficiency vs. quality: Advanced methods take longer but provide better results
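The escalation path in Upgrade gradually can be expressed as an ordered list of partition_method values. A minimal sketch; `next_method` is a hypothetical helper, and Python's `list.index` raises `ValueError` for an unknown method.

```python
from typing import Optional

# Escalation order from "Upgrade gradually", as partition_method values.
ESCALATION = ["basic", "ocr", "yolox", "advanced", "graphorlm", "mai"]


def next_method(current: str) -> Optional[str]:
    """Return the next method in the ladder, or None at the end."""
    index = ESCALATION.index(current)  # ValueError for unknown methods
    return ESCALATION[index + 1] if index + 1 < len(ESCALATION) else None
```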
Performance Optimization
- Batch processing: Process multiple files sequentially rather than simultaneously
- Method selection: Choose the appropriate method for your document types
- Timeout handling: Allow sufficient time for complex processing methods
- Error recovery: Implement retry logic for transient failures
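Retry logic for transient failures can be kept separate from the HTTP call. In this sketch the caller's `send` function is expected to raise the hypothetical `TransientError` for failures worth retrying (for example, a 500); the backoff schedule (5 s, then 10 s, and so on) is an arbitrary choice, and `sleep` is injectable so the logic can be tested without waiting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Raised by the caller's send() for failures worth retrying (assumed)."""


def with_retries(
    send: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call send(), retrying on TransientError with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return send()
        except TransientError:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            # Wait base_delay, then 2x, 4x, ... between attempts.
            sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError("unreachable")
```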
Quality Assessment
After processing, evaluate the results by:
- Checking text extraction completeness
- Verifying table and figure recognition
- Reviewing document structure classification
- Testing retrieval quality in your RAG pipeline
Integration Examples
Automatic Quality Improvement
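Automatic quality improvement can be sketched as trying methods in escalating order until a result passes your own check. This assumes you supply two callables: `reprocess(method)`, which calls the endpoint for one file, and `is_acceptable(result)`, your quality check (for example, built from the checks under Quality Assessment). The documented response fields include no quality score, so the check has to inspect the processed document by other means. All names and the default method order are hypothetical.

```python
from typing import Callable, Iterable, Optional, Tuple


def improve_until_acceptable(
    reprocess: Callable[[str], dict],
    is_acceptable: Callable[[dict], bool],
    methods: Iterable[str] = ("ocr", "yolox", "advanced"),
) -> Tuple[Optional[str], Optional[dict]]:
    """Try methods in order and stop at the first acceptable result.

    Returns (method, result), or (None, None) if no method passed.
    """
    for method in methods:
        result = reprocess(method)
        if is_acceptable(result):
            return method, result
    return None, None
```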
Batch Reprocessing
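A sequential batch loop, following the advice to process files one after another rather than simultaneously. The `reprocess(file_name, partition_method)` callable is an assumption (for example, a wrapper around the HTTP call that raises on error); failures are collected so one bad file does not stop the batch.

```python
from typing import Callable, Dict, List, Tuple


def reprocess_batch(
    file_names: List[str],
    partition_method: str,
    reprocess: Callable[[str, str], dict],
) -> Tuple[Dict[str, dict], Dict[str, Exception]]:
    """Reprocess files one at a time, collecting successes and failures."""
    succeeded: Dict[str, dict] = {}
    failed: Dict[str, Exception] = {}
    for name in file_names:  # sequential, never concurrent
        try:
            succeeded[name] = reprocess(name, partition_method)
        except Exception as exc:  # keep going; report failures at the end
            failed[name] = exc
    return succeeded, failed
```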
Processing with Progress Tracking
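Assuming each call blocks until processing finishes (which the long-timeout advice suggests), progress can be reported per file rather than within a file. This sketch reports the position in the batch and the elapsed time per file; `report` and `clock` are injectable for testing, and all names are hypothetical.

```python
import time
from typing import Callable, List


def reprocess_with_progress(
    file_names: List[str],
    reprocess: Callable[[str], dict],
    report: Callable[[str], None] = print,
    clock: Callable[[], float] = time.monotonic,
) -> List[dict]:
    """Reprocess files sequentially, reporting position and elapsed time."""
    results = []
    total = len(file_names)
    for index, name in enumerate(file_names, start=1):
        report(f"[{index}/{total}] processing {name}...")
        started = clock()
        results.append(reprocess(name))
        report(f"[{index}/{total}] {name} done in {clock() - started:.1f}s")
    return results
```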
Troubleshooting
Processing timeouts
Causes: Large files, complex documents, or heavy server load
Solutions:
- Increase request timeout (5+ minutes recommended)
- Try a simpler processing method first
- Process during off-peak hours
- Contact support for very large documents
File not found errors
Causes: Incorrect file name, deleted file, or wrong project
Solutions:
- Verify exact file name (case-sensitive)
- Use the List Sources endpoint to check available files
- Ensure you’re using the correct API token for the project
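Because file names are case-sensitive, a quick local check can explain a 404 before you retry. This sketch assumes you have already extracted the file names from the List Sources response (its format is not described here); `diagnose_file_name` is a hypothetical helper.

```python
from typing import Iterable


def diagnose_file_name(file_name: str, available: Iterable[str]) -> str:
    """Explain whether file_name exists exactly, differs only by case, or is absent."""
    names = list(available)
    if file_name in names:
        return "ok"
    wanted = file_name.casefold()
    near = [name for name in names if name.casefold() == wanted]
    if near:
        return f"case mismatch; did you mean {near[0]!r}?"
    return "not found in this project"
```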
Processing failures
Causes: Corrupted files, unsupported content, or method incompatibility
Solutions:
- Try a different processing method
- Check file integrity
- Re-upload the file if necessary
- Contact support for persistent issues
Poor processing quality
Causes: Method not suitable for the document type, or a complex layout
Solutions:
- Upgrade to Hi-Res or Hi-Res (fine-tuned) method
- Ensure document quality is good
- Consider pre-processing the document
- Review processing results in the dashboard