The parse method lets you reprocess previously uploaded documents with a different parsing and classification method. You can improve text extraction, structure recognition, and retrieval performance without re-uploading the file.
Method Overview
Sync Method
client.sources.parse()
Async Method
await client.sources.parse()
Method Signature
Parameters
| Parameter | Type | Description | Required |
|---|---|---|---|
| file_id | str | Unique identifier for the source (preferred) | No* |
| file_name | str | Name of the previously uploaded file to reprocess (deprecated; use file_id) | No* |
| partition_method | PartitionMethod | Processing method to use (see available methods below) | No |
| timeout | float | Request timeout in seconds | No |
*At least one of file_id or file_name must be provided; file_id is preferred.
Available Processing Methods
The PartitionMethod type accepts the following literal values:
Fast (basic)
Value: "basic"
Best for: Simple text documents, quick processing
- Fast processing with heuristic classification
- No OCR processing
- Suitable for plain text files and well-structured documents
- Recommended for testing and development
Balanced (hi_res)
Value: "hi_res"
Best for: Complex documents with varied layouts
- OCR-based text extraction
- AI-powered document structure classification using Hi-Res model
- Better recognition of tables, figures, and document elements
- Enhanced accuracy for complex layouts
Accurate (hi_res_ft)
Value: "hi_res_ft"
Best for: Premium accuracy, specialized documents
- OCR-based text extraction
- Fine-tuned AI model for document classification
- Highest accuracy for document structure recognition
- Optimized for specialized and complex document types
- Note: Premium feature
VLM (mai)
Value: "mai"
Best for: Text-first parsing, manuscripts, and handwritten documents
- Our best text-first parsing with high-quality output
- Does not output bounding boxes or page layout (no bbox)
- Best for MANUSCRIPT and HANDWRITTEN documents
- Performs page annotation (page-level labels and context)
- Performs document annotation (document-level labels and summaries)
- Performs image annotation when images are present in the document
- Best-in-class text parsing quality; element classification is limited
Agentic (graphorlm)
Value: "graphorlm"
Best for: Complex layouts, multi-page tables, diagrams, and images
- Our most advanced parsing setting for complex layouts
- Rich annotations for images and complex elements
- Uses agentic processing for enhanced understanding
- Advanced document understanding capabilities
Method Reference
| Method | partition_method Value |
|---|---|
| Fast | "basic" |
| Balanced | "hi_res" |
| Accurate | "hi_res_ft" |
| VLM | "mai" |
| Agentic | "graphorlm" |
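To catch typos in method values before a request is sent, you can mirror the literal values above in your own code. PartitionMethod, METHOD_BY_NAME, and to_partition_method below are hypothetical local names, not imports from the SDK, whose own type definition is not shown on this page.

```python
from typing import Literal, get_args

# Local mirror of the documented literal values (hypothetical alias, not the SDK's own type).
PartitionMethod = Literal["basic", "hi_res", "hi_res_ft", "mai", "graphorlm"]

# Display names from the method reference table mapped to their partition_method values.
METHOD_BY_NAME: dict[str, PartitionMethod] = {
    "Fast": "basic",
    "Balanced": "hi_res",
    "Accurate": "hi_res_ft",
    "VLM": "mai",
    "Agentic": "graphorlm",
}


def to_partition_method(value: str) -> PartitionMethod:
    """Accept either a display name ("Balanced") or a raw value ("hi_res")."""
    if value in METHOD_BY_NAME:
        return METHOD_BY_NAME[value]
    if value in get_args(PartitionMethod):
        return value  # already one of the documented literal values
    raise ValueError(
        f"Unknown partition method {value!r}; expected one of {get_args(PartitionMethod)}"
    )
```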
Processing Method Comparison
| Method | Speed | Text Parsing | Element Classification | Bounding Boxes | Best Use Cases | OCR |
|---|---|---|---|---|---|---|
| Fast | ⚡⚡⚡ | ⭐⭐ | ⭐⭐ | ✅ (limited) | Simple text files, testing | ❌ |
| Balanced | ⚡ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ✅ | Complex layouts, mixed content | ✅ |
| Accurate | ⚡ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ✅ | Premium accuracy needed | ✅ |
| VLM | ⚡⚡⚡ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ | ❌ | Manuscripts, handwritten documents | ✅ |
| Agentic | ⚡ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ✅ | Complex layouts, multi-page tables, diagrams | ✅ |
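The comparison table can be read as a simple selection heuristic. This is a sketch under stated assumptions: the flag names are hypothetical, and routing handwritten documents that still need bounding boxes to "hi_res_ft" is our own reading of the table (VLM gives the best handwriting text but outputs no boxes), not documented guidance.

```python
def choose_partition_method(
    *,
    handwritten: bool = False,
    needs_bounding_boxes: bool = True,
    complex_tables_or_diagrams: bool = False,
    premium_accuracy: bool = False,
    complex_layout: bool = False,
) -> str:
    """Pick a partition_method value from the comparison table (heuristic sketch)."""
    # Manuscripts / handwriting without a bbox requirement: VLM.
    if handwritten and not needs_bounding_boxes:
        return "mai"
    # Multi-page tables, diagrams, images: Agentic.
    if complex_tables_or_diagrams:
        return "graphorlm"
    # Premium accuracy, or handwriting where boxes are still required (assumption): Accurate.
    if premium_accuracy or handwritten:
        return "hi_res_ft"
    # Complex layouts, mixed content: Balanced.
    if complex_layout:
        return "hi_res"
    # Simple text files, testing: Fast.
    return "basic"
```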
Response Object
The method returns a PublicSource object with the following properties:
| Property | Type | Description |
|---|---|---|
| status | str | Processing result (typically “success”) |
| message | str | Human-readable success message |
| file_name | str | Name of the processed file |
| file_size | int | Size of the file in bytes |
| file_type | str | File extension/type |
| file_source | str | Source type of the original file |
| project_id | str | UUID of the project containing the file |
| project_name | str | Name of the project |
| partition_method | str or None | Processing method that was applied |
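A small helper that reads the documented PublicSource properties might look like this. It only assumes the response exposes the properties in the table as attributes; describe_source is a hypothetical name.

```python
from typing import Any


def describe_source(source: Any) -> str:
    """Build a one-line summary from the documented PublicSource properties."""
    # partition_method may be None, per the response table.
    method = source.partition_method or "unknown"
    size_kb = source.file_size / 1024  # file_size is in bytes
    return (
        f"[{source.status}] {source.file_name} ({source.file_type}, {size_kb:.1f} KB) "
        f"in project {source.project_name}: partition_method={method}"
    )
```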
Code Examples
Basic Usage
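A minimal synchronous sketch. It assumes client is an already-constructed SDK client (the package it comes from and how it is created are not shown on this page); reparse is a hypothetical helper and the file ID is a placeholder.

```python
def reparse(client, file_id: str, partition_method: str = "hi_res"):
    """Reprocess an already-uploaded source with a different partition method."""
    # file_id is preferred over the deprecated file_name parameter.
    source = client.sources.parse(file_id=file_id, partition_method=partition_method)
    # status is typically "success"; message is a human-readable summary.
    print(f"{source.file_name}: {source.status} - {source.message}")
    return source


# Usage (placeholder ID; create `client` with your API key first):
# source = reparse(client, "your-file-id", "hi_res")
```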
Using Different Methods
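To try several methods on one file, call parse once per value. A minimal sketch, assuming an already-constructed client; parse_with_methods is a hypothetical helper, and evaluating quality between runs is left to you.

```python
def parse_with_methods(client, file_id: str, methods):
    """Reprocess one file with each method in turn; record what each response reports."""
    reported = []
    for method in methods:
        source = client.sources.parse(file_id=file_id, partition_method=method)
        # Review the result (e.g. in the document preview) before relying on it.
        reported.append((method, source.status, source.partition_method))
    return reported


# Usage: parse_with_methods(client, "your-file-id", ["basic", "hi_res", "hi_res_ft"])
```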
Async Usage
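The async method takes the same parameters but must be awaited. This sketch assumes async_client is the SDK's async client; its class name and construction are not shown on this page.

```python
import asyncio


async def reparse_async(async_client, file_id: str, partition_method: str = "hi_res"):
    """Async variant: same parameters as the sync call, but awaited."""
    source = await async_client.sources.parse(
        file_id=file_id, partition_method=partition_method
    )
    return source


# Usage (placeholder ID):
# source = asyncio.run(reparse_async(async_client, "your-file-id", "graphorlm"))
```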
With Extended Timeout
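The timeout parameter (in seconds) can be set per call. The per-method values below are illustrative assumptions, loosely following this page's 5+ minute recommendation for complex processing, not documented limits.

```python
# Illustrative timeouts in seconds; assumptions, not documented limits.
TIMEOUT_BY_METHOD = {
    "basic": 60.0,
    "hi_res": 300.0,
    "hi_res_ft": 600.0,
    "mai": 300.0,
    "graphorlm": 600.0,
}


def reparse_with_timeout(client, file_id: str, partition_method: str):
    """Pass a per-call timeout sized to the processing method."""
    timeout = TIMEOUT_BY_METHOD.get(partition_method, 300.0)
    return client.sources.parse(
        file_id=file_id, partition_method=partition_method, timeout=timeout
    )
```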
Processing complex documents can take several minutes, so configure an appropriate timeout.
Error Handling
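The exception names below come from the error reference on this page, but the module that exports them is not named here, so this sketch matches on class names (including base classes) instead of importing them. If your SDK lets you import them, a direct except clause per class is cleaner. safe_reparse and ERROR_HINTS are hypothetical.

```python
# Hints keyed by the exception class names in the error reference table.
ERROR_HINTS = {
    "BadRequestError": "Check the request format and the partition_method value.",
    "AuthenticationError": "Check that your API key is set and valid.",
    "PermissionDeniedError": "The API key has no access to this project.",
    "NotFoundError": "File not found; check the ID with client.sources.list().",
    "RateLimitError": "Too many requests; wait and retry.",
    "InternalServerError": "Server-side failure; retry or try another method.",
    "APIConnectionError": "Network problem; check connectivity and retry.",
    "APITimeoutError": "Request timed out; raise the timeout or try a simpler method.",
}


def safe_reparse(client, file_id: str, partition_method: str):
    """Return (source, None) on success or (None, hint) for a documented API error."""
    try:
        return client.sources.parse(file_id=file_id, partition_method=partition_method), None
    except Exception as exc:
        for cls in type(exc).__mro__:
            if cls.__name__ in ERROR_HINTS:
                return None, f"{cls.__name__}: {ERROR_HINTS[cls.__name__]}"
        raise  # not a documented API error; don't hide it
```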
Advanced Examples
Automatic Quality Improvement
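A sketch of progressive escalation following the "basic" → "hi_res" → "hi_res_ft" → "mai" → "graphorlm" order from Best Practices. The response carries no quality score, so is_good_enough is a hypothetical callback you supply (for example, running a test query against your RAG pipeline).

```python
from typing import Callable, Optional

# Escalation order from the Best Practices section.
ESCALATION_ORDER = ("basic", "hi_res", "hi_res_ft", "mai", "graphorlm")


def improve_until_good(
    client,
    file_id: str,
    is_good_enough: Callable[[str, str], bool],
    methods=ESCALATION_ORDER,
) -> Optional[str]:
    """Try each method in order; return the first whose result passes your check."""
    for method in methods:
        source = client.sources.parse(file_id=file_id, partition_method=method)
        # is_good_enough(file_id, method) is your own quality check.
        if source.status == "success" and is_good_enough(file_id, method):
            return method
    return None  # nothing passed; review the document manually
```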
Progressively try more advanced processing methods until quality is satisfactory.
Batch Reprocessing
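A sequential batch loop that keeps going when one file fails. The helper name is hypothetical, and catching Exception broadly is a simplification; narrow it to the SDK's error classes if you can import them.

```python
def reparse_batch(client, file_ids, partition_method: str = "hi_res"):
    """Reprocess files one at a time; collect successes and failures separately."""
    succeeded, failed = [], {}
    for file_id in file_ids:
        try:
            source = client.sources.parse(file_id=file_id, partition_method=partition_method)
        except Exception as exc:  # narrow to the SDK's error classes where possible
            failed[file_id] = f"{type(exc).__name__}: {exc}"
            continue
        succeeded.append(source.file_name)
    return succeeded, failed
```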
Reprocess multiple files with the same method.
Async Batch Processing
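Concurrent processing with the async client, using a semaphore to cap how many requests run at once and so reduce the chance of RateLimitError (429). max_concurrency=3 is an arbitrary default, and async_client is assumed to be the SDK's async client.

```python
import asyncio


async def reparse_concurrently(async_client, file_ids, partition_method="hi_res", max_concurrency=3):
    """Reprocess files concurrently, capping simultaneous requests with a semaphore."""
    file_ids = list(file_ids)
    semaphore = asyncio.Semaphore(max_concurrency)

    async def parse_one(file_id):
        async with semaphore:  # at most max_concurrency calls in flight
            return await async_client.sources.parse(
                file_id=file_id, partition_method=partition_method
            )

    # return_exceptions=True puts errors in the result list instead of raising the first one.
    results = await asyncio.gather(*(parse_one(f) for f in file_ids), return_exceptions=True)
    return dict(zip(file_ids, results))
```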
Process multiple files concurrently for better performance.
Processing with Progress Tracking
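Progress reporting can be layered on a sequential loop through a callback. on_progress and its (done, total, file_id, ok) arguments are a hypothetical interface chosen for this sketch; without a callback it prints one line per file.

```python
from typing import Callable, Optional


def reparse_with_progress(
    client,
    file_ids,
    partition_method: str = "hi_res",
    on_progress: Optional[Callable[[int, int, str, bool], None]] = None,
) -> int:
    """Reprocess files sequentially, report progress after each, return the success count."""
    file_ids = list(file_ids)
    total = len(file_ids)
    ok_count = 0
    for index, file_id in enumerate(file_ids, start=1):
        try:
            client.sources.parse(file_id=file_id, partition_method=partition_method)
            ok = True
        except Exception:  # narrow to the SDK's error classes where possible
            ok = False
        if ok:
            ok_count += 1
        if on_progress is not None:
            on_progress(index, total, file_id, ok)
        else:
            print(f"[{index}/{total}] {file_id}: {'done' if ok else 'failed'}")
    return ok_count
```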
When to Reprocess
Poor text extraction
Symptoms: Missing text, garbled characters, incomplete content
Recommended methods:
- "hi_res" or "hi_res_ft" for complex layouts
- "mai" for text-only documents when bounding boxes are not required
Table detection issues
Symptoms: Tables not properly recognized, merged cells, structure lost
Recommended methods:
- "hi_res" for better table detection
- "hi_res_ft" for complex table structures
- "graphorlm" for multi-page tables
Image and figure handling
Symptoms: Missing captions, poor figure recognition
Recommended methods:
- "hi_res" for figure detection
- "hi_res_ft" for comprehensive image analysis
- "graphorlm" for rich image annotations
Document structure problems
Symptoms: Headers/footers mixed with content, poor section detection
Recommended methods:
- "hi_res" for structure recognition
- "hi_res_ft" for complex document hierarchies
- "graphorlm" for enhanced semantic structure and relationships
Best Practices
Processing Strategy
- Start with Fast ("basic"): For testing and simple documents
- Upgrade gradually: Move to "hi_res" → "hi_res_ft" → "mai" → "graphorlm" based on needs
- Monitor results: Use the document preview to evaluate processing quality
- Consider efficiency vs. quality: Advanced methods take longer but provide better results
Performance Optimization
- Batch processing: Prefer processing files sequentially; if you process them concurrently, cap the number of simultaneous requests to avoid rate limiting (429)
- Method selection: Choose the appropriate method for your document types
- Timeout handling: Allow sufficient time for complex processing methods (5+ minutes)
- Error recovery: Implement retry logic for transient failures
Quality Assessment
After processing, evaluate the results by:
- Checking text extraction completeness
- Verifying table and figure recognition
- Reviewing document structure classification
- Testing retrieval quality in your RAG pipeline
Error Reference
| Error Type | Status Code | Description |
|---|---|---|
| BadRequestError | 400 | Invalid request format or partition method |
| AuthenticationError | 401 | Invalid or missing API key |
| PermissionDeniedError | 403 | Access denied to the specified project |
| NotFoundError | 404 | File not found in the project |
| RateLimitError | 429 | Too many requests; retry after waiting |
| InternalServerError | ≥500 | Processing failure or server error |
| APIConnectionError | N/A | Network connectivity issues |
| APITimeoutError | N/A | Request timed out |
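According to the table, RateLimitError (429), InternalServerError (≥500), APIConnectionError, and APITimeoutError are the transient errors worth retrying. This sketch retries only those, matching on class names because the module that exports them is not named on this page. The attempt count and delays are assumptions, and sleep is injectable for testing.

```python
import random
import time

# Transient errors from the reference table: 429, >=500, connection, timeout.
RETRYABLE = {"RateLimitError", "InternalServerError", "APIConnectionError", "APITimeoutError"}


def _is_retryable(exc: BaseException) -> bool:
    # Check the class and its bases by name, since the SDK module isn't named here.
    return any(cls.__name__ in RETRYABLE for cls in type(exc).__mro__)


def parse_with_retry(client, file_id, partition_method, max_attempts=4, base_delay=2.0, sleep=time.sleep):
    """Retry transient failures with exponential backoff plus jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return client.sources.parse(file_id=file_id, partition_method=partition_method)
        except Exception as exc:
            if attempt == max_attempts or not _is_retryable(exc):
                raise
            # 2 s, 4 s, 8 s, ... plus up to 1 s of random jitter.
            sleep(base_delay * 2 ** (attempt - 1) + random.uniform(0, 1))
```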
Troubleshooting
Processing timeouts
Causes: Large files, complex documents, or heavy server load
Solutions:
- Increase request timeout (5+ minutes recommended)
- Try a simpler processing method first
- Process during off-peak hours
File not found errors
Causes: Incorrect file ID or file name, deleted file, or wrong project
Solutions:
- Verify the file_id, or the exact file name (case-sensitive)
- Use client.sources.list() to check available files
- Ensure you're using the correct API key for the project
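To check a file name against what the project actually contains, you can compare it with client.sources.list(). This sketch assumes list() returns an iterable of objects with a file_name attribute, which this page does not confirm; check_file_name is a hypothetical helper.

```python
def check_file_name(client, file_name: str):
    """Report whether file_name exists, plus case-insensitive near matches if not."""
    # Assumption: client.sources.list() yields objects with a file_name attribute.
    names = [s.file_name for s in client.sources.list()]
    if file_name in names:  # exact, case-sensitive match
        return True, []
    near = [n for n in names if n.lower() == file_name.lower()]
    return False, near
```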
Processing failures
Causes: Corrupted files, unsupported content, or method incompatibility
Solutions:
- Try a different processing method
- Check file integrity
- Re-upload the file if necessary using client.sources.upload()
Poor processing quality
Causes: Method not suited to the document type, or a complex layout
Solutions:
- Upgrade to the "hi_res" or "hi_res_ft" method
- Use "mai" for manuscripts and handwritten documents
- Use "graphorlm" for complex layouts with tables and diagrams
- Ensure the source document is of good quality

