The reprocess method (same name as the API endpoint) re-runs the ingestion pipeline on an existing source using a different partition method. Processing is asynchronous: the method returns a build_id immediately; poll Get build status until the job completes.
Method overview
Sync: client.sources.reprocess()
Async: await client.sources.reprocess()
Method signature
Returns a SourceReprocessResponse with .build_id.
Parameters
| Parameter | Type | Description | Required |
|---|---|---|---|
| file_id | str | Unique identifier of the source to reprocess | Yes |
| method | str \| None | One of: fast, balanced, accurate, vlm, agentic. Default: fast | No |
| timeout | float | Request timeout in seconds | No |
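An invalid partition method is rejected server-side with a 400 BadRequestError, so checking arguments locally can catch typos before a request is sent. The sketch below is a hypothetical helper (build_reprocess_kwargs is not part of the SDK) that validates the three documented parameters and omits unset optional values so the documented default applies.

```python
from typing import Any, Dict, Optional

# v2 partition method values, as listed on this page.
PARTITION_METHODS = ("fast", "balanced", "accurate", "vlm", "agentic")


def build_reprocess_kwargs(
    file_id: str,
    method: Optional[str] = None,
    timeout: Optional[float] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: validate arguments locally and build the keyword
    arguments for client.sources.reprocess().

    Optional values left as None are omitted, so the documented default
    method (fast) applies.
    """
    if not file_id:
        raise ValueError("file_id is required")
    kwargs: Dict[str, Any] = {"file_id": file_id}
    if method is not None:
        if method not in PARTITION_METHODS:
            raise ValueError(
                f"unknown method {method!r}; expected one of: {', '.join(PARTITION_METHODS)}"
            )
        kwargs["method"] = method
    if timeout is not None:
        if timeout <= 0:
            raise ValueError("timeout must be a positive number of seconds")
        kwargs["timeout"] = float(timeout)
    return kwargs
```

Usage, assuming keyword arguments named as in the table: client.sources.reprocess(**build_reprocess_kwargs(file_id, method="balanced", timeout=60)).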
Partition method values (v2)
| Value | Name | Description |
|---|---|---|
| fast | Fast | Fast processing with heuristic classification. No OCR. |
| balanced | Balanced | OCR-based extraction with structure classification. |
| accurate | Accurate | Fine-tuned model for highest accuracy (Premium). |
| vlm | VLM | Best for manuscripts and handwritten content. |
| agentic | Agentic | Highest accuracy for complex layouts, tables, and diagrams. |
Method comparison
| Method | Speed | Text parsing | Element classification | Best use cases | OCR |
|---|---|---|---|---|---|
| Fast | High | Good | Good | Simple text files, testing | No |
| Balanced | Medium | Very good | Very good | Complex layouts, mixed content | Yes |
| Accurate | Medium | Excellent | Excellent | Premium accuracy needed | Yes |
| VLM | High | Excellent | Good | Manuscripts, handwritten | Yes |
| Agentic | Medium | Excellent | Excellent | Complex layouts, multi-page tables, diagrams | Yes |
Return value
The method returns a build_id (string). Use it with Get build status to poll until processing finishes, either with status Completed or with a failure. The file_id does not change.
Code examples
Basic usage
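A minimal sketch of the basic synchronous call. It assumes client is an already-configured SDK client (the package name and constructor are not shown on this page) and that file_id is accepted as a keyword argument, as the Parameters table names it. start_reprocess is a hypothetical helper name.

```python
def start_reprocess(client, file_id: str) -> str:
    """Hypothetical helper: start reprocessing a source with the default
    method (fast) and return the build_id to poll.

    `client` is assumed to be an already-configured SDK client.
    """
    response = client.sources.reprocess(file_id=file_id)
    # reprocess() returns immediately; the pipeline runs asynchronously,
    # so the build_id is all you get back at this point.
    return response.build_id
```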
Reprocess and poll until complete
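The following sketch starts a reprocess job and then polls until the build reaches a terminal state. This page does not name the SDK method behind Get build status, so the sketch takes it as a callable, get_build_status(build_id), that returns the status string. "Completed" comes from this page; the "Failed" state name is an assumption, so adjust done_states and failed_states to the values the API actually returns. reprocess_and_wait is a hypothetical helper.

```python
import time
from typing import Callable


def reprocess_and_wait(
    client,
    file_id: str,
    get_build_status: Callable[[str], str],
    method: str = "balanced",
    poll_interval: float = 3.0,
    max_wait: float = 600.0,
    done_states: frozenset = frozenset({"Completed"}),
    failed_states: frozenset = frozenset({"Failed"}),  # assumed state name
) -> str:
    """Hypothetical helper: start a reprocess job, then poll until done.

    get_build_status(build_id) must return the build's status string;
    wire it to the Get build status endpoint. Returns the build_id once
    the build is in one of done_states.
    """
    build_id = client.sources.reprocess(file_id=file_id, method=method).build_id
    deadline = time.monotonic() + max_wait
    while True:
        status = get_build_status(build_id)
        if status in done_states:
            return build_id
        if status in failed_states:
            raise RuntimeError(f"build {build_id} ended with status {status!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"build {build_id} still {status!r} after {max_wait}s")
        # Wait between polls (the best practices suggest 2 to 5 seconds).
        time.sleep(poll_interval)
```

Using time.monotonic for the deadline keeps the overall timeout correct even if the system clock changes while polling.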
With partition method
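A sketch of passing an explicit partition method through the async client shown in the method overview (await client.sources.reprocess()). It assumes async_client is the SDK's async client, constructed elsewhere, and that file_id and method are keyword arguments. reprocess_with_method is a hypothetical helper.

```python
import asyncio


async def reprocess_with_method(async_client, file_id: str, method: str = "agentic") -> str:
    """Hypothetical helper: re-run ingestion for one source with an explicit
    partition method and return the new build_id.

    `async_client` is assumed to be the SDK's async client, whose
    sources.reprocess() is awaited.
    """
    response = await async_client.sources.reprocess(file_id=file_id, method=method)
    return response.build_id


# Usage from synchronous code (client construction not shown):
# build_id = asyncio.run(reprocess_with_method(async_client, "your-file-id", "accurate"))
```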
Error handling
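The error table below separates transient failures (429, network, timeouts, some 5xx) from errors that a retry cannot fix (400, 401, 403, 404). A minimal retry-with-backoff sketch follows. The exception classes are passed in because this page does not show the module they are imported from; reprocess_with_retry is a hypothetical helper.

```python
import random
import time


def reprocess_with_retry(
    client,
    file_id: str,
    method: str,
    retryable: tuple,
    max_attempts: int = 4,
    base_delay: float = 1.0,
) -> str:
    """Hypothetical helper: call reprocess, retrying transient errors with
    exponential backoff.

    `retryable` is a tuple of exception classes to retry, for example the
    SDK's RateLimitError, APIConnectionError, APITimeoutError and possibly
    InternalServerError (a genuine processing failure may not go away).
    Other errors such as BadRequestError, AuthenticationError or
    NotFoundError propagate immediately.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return client.sources.reprocess(file_id=file_id, method=method).build_id
        except retryable:
            if attempt == max_attempts:
                raise  # give up: re-raise the last transient error
            delay = base_delay * 2 ** (attempt - 1)  # 1s, 2s, 4s, ...
            time.sleep(delay + random.uniform(0, delay / 10))  # small jitter
    raise AssertionError("unreachable")
```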
Batch reprocess
Reprocess multiple sources by file_id; each call returns a build_id. Poll Get build status for each until complete.
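A sketch of the batch loop: it starts one job per file_id, records each build_id, and keeps going when an individual call fails (for example, a 404 for a deleted source). It catches Exception broadly only to stay self-contained; in real code, narrow it to the SDK's error classes. batch_reprocess is a hypothetical helper.

```python
from typing import Dict, Iterable, Tuple


def batch_reprocess(
    client, file_ids: Iterable[str], method: str = "balanced"
) -> Tuple[Dict[str, str], Dict[str, Exception]]:
    """Hypothetical helper: start a reprocess job for each file_id.

    Returns (build_ids, errors): build_ids maps file_id -> build_id for
    accepted jobs; errors maps file_id -> the exception raised. One bad
    file_id does not stop the rest of the batch.
    """
    build_ids: Dict[str, str] = {}
    errors: Dict[str, Exception] = {}
    for file_id in dict.fromkeys(file_ids):  # drop duplicates, keep order
        try:
            build_ids[file_id] = client.sources.reprocess(
                file_id=file_id, method=method
            ).build_id
        except Exception as exc:  # narrow to the SDK's error classes in real code
            errors[file_id] = exc
    return build_ids, errors
```

Each returned build_id is then polled with Get build status; errors can be inspected or retried separately.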
When to reprocess
Poor text extraction
Symptoms: Missing text, garbled characters, incomplete content
Recommended: balanced or accurate for complex layouts; vlm for text-only extraction when bounding boxes are not needed.
Table detection issues
Symptoms: Tables not recognized, merged cells, structure lost
Recommended: balanced, accurate, or agentic for multi-page tables.
Image and figure handling
Symptoms: Missing captions, poor figure recognition
Recommended: balanced, accurate, or agentic for rich image annotations.
Document structure problems
Symptoms: Headers/footers mixed with content, poor section detection
Recommended: balanced, accurate, or agentic for better structure and semantics.
Best practices
- Use file_id: Always use the source's file_id (from List sources or Get build status).
- Poll build status: After calling reprocess, poll Get build status at a reasonable interval (e.g. every 2–5 seconds) with an overall timeout.
- Choose method by need: Start with fast for testing; use balanced or accurate for better quality; use vlm for manuscripts; use agentic for complex layouts and tables.
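The recommendations in "When to reprocess" and the best practices above can be encoded as a small lookup. The symptom keys are illustrative labels, not API values, and treating accurate as unavailable without a Premium plan is an assumption based on its "(Premium)" label. candidate_methods is a hypothetical helper.

```python
from typing import Tuple

# Symptom -> recommended partition methods, transcribed from the
# "When to reprocess" guidance and best practices on this page.
# The keys are illustrative labels, not API values.
RECOMMENDED_METHODS = {
    "testing": ("fast",),
    "poor_text_extraction": ("balanced", "accurate"),
    "manuscripts_handwriting": ("vlm",),
    "table_detection": ("balanced", "accurate", "agentic"),
    "images_and_figures": ("balanced", "accurate", "agentic"),
    "document_structure": ("balanced", "accurate", "agentic"),
}


def candidate_methods(symptom: str, premium_available: bool = False) -> Tuple[str, ...]:
    """Hypothetical helper: return recommended methods for a symptom,
    dropping accurate (labelled Premium) unless premium_available is True."""
    try:
        methods = RECOMMENDED_METHODS[symptom]
    except KeyError:
        raise ValueError(f"unknown symptom {symptom!r}") from None
    if premium_available:
        return methods
    return tuple(m for m in methods if m != "accurate")
```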
Error reference
| Error Type | Status Code | Description |
|---|---|---|
| BadRequestError | 400 | Invalid request format or partition method |
| AuthenticationError | 401 | Invalid or missing API key |
| PermissionDeniedError | 403 | Access denied to the specified project |
| NotFoundError | 404 | Source not found for the given file_id |
| RateLimitError | 429 | Too many requests; retry after waiting |
| InternalServerError | ≥500 | Processing failure or server error |
| APIConnectionError | N/A | Network connectivity issues |
| APITimeoutError | N/A | Request timed out |
Troubleshooting
Processing timeouts
Causes: Large files, complex documents, or heavy server load.
Solutions:
- Increase request timeout (5+ minutes recommended)
- Try a simpler processing method first
- Process during off-peak hours
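A sketch combining the first two suggestions: pass a generous request timeout (the documented timeout parameter, in seconds; 300 s matches the 5-minute recommendation) and, if a call still times out, fall back to the next, simpler method. The timeout exception class is a parameter; pass the SDK's APITimeoutError, whose import path is not shown here. reprocess_with_fallback and the method order are assumptions, not documented behavior.

```python
from typing import Sequence, Tuple


def reprocess_with_fallback(
    client,
    file_id: str,
    methods: Sequence[str] = ("agentic", "balanced", "fast"),
    timeout: float = 300.0,
    timeout_errors: tuple = (TimeoutError,),
) -> Tuple[str, str]:
    """Hypothetical helper: try each method with a long request timeout;
    on a timeout error, fall back to the next (simpler) method.

    Returns (method_used, build_id). Pass the SDK's APITimeoutError in
    timeout_errors; the builtin TimeoutError is only a placeholder.
    """
    last_exc = None
    for method in methods:
        try:
            resp = client.sources.reprocess(file_id=file_id, method=method, timeout=timeout)
            return method, resp.build_id
        except timeout_errors as exc:
            last_exc = exc  # remember it and try the next method
    if last_exc is None:
        raise ValueError("methods must not be empty")
    raise last_exc
```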
Source not found (404)
Causes: Invalid file_id, source deleted, or wrong project.
Solutions:
- Use client.sources.list() to get valid file_ids
- Ensure you’re using the correct API key for the project
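A sketch of checking a file_id against client.sources.list() before reprocessing. It assumes list() returns an iterable of source objects exposing a .file_id attribute; if the real response is paginated or wraps items in another object, adapt the set comprehension. reprocess_if_exists is a hypothetical helper.

```python
from typing import Optional


def reprocess_if_exists(client, file_id: str, method: str = "balanced") -> Optional[str]:
    """Hypothetical helper: reprocess only if file_id belongs to the project
    the API key can see. Returns the build_id, or None if not found.

    Assumes client.sources.list() yields objects with a .file_id attribute.
    """
    known_ids = {source.file_id for source in client.sources.list()}
    if file_id not in known_ids:
        return None  # likely deleted, mistyped, or in another project
    return client.sources.reprocess(file_id=file_id, method=method).build_id
```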
Processing failures
Causes: Corrupted file, unsupported content, or method incompatibility.
Solutions:
- Try a different method (e.g. balanced, agentic)
- Check file integrity; re-ingest if necessary using client.sources.ingest_file()
Poor processing quality
Solutions:
- Use balanced or accurate for complex layouts
- Use vlm for manuscripts and handwritten documents
- Use agentic for complex layouts with tables and diagrams
Next steps
After reprocessing, poll Get build status until the build completes, then continue with:
- Get build status: Poll status and get parsed elements for a build
- List sources: View all sources and their status
- Upload: Ingest new files, URLs, GitHub repos, or YouTube videos
- Get elements: Retrieve parsed elements from a source
- Delete source: Remove a source by file_id

