The reprocess method (same name as the API endpoint) re-runs the ingestion pipeline on an existing source using a different partition method. Processing is asynchronous: the method returns a build_id immediately; poll Get build status until the job completes.

Method overview

Sync

client.sources.reprocess()

Async

await client.sources.reprocess()

Method signature

client.sources.reprocess(
    file_id: str,                           # Required
    method: str | None = None,             # Optional: fast, balanced, accurate, vlm, agentic
    timeout: float | None = None
) -> SourceReprocessResponse
Returns SourceReprocessResponse with .build_id.

Parameters

| Parameter | Type | Description | Required |
|---|---|---|---|
| file_id | str | Unique identifier of the source to reprocess | Yes |
| method | str \| None | One of: fast, balanced, accurate, vlm, agentic. Default: fast | No |
| timeout | float \| None | Request timeout in seconds | No |

Partition method values (v2)

| Value | Name | Description |
|---|---|---|
| fast | Fast | Fast processing with heuristic classification. No OCR. |
| balanced | Balanced | OCR-based extraction with structure classification. |
| accurate | Accurate | Fine-tuned model for highest accuracy (Premium). |
| vlm | VLM | Best for manuscripts and handwritten content. |
| agentic | Agentic | Highest accuracy for complex layouts, tables, and diagrams. |

Method comparison

| Method | Speed | Text parsing | Element classification | Best use cases | OCR |
|---|---|---|---|---|---|
| Fast | High | Good | Good | Simple text files, testing | No |
| Balanced | Medium | Very good | Very good | Complex layouts, mixed content | Yes |
| Accurate | Medium | Excellent | Excellent | Premium accuracy needed | Yes |
| VLM | High | Excellent | Good | Manuscripts, handwritten | Yes |
| Agentic | Medium | Excellent | Excellent | Complex layouts, multi-page tables, diagrams | Yes |
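
The accepted values above can be checked on the client side before a request is sent, so a typo fails fast instead of surfacing as a 400 response. The following is a minimal sketch: `PARTITION_METHODS`, `USES_OCR`, and `normalize_method` are hypothetical helper names, not part of the SDK, and the values are copied from the tables above.

```python
from typing import Optional

# Values copied from the partition method table; these names are hypothetical helpers.
PARTITION_METHODS = ("fast", "balanced", "accurate", "vlm", "agentic")

# Per the comparison table, every method except "fast" uses OCR.
USES_OCR = {name: name != "fast" for name in PARTITION_METHODS}


def normalize_method(method: Optional[str]) -> str:
    """Return a validated, lower-cased method name.

    None maps to "fast", the default documented in the parameter table.
    """
    if method is None:
        return "fast"
    name = method.strip().lower()
    if name not in PARTITION_METHODS:
        allowed = ", ".join(PARTITION_METHODS)
        raise ValueError(f"unknown partition method {method!r}; expected one of: {allowed}")
    return name
```

The result can then be passed as the `method` argument of `client.sources.reprocess()`.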

Return value

The method returns a SourceReprocessResponse whose build_id (string) identifies the new build. Pass it to Get build status and poll until the build reports Completed or an error. The file_id does not change.

Code examples

Basic usage

from graphor import Graphor

client = Graphor()
file_id = "a1b2c3d4-e5f6-7890-abcd-ef1234567890"  # from list() or get_build_status

response = client.sources.reprocess(
    file_id=file_id,
    method="balanced"
)
print(f"Build ID: {response.build_id}")

Reprocess and poll until complete

import time
from graphor import Graphor

client = Graphor()
file_id = "a1b2c3d4-e5f6-7890-abcd-ef1234567890"

response = client.sources.reprocess(file_id=file_id, method="balanced")
build_id = response.build_id

while True:
    status = client.sources.get_build_status(build_id)
    if status.status == "Completed":
        print("Done. file_id:", status.file_id)
        break
    if status.error and status.status != "not_found":
        raise RuntimeError(status.error)
    time.sleep(2)

With partition method

response = client.sources.reprocess(file_id=file_id, method="agentic")
build_id = response.build_id
Reprocessing runs in the background and can take several minutes. Use Get build status to poll until completion.

Error handling

import graphor
from graphor import Graphor

client = Graphor()
try:
    response = client.sources.reprocess(file_id=file_id, method="balanced")
    print("Scheduled. Build ID:", response.build_id)
except graphor.NotFoundError as e:
    print("Source not found:", e)
except graphor.BadRequestError as e:
    print("Invalid request:", e)
except graphor.APIStatusError as e:
    print("API error:", e)

Batch reprocess

Reprocess multiple sources by file_id; each call returns a build_id. Poll Get build status for each until complete.
from graphor import Graphor

client = Graphor()
file_ids = ["id1", "id2", "id3"]
build_ids = []
for fid in file_ids:
    try:
        resp = client.sources.reprocess(file_id=fid, method="balanced")
        build_ids.append(resp.build_id)
        print(f"Scheduled: {fid} -> {resp.build_id}")
    except Exception as e:
        print(f"FAIL - {fid}: {e}")
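
If the build IDs need to be polled afterwards, it can help to keep successes and failures apart instead of only printing them. The sketch below wraps the same loop in a function; `reprocess_many` is a hypothetical helper name, and the client is passed in as an argument so the function relies only on `client.sources.reprocess()` as shown above.

```python
def reprocess_many(client, file_ids, method="balanced"):
    """Schedule reprocessing for each file_id.

    Returns (scheduled, failed):
      scheduled maps file_id -> build_id for accepted requests,
      failed maps file_id -> error message for rejected ones.
    """
    scheduled = {}
    failed = {}
    for fid in file_ids:
        try:
            resp = client.sources.reprocess(file_id=fid, method=method)
        except Exception as exc:  # broad on purpose: one bad ID should not stop the batch
            failed[fid] = str(exc)
        else:
            scheduled[fid] = resp.build_id
    return scheduled, failed
```

Each value in `scheduled` can then be polled with Get build status, and the keys of `failed` can be retried or inspected separately.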

When to reprocess

Symptoms: Missing text, garbled characters, incomplete content
Recommended: balanced or accurate for complex layouts; vlm for text-only when bounding boxes are not needed.
Symptoms: Tables not recognized, merged cells, structure lost
Recommended: balanced, accurate, or agentic for multi-page tables.
Symptoms: Missing captions, poor figure recognition
Recommended: balanced, accurate, or agentic for rich image annotations.
Symptoms: Headers/footers mixed with content, poor section detection
Recommended: balanced, accurate, or agentic for better structure and semantics.
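
The recommendations above can be kept as a small lookup table, for example to decide which method to try next after an unsatisfactory result. This is only an illustrative sketch: the symptom keys (`text`, `tables`, `figures`, `structure`) and the `candidates` helper are hypothetical names, and the method lists are copied from the four recommendations above.

```python
# Symptom category -> methods recommended above, in the order they are listed.
RECOMMENDED_METHODS = {
    "text": ("balanced", "accurate", "vlm"),           # missing or garbled text
    "tables": ("balanced", "accurate", "agentic"),     # tables not recognized
    "figures": ("balanced", "accurate", "agentic"),    # captions, figure recognition
    "structure": ("balanced", "accurate", "agentic"),  # headers/footers, sections
}


def candidates(symptom, already_tried=()):
    """Return the recommended methods for a symptom, skipping ones already tried."""
    if symptom not in RECOMMENDED_METHODS:
        raise KeyError(f"unknown symptom category: {symptom!r}")
    return [m for m in RECOMMENDED_METHODS[symptom] if m not in already_tried]
```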

Best practices

  • Use file_id: Always use the source’s file_id (from list sources or build status).
  • Poll build status: After calling reprocess, poll Get build status with a reasonable interval (e.g. 2–5 seconds) and timeout.
  • Choose method by need: Start with fast for testing; use balanced or accurate for better quality; use vlm for manuscripts; use agentic for complex layouts and tables.
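
The polling advice above can be sketched as a loop with an overall deadline. It reuses the checks from the earlier polling example (`status.status == "Completed"`, `status.error`); the `wait_for_build` name and the `interval`, `timeout`, `sleep`, and `clock` parameters are assumptions made for illustration, not SDK features.

```python
import time


def wait_for_build(client, build_id, interval=3.0, timeout=600.0,
                   sleep=time.sleep, clock=time.monotonic):
    """Poll Get build status until the build completes, fails, or times out.

    interval and timeout are in seconds. sleep and clock are injectable so
    the loop can be exercised without real waiting.
    """
    deadline = clock() + timeout
    while True:
        status = client.sources.get_build_status(build_id)
        if status.status == "Completed":
            return status
        # Same failure check as the earlier polling example.
        if status.error and status.status != "not_found":
            raise RuntimeError(status.error)
        if clock() >= deadline:
            raise TimeoutError(f"build {build_id} did not complete within {timeout} s")
        sleep(interval)
```

A monotonic clock is used for the deadline so that system clock adjustments do not shorten or extend the wait.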

Error Reference

| Error Type | Status Code | Description |
|---|---|---|
| BadRequestError | 400 | Invalid request format or partition method |
| AuthenticationError | 401 | Invalid or missing API key |
| PermissionDeniedError | 403 | Access denied to the specified project |
| NotFoundError | 404 | Source not found for the given file_id |
| RateLimitError | 429 | Too many requests, please retry after waiting |
| InternalServerError | ≥500 | Processing failure or server error |
| APIConnectionError | N/A | Network connectivity issues |
| APITimeoutError | N/A | Request timed out |
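
Some of the errors above (rate limits, connection problems, timeouts) are usually transient, so retrying after a pause is a common response. The sketch below is a generic exponential backoff wrapper, not an SDK feature: `call_with_retry` and its parameters are hypothetical, and the exception classes to retry on are passed in by the caller. The SDK may already retry some requests internally, which this sketch does not account for.

```python
import time


def call_with_retry(fn, retryable, attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fn(), retrying on the given exception types with exponential backoff.

    Waits base_delay, 2*base_delay, 4*base_delay, ... seconds between attempts
    and re-raises the last error once all attempts are used up.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except retryable:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)


# Possible usage, assuming the error classes in the table are exposed on the
# graphor module in the same way as graphor.NotFoundError in the example above:
#
# response = call_with_retry(
#     lambda: client.sources.reprocess(file_id=file_id, method="balanced"),
#     retryable=(graphor.RateLimitError, graphor.APIConnectionError, graphor.APITimeoutError),
# )
```

Errors such as BadRequestError or NotFoundError are left out of `retryable` because repeating the same request would not change the outcome.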

Troubleshooting

Causes: Large files, complex documents, or heavy server load
Solutions:
  • Increase request timeout (5+ minutes recommended)
  • Try a simpler processing method first
  • Process during off-peak hours
client = Graphor(timeout=600.0)  # 10 minutes
Causes: Invalid file_id, source deleted, or wrong project
Solutions:
  • Use client.sources.list() to get valid file_ids
  • Ensure you’re using the correct API key for the project
sources = client.sources.list()
for s in sources:
    print(s.file_id, s.file_name)
Causes: Corrupted file, unsupported content, or method incompatibility
Solutions:
  • Try a different method (e.g. balanced, agentic)
  • Check file integrity; re-ingest if necessary using client.sources.ingest_file()
Solutions:
  • Use balanced or accurate for complex layouts
  • Use vlm for manuscripts and handwritten documents
  • Use agentic for complex layouts with tables and diagrams

Next steps

After reprocessing, poll Get build status until complete, then:

  • Get build status: Poll status and get parsed elements for a build
  • List sources: View all sources and their status
  • Upload: Ingest new files, URLs, GitHub repos, or YouTube videos
  • Get elements: Retrieve parsed elements from a source
  • Delete source: Remove a source by file_id