The Reprocess endpoint lets you re-run the ingestion pipeline on an existing source using a different partition method. Processing runs asynchronously: the API returns a build_id immediately; you then poll Get build status until the job completes.

Endpoint overview

HTTP Method

POST https://sources.graphorlm.com/reprocess

Authentication

This endpoint requires authentication using an API token. Include your API token as a Bearer token in the Authorization header.
Learn how to create and manage API tokens in the API Tokens guide.
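
The header requirement above can be sketched as a small helper. This is a minimal illustration, not part of any official client: the function name build_headers and the environment variable GRAPHOR_API_TOKEN are hypothetical names chosen for this example.

```python
import os


def build_headers(api_token: str) -> dict:
    """Return the headers this endpoint expects: a Bearer token and a JSON content type."""
    if not api_token:
        raise ValueError("An API token is required")
    return {
        "Authorization": f"Bearer {api_token}",
        "Content-Type": "application/json",
    }


# GRAPHOR_API_TOKEN is a hypothetical variable name used only in this sketch.
token = os.environ.get("GRAPHOR_API_TOKEN", "grlm_your_api_token_here")
headers = build_headers(token)
```

Reading the token from the environment rather than hard-coding it keeps it out of source control.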

Async flow

  1. POST /reprocess with file_id and, optionally, method (the partition method). The response returns immediately with a build_id.
  2. Poll Get build status: GET https://sources.graphorlm.com/builds/{build_id} until status is Completed or indicates failure.
  3. Use the file_id from the build status response (unchanged) for subsequent API calls.

Request format

Headers

| Header | Value | Required |
|---|---|---|
| Authorization | Bearer YOUR_API_TOKEN | Yes |
| Content-Type | application/json | Yes |

Request body

Send a JSON body with the following fields:
| Field | Type | Description | Required |
|---|---|---|---|
| file_id | string | Unique identifier of the source to re-process | Yes |
| method | string | Partitioning strategy. One of: fast, balanced, accurate, vlm, agentic. Default: fast | No |

Partition method values (v2)

Use these values for the method field:
| Value | Name | Description |
|---|---|---|
| fast | Fast | Fast processing with heuristic classification. No OCR. |
| balanced | Balanced | OCR-based extraction with structure classification. |
| accurate | Accurate | Fine-tuned model for highest accuracy (Premium). |
| vlm | VLM | Best for manuscripts and handwritten content. |
| agentic | Agentic | Highest accuracy for complex layouts, tables, and diagrams. |
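
Because the set of accepted values is small, a client can reject a typo before making a request. The sketch below is one way to do that. The names ALLOWED_METHODS and build_reprocess_payload are hypothetical, and the field name method follows the request body table above.

```python
# Values copied from the table above.
ALLOWED_METHODS = {"fast", "balanced", "accurate", "vlm", "agentic"}


def build_reprocess_payload(file_id, method=None):
    """Build the JSON body for POST /reprocess, validating the method locally."""
    if not file_id:
        raise ValueError("file_id is required")
    payload = {"file_id": file_id}
    if method is not None:
        if method not in ALLOWED_METHODS:
            raise ValueError(
                f"Unknown method {method!r}; expected one of {sorted(ALLOWED_METHODS)}"
            )
        payload["method"] = method
    # When method is omitted, the server applies its documented default.
    return payload
```

Leaving method out of the payload, rather than sending a client-side default, keeps the client in line with whatever default the API applies.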

Available processing methods

Fast
Best for: Simple text documents, quick processing
  • Fast processing with heuristic classification
  • No OCR processing
  • Suitable for plain text files and well-structured documents
  • Recommended for testing and development

Balanced
Best for: Complex documents with varied layouts
  • OCR-based text extraction
  • AI-powered document structure classification
  • Better recognition of tables, figures, and document elements
  • Enhanced accuracy for complex layouts

Accurate
Best for: Premium accuracy, specialized documents
  • OCR-based text extraction
  • Fine-tuned AI model for document classification
  • Highest accuracy for document structure recognition
  • Note: Premium feature

VLM
Best for: Text-first parsing, manuscripts, and handwritten documents
  • Best text-first parsing; no bounding boxes or page layout
  • Best for manuscript and handwritten documents
  • Performs page and document annotation
  • Best-in-class text parsing quality

Agentic
Best for: Complex layouts, multi-page tables, diagrams, and images
  • Highest parsing setting for complex layouts
  • Rich annotations for images and complex elements
  • Agentic processing for enhanced understanding

Method comparison

| Method | Speed | Text parsing | Element classification | Bounding boxes | Best use cases | OCR |
|---|---|---|---|---|---|---|
| Fast | High | Good | Good | Yes (limited) | Simple text files, testing | No |
| Balanced | Medium | Very good | Very good | Yes | Complex layouts, mixed content | Yes |
| Accurate | Medium | Excellent | Excellent | Yes | Premium accuracy needed | Yes |
| VLM | High | Excellent | Good | No | Manuscripts, handwritten | Yes |
| Agentic | Medium | Excellent | Excellent | Yes | Complex layouts, multi-page tables, diagrams | Yes |
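
The comparison can be turned into a rough selection heuristic. The sketch below is an illustrative reading of the table, not official guidance: the function suggest_method and its parameters are hypothetical, and the ordering of the checks is an assumption.

```python
def suggest_method(handwritten=False, complex_layout=False,
                   scanned=False, needs_bounding_boxes=True):
    """Suggest a method value from coarse document traits (illustrative heuristic)."""
    # VLM targets manuscripts and handwriting but returns no bounding boxes.
    if handwritten and not needs_bounding_boxes:
        return "vlm"
    # Agentic targets complex layouts, multi-page tables, and diagrams.
    if complex_layout:
        return "agentic"
    # Fast performs no OCR, so scanned or handwritten input needs an OCR-based method.
    if scanned or handwritten:
        return "balanced"
    # Fast suits simple, well-structured text and testing.
    return "fast"
```

The accurate method is left out because it is a premium feature. A caller with access to it could substitute it wherever this sketch returns balanced.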

Request example

{
  "file_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "method": "balanced"
}
With default method (optional field omitted):
{
  "file_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
}
Re-processing runs in the background and can take several minutes depending on document size and the selected method. Use the returned build_id to poll Get build status until completion.

Response format

Success response (200 OK)

The endpoint returns immediately with a build identifier. It does not wait for processing to finish.
{
  "build_id": "b2c3d4e5-f6a7-8901-bcde-f12345678901",
  "success": true,
  "error": null
}

Response fields

| Field | Type | Description |
|---|---|---|
| build_id | string | Use this ID to poll Get build status |
| success | boolean | Whether the re-process job was successfully scheduled |
| error | string \| null | Error message if the job was not scheduled successfully |
To get the final source metadata (file_id, file_name, status, etc.) and optional parsed elements, call GET /builds/{build_id} (see Upload sources – Get build status).
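
The response can report success: false, so a client may want to check that flag rather than assume a build_id is present. A minimal sketch, assuming the three fields documented above. ReprocessResult and parse_reprocess_response are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReprocessResult:
    build_id: Optional[str]
    success: bool
    error: Optional[str]


def parse_reprocess_response(body: dict) -> ReprocessResult:
    """Validate the scheduling response and return its fields."""
    result = ReprocessResult(
        build_id=body.get("build_id"),
        success=bool(body.get("success")),
        error=body.get("error"),
    )
    # A job that was not scheduled has nothing to poll, so fail early.
    if not result.success or not result.build_id:
        raise RuntimeError(result.error or "Reprocess job was not scheduled")
    return result
```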

Code examples

JavaScript/Node.js

const reprocessSource = async (apiToken, fileId, partitionMethod = "fast") => {
  const response = await fetch("https://sources.graphorlm.com/reprocess", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiToken}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ file_id: fileId, method: partitionMethod }),
  });

  if (!response.ok) {
    const err = await response.json().catch(() => ({}));
    throw new Error(err.detail || `Reprocess failed: ${response.status}`);
  }

  const { build_id } = await response.json();
  return build_id;
};

// Usage: get build_id, then poll Get build status until success
const buildId = await reprocessSource(
  "grlm_your_api_token_here",
  "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "balanced"
);
console.log("Build ID:", buildId);

Python

import requests

def reprocess_source(api_token, file_id, partition_method="fast"):
    url = "https://sources.graphorlm.com/reprocess"
    headers = {
        "Authorization": f"Bearer {api_token}",
        "Content-Type": "application/json",
    }
    payload = {"file_id": file_id, "method": partition_method}
    response = requests.post(url, headers=headers, json=payload, timeout=60)
    response.raise_for_status()
    return response.json()["build_id"]

# Usage
build_id = reprocess_source(
    "grlm_your_api_token_here",
    "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
    "balanced",
)
print("Build ID:", build_id)

cURL

curl -X POST https://sources.graphorlm.com/reprocess \
  -H "Authorization: Bearer grlm_your_api_token_here" \
  -H "Content-Type: application/json" \
  -d '{"file_id":"a1b2c3d4-e5f6-7890-abcd-ef1234567890","method":"balanced"}'

Reprocess and poll until complete

import time
import requests

def reprocess_and_wait(api_token, file_id, partition_method="balanced", poll_interval=3, max_wait=600):
    # Start reprocess
    r = requests.post(
        "https://sources.graphorlm.com/reprocess",
        headers={"Authorization": f"Bearer {api_token}", "Content-Type": "application/json"},
        json={"file_id": file_id, "method": partition_method},
        timeout=60,
    )
    r.raise_for_status()
    build_id = r.json()["build_id"]

    # Poll until complete
    url = f"https://sources.graphorlm.com/builds/{build_id}"
    headers = {"Authorization": f"Bearer {api_token}"}
    start = time.time()
    while time.time() - start < max_wait:
        # A per-request timeout keeps one hung call from outlasting max_wait
        status_r = requests.get(url, headers=headers, timeout=60)
        status_r.raise_for_status()
        data = status_r.json()
        if data.get("success"):
            return data
        if data.get("status") == "Processing failed" or (data.get("error") and data.get("status") != "not_found"):
            raise RuntimeError(data.get("error") or data.get("message", "Reprocess failed"))
        time.sleep(poll_interval)
    raise TimeoutError("Reprocess did not complete in time")

Error responses

Common error codes

| Status code | Description |
|---|---|
| 404 | Source not found for the given file_id |
| 500 | Processing or unexpected internal error |

Error response format

{
  "detail": "Source node not found"
}
{
  "detail": "Failed to process source"
}

Error examples

{ "detail": "Source node not found" }
Cause: The given file_id does not exist in your project.
Solution: Verify the file_id (e.g. from List sources or a previous upload/build status).
{ "detail": "Failed to process source" }
Cause: Internal error during re-processing.
Solution: Retry later, try a different method, or check the file's integrity.
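
The two documented cases differ in whether a retry can help. The sketch below separates them. describe_error is a hypothetical helper, and treating every 5xx status as retryable is an assumption beyond the single 500 code listed above.

```python
def describe_error(status_code: int, body: dict) -> tuple:
    """Return (detail, retryable) for an error response from POST /reprocess."""
    # Error bodies carry a "detail" string; fall back to the status code if absent.
    detail = body.get("detail") or f"HTTP {status_code}"
    if status_code == 404:
        # The file_id is unknown; retrying the same request will not help.
        return detail, False
    if status_code >= 500:
        # Internal errors may be transient; retry later or with another method.
        return detail, True
    return detail, False
```

For example, describe_error(404, {"detail": "Source node not found"}) returns ("Source node not found", False).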

When to reprocess

Symptoms: Missing text, garbled characters, incomplete content
Recommended: balanced or accurate for complex layouts; vlm for text-only output when bounding boxes are not needed.

Symptoms: Tables not recognized, merged cells, structure lost
Recommended: balanced or accurate; agentic for multi-page tables.

Symptoms: Missing captions, poor figure recognition
Recommended: balanced or accurate; agentic for rich image annotations.

Symptoms: Headers/footers mixed with content, poor section detection
Recommended: balanced, accurate, or agentic for better structure and semantics.

Best practices

  • Use file_id: Always use the source’s file_id (from List sources or build status); do not rely on the file name.
  • Poll build status: After calling reprocess, poll Get build status with a reasonable interval (e.g. 2–5 seconds) and timeout.
  • Choose method by need: Start with fast for testing; use balanced or accurate for better quality; use vlm for manuscripts; use agentic for complex layouts and tables.
  • Timeout: Allow sufficient time for large documents and heavier methods when polling.
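
The polling advice above can be sketched as a loop whose interval grows from 2 to 5 seconds and which gives up after an overall deadline. This is a minimal sketch under stated assumptions: wait_for_build is a hypothetical helper, fetch_status is any caller-supplied function that returns the build status JSON as a dict, and the status strings "Completed" and "Processing failed" are taken from elsewhere on this page.

```python
import time


def wait_for_build(fetch_status, min_interval=2.0, max_interval=5.0,
                   max_wait=600.0, sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_status() until the build completes, fails, or max_wait elapses."""
    deadline = clock() + max_wait
    interval = min_interval
    while True:
        data = fetch_status()
        status = data.get("status")
        if status == "Completed":
            return data
        if status == "Processing failed":
            raise RuntimeError(data.get("error") or "Reprocess failed")
        # Stop if the next sleep would run past the overall deadline.
        if clock() + interval > deadline:
            raise TimeoutError("Build did not complete in time")
        sleep(interval)
        # Grow the interval gradually, capped at max_interval.
        interval = min(interval * 1.5, max_interval)
```

Passing sleep and clock as parameters lets the loop be exercised in tests without real delays. In production, fetch_status would wrap a GET to the build status URL with the Authorization header.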

Next steps

After re-processing completes (build status Completed):

  • Get build status: Poll status and optionally retrieve parsed elements for a build.
  • List sources: View all sources and their status in your project.
  • Upload sources: Upload new files, URLs, GitHub repos, or YouTube videos (async).
  • Delete source: Remove a source from your project.