The Reprocess endpoint lets you re-run the ingestion pipeline on an existing source using a different partition method. Processing runs asynchronously: the API returns a build_id immediately; you then poll Get build status until the job completes.

Endpoint overview

HTTP Method

POST

Authentication

This endpoint requires authentication using an API token. Include your API token as a Bearer token in the Authorization header.
Learn how to create and manage API tokens in the API Tokens guide.

Async flow

  1. POST /reprocess with file_id and an optional method. The response returns immediately with a build_id.
  2. Poll Get build status: GET https://sources.graphorlm.com/builds/{build_id} until status is Completed or indicates failure.
  3. Use the file_id from the build status response (unchanged) for subsequent API calls.

Request format

Headers

Header | Value | Required
Authorization | Bearer YOUR_API_TOKEN | Yes
Content-Type | application/json | Yes

Request body

Send a JSON body with the following fields:
Field | Type | Description | Required
file_id | string | Unique identifier of the source to re-process | Yes
method | string | Partitioning strategy. One of: fast, balanced, accurate, agentic. Default: fast | No

Partition method values (v2)

Use these values for the method field:
Value | Name | Description
fast | Fast | Fast processing with heuristic classification. No OCR.
balanced | Balanced | OCR-based extraction with structure classification.
accurate | Accurate | Fine-tuned model for highest accuracy (Premium).
agentic | Agentic | Highest accuracy for complex layouts, tables, and diagrams.
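
The allowed values above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the Graphor API: the helper name build_reprocess_payload and the constant ALLOWED_METHODS are hypothetical, and the body uses the method key as it appears in the request examples on this page.

```python
# Hypothetical client-side helper; names are illustrative, not part of the API.
ALLOWED_METHODS = ("fast", "balanced", "accurate", "agentic")


def build_reprocess_payload(file_id, method=None):
    """Return the JSON body for POST /reprocess as a dict.

    When method is None the field is omitted, so the documented
    server-side default (fast) applies.
    """
    if not isinstance(file_id, str) or not file_id.strip():
        raise ValueError("file_id must be a non-empty string")
    payload = {"file_id": file_id}
    if method is not None:
        if method not in ALLOWED_METHODS:
            raise ValueError(
                f"method must be one of {ALLOWED_METHODS}, got {method!r}"
            )
        payload["method"] = method
    return payload
```

Rejecting an unknown value locally avoids a round trip for a request that the server would presumably refuse anyway.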

Available processing methods

Fast
Best for: Simple text documents, quick processing
  • Fast processing with heuristic classification
  • No OCR processing
  • Suitable for plain text files and well-structured documents
  • Recommended for testing and development

Balanced
Best for: Complex documents with varied layouts
  • OCR-based text extraction
  • AI-powered document structure classification
  • Better recognition of tables, figures, and document elements
  • Enhanced accuracy for complex layouts

Accurate
Best for: Premium accuracy, specialized documents
  • OCR-based text extraction
  • Fine-tuned AI model for document classification
  • Highest accuracy for document structure recognition
  • Note: Premium feature

Agentic
Best for: Complex layouts, multi-page tables, diagrams, and images
  • Highest parsing setting for complex layouts
  • Rich annotations for images and complex elements
  • Agentic processing for enhanced understanding

Method comparison

Method | Speed | Text parsing | Element classification | Bounding boxes | Best use cases | OCR
Fast | High | Good | Good | Yes (limited) | Simple text files, testing | No
Balanced | Medium | Very good | Very good | Yes | Complex layouts, mixed content | Yes
Accurate | Medium | Excellent | Excellent | Yes | Premium accuracy needed | Yes
Agentic | Medium | Excellent | Excellent | Yes | Complex layouts, multi-page tables, diagrams | Yes

Request example

{
  "file_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "method": "balanced"
}
With default method (optional field omitted):
{
  "file_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
}
Re-processing runs in the background and can take several minutes depending on document size and the selected method. Use the returned build_id to poll Get build status until completion.

Response format

Success response (200 OK)

The endpoint returns immediately with a build identifier. It does not wait for processing to finish.
{
  "build_id": "b2c3d4e5-f6a7-8901-bcde-f12345678901",
  "success": true,
  "error": null
}

Response fields

Field | Type | Description
build_id | string | Use this ID to poll Get build status
success | boolean | Whether the re-process job was successfully scheduled
error | string or null | Error message if the job was not scheduled successfully
To get the final source metadata (file_id, file_name, status, etc.) and optional parsed elements, call GET /builds/{build_id} (see Upload sources – Get build status).
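
A 200 response can still report that the job was not scheduled, so it may be worth checking success before using build_id. The sketch below works on the already-parsed JSON body and relies only on the three fields listed above; the function name extract_build_id is hypothetical.

```python
def extract_build_id(body):
    """Return build_id from a parsed reprocess response, or raise.

    body is the dict obtained from the JSON response of POST /reprocess.
    """
    if not isinstance(body, dict):
        raise ValueError("expected a JSON object")
    if not body.get("success"):
        # "error" holds the message when scheduling did not succeed.
        raise RuntimeError(body.get("error") or "re-process job was not scheduled")
    build_id = body.get("build_id")
    if not build_id:
        raise RuntimeError("response reported success but has no build_id")
    return build_id
```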

Code examples

JavaScript/Node.js

const reprocessSource = async (apiToken, fileId, partitionMethod = "fast") => {
  const response = await fetch("https://sources.graphorlm.com/reprocess", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiToken}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ file_id: fileId, method: partitionMethod }),
  });

  if (!response.ok) {
    const err = await response.json().catch(() => ({}));
    throw new Error(err.detail || `Reprocess failed: ${response.status}`);
  }

  const { build_id } = await response.json();
  return build_id;
};

// Usage: get build_id, then poll Get build status until success
const buildId = await reprocessSource(
  "grlm_your_api_token_here",
  "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "balanced"
);
console.log("Build ID:", buildId);

Python

import requests

def reprocess_source(api_token, file_id, partition_method="fast"):
    url = "https://sources.graphorlm.com/reprocess"
    headers = {
        "Authorization": f"Bearer {api_token}",
        "Content-Type": "application/json",
    }
    payload = {"file_id": file_id, "method": partition_method}
    response = requests.post(url, headers=headers, json=payload, timeout=60)
    response.raise_for_status()
    return response.json()["build_id"]

# Usage
build_id = reprocess_source(
    "grlm_your_api_token_here",
    "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
    "balanced",
)
print("Build ID:", build_id)

cURL

curl -X POST https://sources.graphorlm.com/reprocess \
  -H "Authorization: Bearer grlm_your_api_token_here" \
  -H "Content-Type: application/json" \
  -d '{"file_id":"a1b2c3d4-e5f6-7890-abcd-ef1234567890","method":"balanced"}'

Reprocess and poll until complete

import time
import requests

def reprocess_and_wait(api_token, file_id, partition_method="balanced", poll_interval=3, max_wait=600):
    # Start reprocess
    r = requests.post(
        "https://sources.graphorlm.com/reprocess",
        headers={"Authorization": f"Bearer {api_token}", "Content-Type": "application/json"},
        json={"file_id": file_id, "method": partition_method},
        timeout=60,
    )
    r.raise_for_status()
    build_id = r.json()["build_id"]

    # Poll until complete
    url = f"https://sources.graphorlm.com/builds/{build_id}"
    headers = {"Authorization": f"Bearer {api_token}"}
    start = time.time()
    while time.time() - start < max_wait:
        # A per-request timeout keeps a stalled connection from outliving max_wait
        status_r = requests.get(url, headers=headers, timeout=30)
        status_r.raise_for_status()
        data = status_r.json()
        if data.get("success"):
            return data
        if data.get("status") == "Processing failed" or (data.get("error") and data.get("status") != "not_found"):
            raise RuntimeError(data.get("error") or data.get("message", "Reprocess failed"))
        time.sleep(poll_interval)
    raise TimeoutError("Reprocess did not complete in time")

Error responses

Common error codes

Status code | Description
404 | Source not found for the given file_id
500 | Processing or unexpected internal error

Error response format

{
  "detail": "Source node not found"
}
{
  "detail": "Failed to process source"
}

Error examples

{ "detail": "Source node not found" }
Cause: The given file_id does not exist in your project.
Solution: Verify the file_id (e.g. from List sources or a previous upload/build status).
{ "detail": "Failed to process source" }
Cause: Internal error during re-processing.
Solution: Retry later or try a different method value; check file integrity.
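
The two documented cases can be told apart by status code, with the detail field carried along for logging. The following sketch assumes the caller already has the status code and the parsed error body; the function name explain_reprocess_error and the retryable flag are illustrative assumptions, not part of the API.

```python
def explain_reprocess_error(status_code, body):
    """Map a failed reprocess response to a short diagnosis.

    Returns a dict with the status, the server's detail message, a hint
    taken from the documented solutions, and a retryable flag (an
    assumption: only the 500 case is described as worth retrying).
    """
    detail = body.get("detail") if isinstance(body, dict) else None
    if status_code == 404:
        hint = "Verify the file_id, for example via List sources."
        retryable = False
    elif status_code == 500:
        hint = "Retry later or try a different partition method."
        retryable = True
    else:
        hint = "Undocumented status; inspect the response body."
        retryable = False
    return {
        "status": status_code,
        "detail": detail or "(no detail)",
        "hint": hint,
        "retryable": retryable,
    }
```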

When to reprocess

Symptoms: Missing text, garbled characters, incomplete content
Recommended: balanced or accurate for complex layouts.
Symptoms: Tables not recognized, merged cells, structure lost
Recommended: balanced, accurate, or agentic for multi-page tables.
Symptoms: Missing captions, poor figure recognition
Recommended: balanced, accurate, or agentic for rich image annotations.
Symptoms: Headers/footers mixed with content, poor section detection
Recommended: balanced, accurate, or agentic for better structure and semantics.
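
The recommendations above can be kept as a small lookup table so that a retry script picks the next method without repeating one that already failed. This is only a sketch: the symptom keys (text, tables, figures, structure) are hypothetical labels for the four cases listed, and the ordering within each entry simply follows the order in which the methods are named.

```python
# Hypothetical symptom labels mapped to the methods recommended above.
RECOMMENDED_METHODS = {
    "text": ("balanced", "accurate"),
    "tables": ("balanced", "accurate", "agentic"),
    "figures": ("balanced", "accurate", "agentic"),
    "structure": ("balanced", "accurate", "agentic"),
}


def methods_to_try(symptom, already_tried=()):
    """Return the recommended methods for a symptom, minus those already tried."""
    try:
        candidates = RECOMMENDED_METHODS[symptom]
    except KeyError:
        known = ", ".join(sorted(RECOMMENDED_METHODS))
        raise ValueError(f"unknown symptom {symptom!r}; expected one of: {known}") from None
    return [m for m in candidates if m not in already_tried]
```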

Best practices

  • Use file_id: Always use the source’s file_id (from list sources or build status); do not rely on file name.
  • Poll build status: After calling reprocess, poll Get build status with a reasonable interval (e.g. 2–5 seconds) and timeout.
  • Choose method by need: Start with fast for testing; use balanced or accurate for better quality; use agentic for complex layouts and tables.
  • Timeout: Allow sufficient time for large documents and heavier methods when polling.
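
The interval and timeout advice can be captured in a small polling helper that is independent of the HTTP client, which makes it easy to test without network access. The sketch below assumes the build status body has a status field that becomes "Completed" on success and "Processing failed" on failure, as used elsewhere on this page; the function name poll_build and its parameters are hypothetical.

```python
import time


def poll_build(fetch_status, interval=3.0, max_wait=600.0,
               sleep=time.sleep, clock=time.monotonic):
    """Call fetch_status() until the build finishes, fails, or max_wait elapses.

    fetch_status is a zero-argument callable returning the parsed build
    status dict, for example a wrapper around GET /builds/{build_id}.
    sleep and clock are injectable so the loop can be tested offline.
    """
    deadline = clock() + max_wait
    while True:
        data = fetch_status()
        status = data.get("status")
        if status == "Completed":
            return data
        if status == "Processing failed":
            raise RuntimeError(data.get("error") or "build failed")
        # Stop before sleeping if the next attempt would pass the deadline.
        if clock() + interval > deadline:
            raise TimeoutError(f"build not finished after {max_wait} seconds")
        sleep(interval)
```

Using a monotonic clock keeps the deadline stable if the system time changes while a long build is running.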

Next steps

After re-processing completes (build status Completed):

  • Get build status: Poll status and optionally retrieve parsed elements for a build
  • List sources: View all sources and their status in your project
  • Upload sources: Upload new files, URLs, GitHub repos, or YouTube videos (async)
  • Delete source: Remove a source from your project