The Process Source endpoint allows you to reprocess previously uploaded documents using different parsing and classification methods. This enables you to optimize document processing for better text extraction, structure recognition, and retrieval performance without re-uploading the file.

Endpoint Overview

HTTP Method and URL

POST https://sources.graphorlm.com/process

Authentication

This endpoint requires authentication using an API token. You must include your API token as a Bearer token in the Authorization header.
Learn how to create and manage API tokens in the API Tokens guide.

Request Format

Headers

| Header | Value | Required |
|---|---|---|
| Authorization | Bearer YOUR_API_TOKEN | ✅ Yes |
| Content-Type | application/json | ✅ Yes |

Request Body

The request must be sent as JSON with the following fields:
| Field | Type | Description | Required |
|---|---|---|---|
| file_name | string | Name of the previously uploaded file to reprocess | ✅ Yes |
| partition_method | string | Processing method to use (see available methods below) | ✅ Yes |

Available Processing Methods

Fast (partition_method: basic)

Best for: Simple text documents, quick processing
  • Fast processing with heuristic classification
  • No OCR processing
  • Suitable for plain text files and well-structured documents
  • Recommended for testing and development

OCR (partition_method: ocr, deprecated)

Best for: Scanned documents, images with text
  • Utilizes OCR for text extraction and parsing
  • Heuristic-based document element classification
  • Ideal for scanned PDFs and image files
  • Balances processing speed and accuracy

Balanced (partition_method: hi_res)

Best for: Complex documents with varied layouts
  • OCR-based text extraction
  • AI-powered document structure classification using the Hi-Res model
  • Better recognition of tables, figures, and document elements
  • Enhanced accuracy for complex layouts

Accurate (partition_method: hi_res_ft)

Best for: Premium accuracy, specialized documents
  • OCR-based text extraction
  • Fine-tuned AI model for document classification
  • Highest accuracy for document structure recognition
  • Optimized for specialized and complex document types
  • Note: Premium feature

Agentic (partition_method: graphorlm)

Best for: Complex layouts, multi-page tables, diagrams, and images
  • Our highest parsing setting for complex layouts
  • Rich annotations for images and complex elements
  • Uses agentic processing for enhanced understanding
  • Advanced document understanding capabilities

VLM (partition_method: mai)

Best for: Text-first parsing, manuscripts, and handwritten documents
  • Our best text-first parsing with high-quality output
  • Does not output bounding boxes or page layout (no bbox)
  • Best for MANUSCRIPT and HANDWRITTEN documents
  • Performs page annotation (page-level labels and context)
  • Performs document annotation (document-level labels and summaries)
  • Performs image annotation when images are present in the document
  • Best-in-class text parsing quality; element classification is limited

partition_method values

Use these values for the partition_method field when calling the endpoint:
| Method | partition_method |
|---|---|
| Fast | basic |
| OCR (deprecated) | ocr |
| Balanced | hi_res |
| Accurate | hi_res_ft |
| Agentic | graphorlm |
| VLM | mai |
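The mapping above can also be kept in client code to reject typos before a request is sent. A minimal sketch; the constant and helper names are illustrative, not part of the API:

```python
# Map human-readable method names to the partition_method values the API expects.
PARTITION_METHODS = {
    "Fast": "basic",
    "OCR (deprecated)": "ocr",
    "Balanced": "hi_res",
    "Accurate": "hi_res_ft",
    "Agentic": "graphorlm",
    "VLM": "mai",
}

def validate_partition_method(value: str) -> str:
    """Fail fast locally instead of waiting for a 400 from the endpoint."""
    if value not in PARTITION_METHODS.values():
        valid = ", ".join(sorted(set(PARTITION_METHODS.values())))
        raise ValueError(f"Invalid partition_method {value!r}; expected one of: {valid}")
    return value
```

Calling `validate_partition_method("hi_res")` returns the value unchanged; an unknown value raises a ValueError listing the accepted options.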

Processing Method Selection Guide

Method Comparison

| Method | Speed | OCR | Text Parsing | Element Classification | Best Use Cases |
|---|---|---|---|---|---|
| Fast | Fastest | No | Basic | Heuristic | Simple text files, testing |
| Balanced | Slower | Yes | Good | AI (Hi-Res model) | Complex layouts, mixed content |
| Accurate | Slower | Yes | High | Fine-tuned AI model | Premium accuracy needed |
| VLM | Slower | Text-first | Best-in-class (no bounding boxes) | Limited | Manuscripts, handwritten documents |
| Agentic | Slower | Yes | High | Rich annotations | Complex layouts, multi-page tables, diagrams |

Request Example

{
  "file_name": "document.pdf",
  "partition_method": "hi_res"
}
Processing can take several minutes depending on document size, complexity, and the selected processing method. Advanced methods such as Balanced, Accurate, VLM, and Agentic typically require more time for analysis.

Response Format

Success Response (200 OK)

{
  "status": "success",
  "message": "Source processed successfully",
  "file_name": "document.pdf",
  "file_size": 2048576,
  "file_type": "pdf",
  "file_source": "local file",
  "project_id": "550e8400-e29b-41d4-a716-446655440000",
  "project_name": "My Project",
  "partition_method": "hi_res"
}

Response Fields

| Field | Type | Description |
|---|---|---|
| status | string | Processing result (typically "success") |
| message | string | Human-readable success message |
| file_name | string | Name of the processed file |
| file_size | integer | Size of the file in bytes |
| file_type | string | File extension/type |
| file_source | string | Source type of the original file |
| project_id | string | UUID of the project containing the file |
| project_name | string | Name of the project |
| partition_method | string | Processing method that was applied |
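For stricter handling, the success payload can be unpacked into a small typed container. This is a sketch assuming exactly the field names listed above; the class itself is illustrative, not provided by any SDK:

```python
from dataclasses import dataclass

@dataclass
class ProcessResult:
    status: str
    message: str
    file_name: str
    file_size: int          # bytes
    file_type: str
    file_source: str
    project_id: str         # UUID string
    project_name: str
    partition_method: str

def parse_process_response(payload: dict) -> ProcessResult:
    # Raises KeyError if a field you depend on is missing or renamed,
    # which surfaces API changes immediately instead of silently.
    return ProcessResult(**{k: payload[k] for k in ProcessResult.__dataclass_fields__})
```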

Code Examples

JavaScript/Node.js

const processDocument = async (apiToken, fileName, partitionMethod) => {
  const response = await fetch('https://sources.graphorlm.com/process', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${apiToken}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      file_name: fileName,
      partition_method: partitionMethod
    })
  });

  if (response.ok) {
    const result = await response.json();
    console.log('Processing successful:', result);
    return result;
  } else {
    const error = await response.text();
    throw new Error(`Processing failed: ${response.status} ${error}`);
  }
};

// Usage
processDocument('grlm_your_api_token_here', 'document.pdf', 'hi_res')
  .then(result => console.log('Document processed:', result.file_name))
  .catch(error => console.error('Error:', error));

Python

import requests

def process_document(api_token, file_name, partition_method):
    url = "https://sources.graphorlm.com/process"
    
    headers = {
        "Authorization": f"Bearer {api_token}",
        "Content-Type": "application/json"
    }
    
    payload = {
        "file_name": file_name,
        "partition_method": partition_method
    }
    
    # Increased timeout for processing complex documents
    response = requests.post(
        url, 
        headers=headers, 
        json=payload, 
        timeout=300  # 5 minutes
    )
    
    if response.status_code == 200:
        result = response.json()
        print(f"Processing successful: {result['file_name']}")
        return result
    else:
        response.raise_for_status()

# Usage
try:
    result = process_document(
        "grlm_your_api_token_here", 
        "document.pdf", 
        "hi_res"
    )
    print(f"Document processed with method: {result['partition_method']}")
except requests.exceptions.RequestException as e:
    print(f"Error processing document: {e}")

cURL

curl -X POST https://sources.graphorlm.com/process \
  -H "Authorization: Bearer grlm_your_api_token_here" \
  -H "Content-Type: application/json" \
  -d '{
    "file_name": "document.pdf",
    "partition_method": "hi_res"
  }'

PHP

<?php
function processDocument($apiToken, $fileName, $partitionMethod) {
    $url = "https://sources.graphorlm.com/process";
    
    $headers = [
        "Authorization: Bearer " . $apiToken,
        "Content-Type: application/json"
    ];
    
    $payload = json_encode([
        'file_name' => $fileName,
        'partition_method' => $partitionMethod
    ]);
    
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, $payload);
    curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 300); // 5 minutes
    
    $response = curl_exec($ch);
    $httpCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    
    if ($httpCode === 200) {
        return json_decode($response, true);
    } else {
        throw new Exception("Processing failed with HTTP code: " . $httpCode);
    }
}

// Usage
try {
    $result = processDocument(
        "grlm_your_api_token_here", 
        "document.pdf", 
        "hi_res"
    );
    echo "Document processed: " . $result['file_name'] . "\n";
    echo "Method used: " . $result['partition_method'] . "\n";
} catch (Exception $e) {
    echo "Error: " . $e->getMessage() . "\n";
}
?>

Error Responses

Common Error Codes

| Status Code | Error Type | Description |
|---|---|---|
| 400 | Bad Request | Invalid request format or missing required fields |
| 401 | Unauthorized | Invalid or missing API token |
| 403 | Forbidden | Access denied to the specified project |
| 404 | Not Found | File not found in the project |
| 500 | Internal Server Error | Processing failure or server error |
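A client can branch on these codes to decide whether a retry is worthwhile. A minimal sketch; the retryable/non-retryable split is a common convention and an assumption here, not official API guidance:

```python
def is_retryable(status_code: int) -> bool:
    """5xx failures may be transient; 4xx errors need a corrected request, not a retry."""
    return status_code >= 500

def describe_error(status_code: int) -> str:
    """Human-readable summaries matching the error table above."""
    messages = {
        400: "Bad Request: invalid request format or missing required fields",
        401: "Unauthorized: invalid or missing API token",
        403: "Forbidden: access denied to the specified project",
        404: "Not Found: file not found in the project",
        500: "Internal Server Error: processing failure or server error",
    }
    return messages.get(status_code, f"Unexpected status {status_code}")
```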

Error Response Format

{
  "detail": "Source node not found"
}

Error Examples

404 Not Found

{
  "detail": "Source node not found"
}

Cause: The specified file name doesn't exist in your project.
Solution: Verify the file name and ensure it was previously uploaded.

401 Unauthorized

{
  "detail": "Invalid authentication credentials"
}

Cause: The API token is invalid, expired, or malformed.
Solution: Check your API token and ensure it hasn't been revoked.

500 Internal Server Error

{
  "detail": "Failed to process file document.pdf"
}

Cause: Internal processing error with the specified method.
Solution: Try a different processing method or check file integrity.

400 Bad Request

{
  "detail": "Invalid partition method specified"
}

Cause: Unsupported or invalid partition method.
Solution: Use one of: basic, hi_res, hi_res_ft, graphorlm, mai.

When to Reprocess

Poor text extraction

Symptoms: Missing text, garbled characters, incomplete content
Recommended methods:
  • Balanced or Accurate for complex layouts
  • VLM for text-only documents when bounding boxes are not required

Table recognition issues

Symptoms: Tables not properly recognized, merged cells, structure lost
Recommended methods:
  • Balanced for better table detection
  • Accurate for complex table structures
  • Agentic for multi-page tables

Image and figure recognition issues

Symptoms: Missing captions, poor figure recognition
Recommended methods:
  • Balanced for figure detection
  • Accurate for comprehensive image analysis
  • Agentic for rich image annotations

Document structure issues

Symptoms: Headers/footers mixed with content, poor section detection
Recommended methods:
  • Balanced for structure recognition
  • Accurate for complex document hierarchies
  • Agentic for enhanced semantic structure and relationships
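The recommendations above can be encoded as a simple lookup so reprocessing scripts pick a sensible method automatically. The symptom keys and helper below are illustrative assumptions, not part of the API:

```python
# Hypothetical mapping from an observed problem to suggested partition_method
# values, in the order recommended in this section.
REPROCESS_SUGGESTIONS = {
    "poor_text_extraction": ["hi_res", "hi_res_ft", "mai"],
    "tables_lost": ["hi_res", "hi_res_ft", "graphorlm"],
    "figures_missed": ["hi_res", "hi_res_ft", "graphorlm"],
    "structure_mixed_up": ["hi_res", "hi_res_ft", "graphorlm"],
}

def suggest_methods(symptom: str) -> list:
    """Return partition_method values to try, best first, for a known symptom."""
    try:
        return REPROCESS_SUGGESTIONS[symptom]
    except KeyError:
        known = ", ".join(sorted(REPROCESS_SUGGESTIONS))
        raise ValueError(f"Unknown symptom {symptom!r}; expected one of: {known}") from None
```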

Best Practices

Processing Strategy

  • Start with Fast: For testing and simple documents
  • Upgrade gradually: Move to Balanced → Accurate → VLM → Agentic based on needs
  • Monitor results: Use document preview to evaluate processing quality
  • Consider efficiency vs. quality: Advanced methods take longer but provide better results

Performance Optimization

  • Batch processing: Process multiple files sequentially rather than simultaneously
  • Method selection: Choose the appropriate method for your document types
  • Timeout handling: Allow sufficient time for complex processing methods
  • Error recovery: Implement retry logic for transient failures
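The retry advice above can be applied as a thin wrapper around the process_document function shown earlier. A sketch only; the attempt count and backoff values are assumptions, not service recommendations:

```python
import time

def process_with_retry(process_fn, api_token, file_name, method,
                       max_attempts=3, base_delay=2.0):
    """Retry transient failures with exponential backoff; re-raise the last error."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return process_fn(api_token, file_name, method)
        except Exception as e:
            last_error = e
            if attempt < max_attempts:
                delay = base_delay * (2 ** (attempt - 1))  # 2s, 4s, 8s, ...
                time.sleep(delay)
    raise last_error
```

Usage: `process_with_retry(process_document, token, "document.pdf", "hi_res")`. For production use you may want to retry only on errors you know to be transient (e.g. 5xx responses) rather than on every exception.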

Quality Assessment

After processing, evaluate the results by:
  • Checking text extraction completeness
  • Verifying table and figure recognition
  • Reviewing document structure classification
  • Testing retrieval quality in your RAG pipeline

Integration Examples

Automatic Quality Improvement

def improve_processing_quality(api_token, file_name):
    """Automatically upgrade processing method for better quality."""
    methods = ['basic', 'hi_res', 'hi_res_ft', 'mai', 'graphorlm']
    
    for method in methods:
        try:
            print(f"Trying {method} method...")
            result = process_document(api_token, file_name, method)
            
            # Add your quality assessment logic here
            if assess_quality(result):
                print(f"Success with {method} method")
                return result
                
        except Exception as e:
            print(f"Failed with {method}: {e}")
            continue
    
    raise Exception("All processing methods failed")

def assess_quality(result):
    """Add your quality assessment logic here."""
    # Example: check if processing was successful
    return result.get('status') == 'success'

Batch Reprocessing

const batchReprocess = async (apiToken, files, method) => {
  const results = [];
  const failed = [];
  
  for (const fileName of files) {
    try {
      console.log(`Processing ${fileName} with ${method}...`);
      const result = await processDocument(apiToken, fileName, method);
      results.push(result);
      
      // Wait between requests to avoid rate limiting
      await new Promise(resolve => setTimeout(resolve, 1000));
      
    } catch (error) {
      console.error(`Failed to process ${fileName}:`, error);
      failed.push({ fileName, error: error.message });
    }
  }
  
  console.log(`Processed: ${results.length}, Failed: ${failed.length}`);
  return { successful: results, failed };
};

// Usage
const files = ['doc1.pdf', 'doc2.pdf', 'doc3.pdf'];
batchReprocess('grlm_your_token', files, 'hi_res')
  .then(results => console.log('Batch processing complete:', results));

Processing with Progress Tracking

import time
from typing import List, Dict

def process_with_progress(api_token: str, files_and_methods: List[Dict]):
    """Process multiple files with progress tracking."""
    total = len(files_and_methods)
    completed = 0
    results = []
    
    print(f"Starting batch processing of {total} files...")
    
    for item in files_and_methods:
        file_name = item['file_name']
        method = item['method']
        
        try:
            print(f"[{completed + 1}/{total}] Processing {file_name} with {method}...")
            start_time = time.time()
            
            result = process_document(api_token, file_name, method)
            
            duration = time.time() - start_time
            completed += 1
            
            results.append({
                'file_name': file_name,
                'method': method,
                'status': 'success',
                'duration': duration,
                'result': result
            })
            
            print(f"✅ Completed {file_name} in {duration:.1f}s")
            
        except Exception as e:
            completed += 1
            results.append({
                'file_name': file_name,
                'method': method,
                'status': 'failed',
                'error': str(e)
            })
            
            print(f"❌ Failed {file_name}: {e}")
        
        # Progress update
        progress = (completed / total) * 100
        print(f"Progress: {progress:.1f}% ({completed}/{total})")
        
        # Small delay between requests
        time.sleep(0.5)
    
    return results

# Usage
processing_queue = [
    {'file_name': 'document1.pdf', 'method': 'hi_res'},
    {'file_name': 'document2.pdf', 'method': 'hi_res_ft'},
    {'file_name': 'document3.pdf', 'method': 'ocr'}
]

results = process_with_progress('grlm_your_token', processing_queue)

Troubleshooting

Processing timeouts

Causes: Large files, complex documents, or heavy server load
Solutions:
  • Increase the request timeout (5+ minutes recommended)
  • Try a simpler processing method first
  • Process during off-peak hours
  • Contact support for very large documents

File not found

Causes: Incorrect file name, deleted file, or wrong project
Solutions:
  • Verify the exact file name (case-sensitive)
  • Use the List Sources endpoint to check available files
  • Ensure you're using the correct API token for the project

Processing failures

Causes: Corrupted files, unsupported content, or method incompatibility
Solutions:
  • Try a different processing method
  • Check file integrity
  • Re-upload the file if necessary
  • Contact support for persistent issues

Poor processing results

Causes: Method not suitable for the document type, or a complex layout
Solutions:
  • Upgrade to the Balanced or Accurate method
  • Use VLM for manuscripts and handwritten documents
  • Use Agentic for complex layouts with tables and diagrams
  • Ensure document quality is good
  • Review processing results in the dashboard

Next Steps

After successfully processing your documents: