The Process Source endpoint allows you to reprocess previously uploaded documents using different parsing and classification methods. This enables you to optimize document processing for better text extraction, structure recognition, and retrieval performance without re-uploading the file.

Endpoint Overview

HTTP Method

POST https://sources.graphorlm.com/process

Authentication

This endpoint requires authentication using an API token. You must include your API token as a Bearer token in the Authorization header.

Learn how to create and manage API tokens in the API Tokens guide.
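
For example, the required headers can be assembled like this in Python (reading the token from a GRAPHORLM_API_TOKEN environment variable is an illustrative choice, not a requirement of the API):

import os

# Illustrative: keep the token out of source code by reading it from the environment
api_token = os.environ["GRAPHORLM_API_TOKEN"]

headers = {
    "Authorization": f"Bearer {api_token}",
    "Content-Type": "application/json",
}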

Request Format

Headers

Header          Value                    Required
Authorization   Bearer YOUR_API_TOKEN    ✅ Yes
Content-Type    application/json         ✅ Yes

Request Body

The request must be sent as JSON with the following fields:

Field             Type     Description                                               Required
file_name         string   Name of the previously uploaded file to reprocess        ✅ Yes
partition_method  string   Processing method to use (see available methods below)   ✅ Yes

Available Processing Methods
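
The partition_method values used throughout this guide are basic, ocr, yolox, and advanced; the Method Comparison table further down also lists a GraphorLM custom method. See the Processing Method Selection Guide below for help choosing between them.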

Request Example

{
  "file_name": "document.pdf",
  "partition_method": "yolox"
}

Response Format

Success Response (200 OK)

{
  "status": "success",
  "message": "Source processed successfully",
  "file_name": "document.pdf",
  "file_size": 2048576,
  "file_type": "pdf",
  "file_source": "local file",
  "project_id": "550e8400-e29b-41d4-a716-446655440000",
  "project_name": "My Project",
  "partition_method": "yolox"
}

Response Fields

Field             Type     Description
status            string   Processing result (typically "success")
message           string   Human-readable success message
file_name         string   Name of the processed file
file_size         integer  Size of the file in bytes
file_type         string   File extension/type
file_source       string   Source type of the original file
project_id        string   UUID of the project containing the file
project_name      string   Name of the project
partition_method  string   Processing method that was applied
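
As a small sketch of consuming these fields (assuming result holds the parsed JSON from a successful call, as returned by the code examples below):

# Sketch: summarize a successful response using only the documented fields above
size_mb = result["file_size"] / (1024 * 1024)
print(
    f"{result['file_name']} ({result['file_type']}, {size_mb:.2f} MB) "
    f"reprocessed with '{result['partition_method']}' in project '{result['project_name']}'"
)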

Code Examples

JavaScript/Node.js

const processDocument = async (apiToken, fileName, partitionMethod) => {
  const response = await fetch('https://sources.graphorlm.com/process', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${apiToken}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      file_name: fileName,
      partition_method: partitionMethod
    })
  });

  if (response.ok) {
    const result = await response.json();
    console.log('Processing successful:', result);
    return result;
  } else {
    const error = await response.text();
    throw new Error(`Processing failed: ${response.status} ${error}`);
  }
};

// Usage
processDocument('grlm_your_api_token_here', 'document.pdf', 'yolox')
  .then(result => console.log('Document processed:', result.file_name))
  .catch(error => console.error('Error:', error));

Python

import requests

def process_document(api_token, file_name, partition_method):
    url = "https://sources.graphorlm.com/process"
    
    headers = {
        "Authorization": f"Bearer {api_token}",
        "Content-Type": "application/json"
    }
    
    payload = {
        "file_name": file_name,
        "partition_method": partition_method
    }
    
    # Increased timeout for processing complex documents
    response = requests.post(
        url, 
        headers=headers, 
        json=payload, 
        timeout=300  # 5 minutes
    )
    
    if response.status_code == 200:
        result = response.json()
        print(f"Processing successful: {result['file_name']}")
        return result
    else:
        response.raise_for_status()

# Usage
try:
    result = process_document(
        "grlm_your_api_token_here", 
        "document.pdf", 
        "yolox"
    )
    print(f"Document processed with method: {result['partition_method']}")
except requests.exceptions.RequestException as e:
    print(f"Error processing document: {e}")

cURL

curl -X POST https://sources.graphorlm.com/process \
  -H "Authorization: Bearer grlm_your_api_token_here" \
  -H "Content-Type: application/json" \
  -d '{
    "file_name": "document.pdf",
    "partition_method": "yolox"
  }'

PHP

<?php
function processDocument($apiToken, $fileName, $partitionMethod) {
    $url = "https://sources.graphorlm.com/process";
    
    $headers = [
        "Authorization: Bearer " . $apiToken,
        "Content-Type: application/json"
    ];
    
    $payload = json_encode([
        'file_name' => $fileName,
        'partition_method' => $partitionMethod
    ]);
    
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, $payload);
    curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 300); // 5 minutes
    
    $response = curl_exec($ch);
    $httpCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    
    if ($httpCode === 200) {
        return json_decode($response, true);
    } else {
        throw new Exception("Processing failed with HTTP code: " . $httpCode);
    }
}

// Usage
try {
    $result = processDocument(
        "grlm_your_api_token_here", 
        "document.pdf", 
        "yolox"
    );
    echo "Document processed: " . $result['file_name'] . "\n";
    echo "Method used: " . $result['partition_method'] . "\n";
} catch (Exception $e) {
    echo "Error: " . $e->getMessage() . "\n";
}
?>

Error Responses

Common Error Codes

Status Code  Error Type             Description
400          Bad Request            Invalid request format or missing required fields
401          Unauthorized           Invalid or missing API token
403          Forbidden              Access denied to the specified project
404          Not Found              File not found in the project
500          Internal Server Error  Processing failure or server error

Error Response Format

{
  "detail": "Source node not found"
}
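
Building on the process_document function from the Python example above, one way to surface the detail field is the following sketch (the status codes it reports are those listed in the table above):

import requests

def process_document_safe(api_token, file_name, partition_method):
    """Sketch: call process_document and report the error detail on failure."""
    try:
        return process_document(api_token, file_name, partition_method)
    except requests.exceptions.HTTPError as e:
        # Error responses carry a JSON body with a "detail" field
        try:
            detail = e.response.json().get("detail", e.response.text)
        except ValueError:
            detail = e.response.text
        print(f"Request failed ({e.response.status_code}): {detail}")
        raise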

Error Examples

Processing Method Selection Guide

Method Comparison

Method     Speed   Accuracy   Best Use Cases                   OCR   AI Classification
Basic      ⚡⚡⚡    ⭐⭐         Simple text files, testing
OCR        ⚡⚡      ⭐⭐⭐       Scanned documents, images
YOLOX              ⭐⭐⭐⭐      Complex layouts, mixed content
Advanced           ⭐⭐⭐⭐⭐    Premium accuracy needed                ✅ Premium
GraphorLM          ⭐⭐⭐⭐      Custom workflows                       ✅ Custom
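
If you want to pick a method programmatically, a rough mapping based on the Best Use Cases column might look like this (the heuristics are illustrative, not part of the API):

# Illustrative heuristic only: map document characteristics to a partition_method
def suggest_method(is_scanned: bool, complex_layout: bool, needs_top_accuracy: bool) -> str:
    if needs_top_accuracy:
        return "advanced"   # Premium accuracy needed
    if complex_layout:
        return "yolox"      # Complex layouts, mixed content
    if is_scanned:
        return "ocr"        # Scanned documents, images
    return "basic"          # Simple text files, testing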

When to Reprocess

Best Practices

Processing Strategy

  • Start with Basic: For testing and simple documents
  • Upgrade gradually: Move to OCR → YOLOX → Advanced based on needs
  • Monitor results: Use document preview to evaluate processing quality
  • Consider cost vs. quality: Advanced methods take longer but provide better results

Performance Optimization

  • Batch processing: Process multiple files sequentially rather than simultaneously
  • Method selection: Choose the appropriate method for your document types
  • Timeout handling: Allow sufficient time for complex processing methods
  • Error recovery: Implement retry logic for transient failures (a minimal sketch follows this list)
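
For the error-recovery point above, a minimal retry sketch built on the Python process_document function (it assumes only 5xx responses and network errors are worth retrying; the backoff values are illustrative):

import time
import requests

def process_with_retry(api_token, file_name, partition_method, retries=3):
    """Sketch: retry process_document on transient failures with exponential backoff."""
    for attempt in range(1, retries + 1):
        try:
            return process_document(api_token, file_name, partition_method)
        except requests.exceptions.HTTPError as e:
            # 4xx responses indicate a problem with the request itself; do not retry them
            if e.response is not None and e.response.status_code < 500:
                raise
        except requests.exceptions.RequestException:
            pass  # Timeouts and connection errors are treated as transient
        if attempt < retries:
            time.sleep(2 ** attempt)  # Illustrative backoff: 2s, 4s, 8s
    raise Exception(f"Processing {file_name} failed after {retries} attempts")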

Quality Assessment

After processing, evaluate the results by:

  • Checking text extraction completeness
  • Verifying table and figure recognition
  • Reviewing document structure classification
  • Testing retrieval quality in your RAG pipeline

Integration Examples

Automatic Quality Improvement

def improve_processing_quality(api_token, file_name):
    """Automatically upgrade processing method for better quality."""
    methods = ['basic', 'ocr', 'yolox', 'advanced']
    
    for method in methods:
        try:
            print(f"Trying {method} method...")
            result = process_document(api_token, file_name, method)
            
            # Add your quality assessment logic here
            if assess_quality(result):
                print(f"Success with {method} method")
                return result
                
        except Exception as e:
            print(f"Failed with {method}: {e}")
            continue
    
    raise Exception("All processing methods failed")

def assess_quality(result):
    """Add your quality assessment logic here."""
    # Example: check if processing was successful
    return result.get('status') == 'success'
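
A usage sketch for the helper above (the token and file name are placeholders):

# Usage
try:
    result = improve_processing_quality("grlm_your_api_token_here", "document.pdf")
    print(f"Best method: {result['partition_method']}")
except Exception as e:
    print(f"Could not process document: {e}")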

Batch Reprocessing

const batchReprocess = async (apiToken, files, method) => {
  const results = [];
  const failed = [];
  
  for (const fileName of files) {
    try {
      console.log(`Processing ${fileName} with ${method}...`);
      const result = await processDocument(apiToken, fileName, method);
      results.push(result);
      
      // Wait between requests to avoid rate limiting
      await new Promise(resolve => setTimeout(resolve, 1000));
      
    } catch (error) {
      console.error(`Failed to process ${fileName}:`, error);
      failed.push({ fileName, error: error.message });
    }
  }
  
  console.log(`Processed: ${results.length}, Failed: ${failed.length}`);
  return { successful: results, failed };
};

// Usage
const files = ['doc1.pdf', 'doc2.pdf', 'doc3.pdf'];
batchReprocess('grlm_your_token', files, 'yolox')
  .then(results => console.log('Batch processing complete:', results));

Processing with Progress Tracking

import time
from typing import List, Dict

def process_with_progress(api_token: str, files_and_methods: List[Dict]):
    """Process multiple files with progress tracking."""
    total = len(files_and_methods)
    completed = 0
    results = []
    
    print(f"Starting batch processing of {total} files...")
    
    for item in files_and_methods:
        file_name = item['file_name']
        method = item['method']
        
        try:
            print(f"[{completed + 1}/{total}] Processing {file_name} with {method}...")
            start_time = time.time()
            
            result = process_document(api_token, file_name, method)
            
            duration = time.time() - start_time
            completed += 1
            
            results.append({
                'file_name': file_name,
                'method': method,
                'status': 'success',
                'duration': duration,
                'result': result
            })
            
            print(f"✅ Completed {file_name} in {duration:.1f}s")
            
        except Exception as e:
            completed += 1
            results.append({
                'file_name': file_name,
                'method': method,
                'status': 'failed',
                'error': str(e)
            })
            
            print(f"❌ Failed {file_name}: {e}")
        
        # Progress update
        progress = (completed / total) * 100
        print(f"Progress: {progress:.1f}% ({completed}/{total})")
        
        # Small delay between requests
        time.sleep(0.5)
    
    return results

# Usage
processing_queue = [
    {'file_name': 'document1.pdf', 'method': 'yolox'},
    {'file_name': 'document2.pdf', 'method': 'advanced'},
    {'file_name': 'document3.pdf', 'method': 'ocr'}
]

results = process_with_progress('grlm_your_token', processing_queue)

Troubleshooting

Next Steps

After successfully processing your documents: