Process Source

The Process Source endpoint allows you to reprocess previously uploaded documents using different parsing and classification methods. This enables you to optimize document processing for better text extraction, structure recognition, and retrieval performance without re-uploading the file.

Endpoint Overview

HTTP Method

POST

Endpoint URL

https://sources.graphorlm.com/process

Authentication

This endpoint requires authentication using an API token. You must include your API token as a Bearer token in the Authorization header.

Learn how to create and manage API tokens in the API Tokens guide.

Request Format

Headers

Header	Value	Required
`Authorization`	`Bearer YOUR_API_TOKEN`	✅ Yes
`Content-Type`	`application/json`	✅ Yes

Request Body

The request must be sent as JSON with the following fields:

Field	Type	Description	Required
`file_name`	string	Name of the previously uploaded file to reprocess	✅ Yes
`partition_method`	string	Processing method to use (see available methods below)	✅ Yes

Available Processing Methods

Basic

OCR

YOLOX

Advanced

GraphorLM
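
A client can reject an unsupported method before sending the request. The following is a minimal sketch, assuming the API accepts the lowercase identifiers used in this page's examples (`basic`, `ocr`, `yolox`, `advanced`). The identifier for the GraphorLM method does not appear on this page, so it is left out rather than guessed. `KNOWN_METHODS` and `build_process_payload` are hypothetical names, not part of the API.

```python
# Lowercase identifiers as they appear in this page's examples.
# Assumption: the GraphorLM method's identifier is not shown here, so it is omitted.
KNOWN_METHODS = ("basic", "ocr", "yolox", "advanced")


def build_process_payload(file_name: str, partition_method: str) -> dict:
    """Return the JSON body for a process request, validating inputs locally."""
    if not file_name:
        raise ValueError("file_name is required")
    method = partition_method.strip().lower()
    if method not in KNOWN_METHODS:
        raise ValueError(
            f"Unknown partition_method {partition_method!r}; "
            f"expected one of {KNOWN_METHODS}"
        )
    return {"file_name": file_name, "partition_method": method}
```

Validating locally avoids a round trip that would likely end in a `400` response.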

Request Example

{
  "file_name": "document.pdf",
  "partition_method": "yolox"
}

Response Format

Success Response (200 OK)

{
  "status": "success",
  "message": "Source processed successfully",
  "file_name": "document.pdf",
  "file_size": 2048576,
  "file_type": "pdf",
  "file_source": "local file",
  "project_id": "550e8400-e29b-41d4-a716-446655440000",
  "project_name": "My Project",
  "partition_method": "yolox"
}

Response Fields

Field	Type	Description
`status`	string	Processing result (typically “success”)
`message`	string	Human-readable success message
`file_name`	string	Name of the processed file
`file_size`	integer	Size of the file in bytes
`file_type`	string	File extension/type
`file_source`	string	Source type of the original file
`project_id`	string	UUID of the project containing the file
`project_name`	string	Name of the project
`partition_method`	string	Processing method that was applied
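
For callers that prefer typed access over raw dictionaries, the fields above can be mapped onto a small data class. This is only a sketch: `ProcessResult` and `parse_process_result` are hypothetical names, and it assumes every field in the table is present in a success response.

```python
import json
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ProcessResult:
    """Typed view of the success response fields listed in the table above."""
    status: str
    message: str
    file_name: str
    file_size: int
    file_type: str
    file_source: str
    project_id: str
    project_name: str
    partition_method: str


def parse_process_result(body: str) -> ProcessResult:
    """Parse a JSON response body; raises KeyError if a listed field is missing."""
    data = json.loads(body)
    # Keep only the listed fields so unexpected extra keys do not break construction.
    return ProcessResult(**{f.name: data[f.name] for f in fields(ProcessResult)})
```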

Code Examples

JavaScript/Node.js

const processDocument = async (apiToken, fileName, partitionMethod) => {
  const response = await fetch('https://sources.graphorlm.com/process', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${apiToken}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      file_name: fileName,
      partition_method: partitionMethod
    })
  });

  if (response.ok) {
    const result = await response.json();
    console.log('Processing successful:', result);
    return result;
  } else {
    const error = await response.text();
    throw new Error(`Processing failed: ${response.status} ${error}`);
  }
};

// Usage
processDocument('grlm_your_api_token_here', 'document.pdf', 'yolox')
  .then(result => console.log('Document processed:', result.file_name))
  .catch(error => console.error('Error:', error));

Python

import requests

def process_document(api_token, file_name, partition_method):
    url = "https://sources.graphorlm.com/process"
    
    headers = {
        "Authorization": f"Bearer {api_token}",
        "Content-Type": "application/json"
    }
    
    payload = {
        "file_name": file_name,
        "partition_method": partition_method
    }
    
    # Increased timeout for processing complex documents
    response = requests.post(
        url, 
        headers=headers, 
        json=payload, 
        timeout=300  # 5 minutes
    )
    
    # Raise requests.exceptions.HTTPError for any 4xx/5xx response,
    # so the function never silently returns None on failure
    response.raise_for_status()

    result = response.json()
    print(f"Processing successful: {result['file_name']}")
    return result

# Usage
try:
    result = process_document(
        "grlm_your_api_token_here", 
        "document.pdf", 
        "yolox"
    )
    print(f"Document processed with method: {result['partition_method']}")
except requests.exceptions.RequestException as e:
    print(f"Error processing document: {e}")

cURL

curl -X POST https://sources.graphorlm.com/process \
  -H "Authorization: Bearer grlm_your_api_token_here" \
  -H "Content-Type: application/json" \
  -d '{
    "file_name": "document.pdf",
    "partition_method": "yolox"
  }'

PHP

<?php
function processDocument($apiToken, $fileName, $partitionMethod) {
    $url = "https://sources.graphorlm.com/process";
    
    $headers = [
        "Authorization: Bearer " . $apiToken,
        "Content-Type: application/json"
    ];
    
    $payload = json_encode([
        'file_name' => $fileName,
        'partition_method' => $partitionMethod
    ]);
    
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, $payload);
    curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 300); // 5 minutes
    
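    $response = curl_exec($ch);
    if ($response === false) {
        // Transport-level failure (DNS, timeout, TLS): no HTTP status is available
        $error = curl_error($ch);
        curl_close($ch);
        throw new Exception("Request failed: " . $error);
    }
    $httpCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    
    if ($httpCode === 200) {
        return json_decode($response, true);
    } else {
        // Include the response body, which carries the API's "detail" message
        throw new Exception("Processing failed with HTTP code: " . $httpCode . " " . $response);
    }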
}

// Usage
try {
    $result = processDocument(
        "grlm_your_api_token_here", 
        "document.pdf", 
        "yolox"
    );
    echo "Document processed: " . $result['file_name'] . "\n";
    echo "Method used: " . $result['partition_method'] . "\n";
} catch (Exception $e) {
    echo "Error: " . $e->getMessage() . "\n";
}
?>

Error Responses

Common Error Codes

Status Code	Error Type	Description
`400`	Bad Request	Invalid request format or missing required fields
`401`	Unauthorized	Invalid or missing API token
`403`	Forbidden	Access denied to the specified project
`404`	Not Found	File not found in the project
`500`	Internal Server Error	Processing failure or server error

Error Response Format

{
  "detail": "Source node not found"
}
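
To surface the `detail` message in logs or exceptions, a client can pull it out of the error body. The sketch below assumes error bodies follow the format shown above and falls back to the raw text when they do not (for example, when a proxy returns HTML). `extract_error_detail` is a hypothetical helper name.

```python
import json


def extract_error_detail(body: str) -> str:
    """Return the "detail" message from an error body, or the raw text as a fallback."""
    try:
        data = json.loads(body)
    except ValueError:
        # Not JSON at all; json.JSONDecodeError is a subclass of ValueError.
        return body
    if isinstance(data, dict) and "detail" in data:
        return str(data["detail"])
    return body
```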

Error Examples

File Not Found (404)

Invalid API Token (401)

Processing Failed (500)

Invalid Method (400)

Processing Method Selection Guide

Method Comparison

Method	Speed	Accuracy	Best Use Cases	OCR	AI Classification
Basic	⚡⚡⚡	⭐⭐	Simple text files, testing	❌	❌
OCR	⚡⚡	⭐⭐⭐	Scanned documents, images	✅	❌
YOLOX	⚡	⭐⭐⭐⭐	Complex layouts, mixed content	✅	✅
Advanced	⚡	⭐⭐⭐⭐⭐	Premium accuracy needed	✅	✅ Premium
GraphorLM	⚡	⭐⭐⭐⭐	Custom workflows	✅	✅ Custom
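
One way to read the comparison table is as a simple decision rule. The sketch below is an illustration only: `suggest_method` and its flags are hypothetical, the lowercase return values assume the identifiers used in this page's examples, and the GraphorLM method is omitted because its identifier is not shown here.

```python
def suggest_method(scanned: bool = False,
                   complex_layout: bool = False,
                   needs_top_accuracy: bool = False) -> str:
    """Pick a starting partition method from coarse document traits."""
    if needs_top_accuracy:
        return "advanced"  # highest accuracy rating in the table, slowest tier
    if complex_layout:
        return "yolox"     # complex layouts, mixed content
    if scanned:
        return "ocr"       # scanned documents, images
    return "basic"         # simple text files, testing
```

The checks run from the most demanding requirement to the least, so a scanned document with a complex layout is routed to `yolox` rather than `ocr`.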

When to Reprocess

Poor text extraction

Table detection issues

Image and figure handling

Document structure problems

Best Practices

Processing Strategy

Start with Basic: For testing and simple documents
Upgrade gradually: Move to OCR → YOLOX → Advanced based on needs
Monitor results: Use document preview to evaluate processing quality
Consider cost vs. quality: Advanced methods take longer but provide better results

Performance Optimization

Batch processing: Process multiple files sequentially rather than simultaneously
Method selection: Choose the appropriate method for your document types
Timeout handling: Allow sufficient time for complex processing methods
Error recovery: Implement retry logic for transient failures
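
The retry advice above can be sketched as a small wrapper with exponential backoff. This is a minimal example under stated assumptions: `with_retries` is a hypothetical helper, and which failures count as transient is left to the caller because this page does not specify it.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(call: Callable[[], T],
                 is_transient: Callable[[Exception], bool],
                 max_attempts: int = 3,
                 base_delay: float = 2.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Run call(), retrying transient failures with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except Exception as exc:
            # Give up on the last attempt or on errors a retry cannot fix.
            if attempt == max_attempts or not is_transient(exc):
                raise
            # Delays grow as base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
```

With the `process_document` function from the Python example, a caller might pass `lambda: process_document(token, "document.pdf", "yolox")` as `call` and treat `requests.exceptions.Timeout`, `requests.exceptions.ConnectionError`, and a `500` status as transient. Errors such as `401` or `404` are unlikely to succeed on retry and are better raised immediately.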

Quality Assessment

After processing, evaluate the results by:

Checking text extraction completeness
Verifying table and figure recognition
Reviewing document structure classification
Testing retrieval quality in your RAG pipeline

Integration Examples

Automatic Quality Improvement

def improve_processing_quality(api_token, file_name):
    """Automatically upgrade processing method for better quality."""
    methods = ['basic', 'ocr', 'yolox', 'advanced']
    
    for method in methods:
        try:
            print(f"Trying {method} method...")
            result = process_document(api_token, file_name, method)
            
            # Add your quality assessment logic here
            if assess_quality(result):
                print(f"Success with {method} method")
                return result
                
        except Exception as e:
            print(f"Failed with {method}: {e}")
            continue
    
    raise Exception("All processing methods failed")

def assess_quality(result):
    """Add your quality assessment logic here."""
    # Example: check if processing was successful
    return result.get('status') == 'success'

Batch Reprocessing

const batchReprocess = async (apiToken, files, method) => {
  const results = [];
  const failed = [];
  
  for (const fileName of files) {
    try {
      console.log(`Processing ${fileName} with ${method}...`);
      const result = await processDocument(apiToken, fileName, method);
      results.push(result);
    } catch (error) {
      console.error(`Failed to process ${fileName}:`, error);
      failed.push({ fileName, error: error.message });
    }

    // Wait between requests (after failures too) to avoid rate limiting
    await new Promise(resolve => setTimeout(resolve, 1000));
  }
  
  console.log(`Processed: ${results.length}, Failed: ${failed.length}`);
  return { successful: results, failed };
};

// Usage
const files = ['doc1.pdf', 'doc2.pdf', 'doc3.pdf'];
batchReprocess('grlm_your_token', files, 'yolox')
  .then(results => console.log('Batch processing complete:', results));

Processing with Progress Tracking

import time
from typing import List, Dict

def process_with_progress(api_token: str, files_and_methods: List[Dict]):
    """Process multiple files with progress tracking."""
    total = len(files_and_methods)
    completed = 0
    results = []
    
    print(f"Starting batch processing of {total} files...")
    
    for item in files_and_methods:
        file_name = item['file_name']
        method = item['method']
        
        try:
            print(f"[{completed + 1}/{total}] Processing {file_name} with {method}...")
            start_time = time.time()
            
            result = process_document(api_token, file_name, method)
            
            duration = time.time() - start_time
            completed += 1
            
            results.append({
                'file_name': file_name,
                'method': method,
                'status': 'success',
                'duration': duration,
                'result': result
            })
            
            print(f"✅ Completed {file_name} in {duration:.1f}s")
            
        except Exception as e:
            completed += 1
            results.append({
                'file_name': file_name,
                'method': method,
                'status': 'failed',
                'error': str(e)
            })
            
            print(f"❌ Failed {file_name}: {e}")
        
        # Progress update
        progress = (completed / total) * 100
        print(f"Progress: {progress:.1f}% ({completed}/{total})")
        
        # Small delay between requests
        time.sleep(0.5)
    
    return results

# Usage
processing_queue = [
    {'file_name': 'document1.pdf', 'method': 'yolox'},
    {'file_name': 'document2.pdf', 'method': 'advanced'},
    {'file_name': 'document3.pdf', 'method': 'ocr'}
]

results = process_with_progress('grlm_your_token', processing_queue)

Troubleshooting

Processing timeouts

File not found errors

Processing failures

Poor processing quality

Next Steps

After successfully processing your documents:

List Sources

View all your processed documents and their current status

Data Ingestion Guide

Learn more about document processing methods and optimization

Chunking

Optimize document segmentation after processing for better retrieval

Upload Source

Upload new documents to your project
