This page documents Graphor's source ingestion endpoints. Use them to upload content to your project via the REST API, whether that content is a local file, a public web page, a public GitHub repository, or a public YouTube video.

Endpoints

Authentication

All endpoints on this page require authentication using an API token. You must include your API token as a Bearer token in the Authorization header.
Learn how to create and manage API tokens in the API Tokens guide.
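
Every endpoint on this page accepts the token the same way. A minimal Python helper for building the header (the token value is a placeholder; real tokens start with "grlm_"):

```python
def auth_headers(api_token: str) -> dict:
    """Build the Authorization header required by every endpoint on this page."""
    return {"Authorization": f"Bearer {api_token}"}

# Placeholder token for illustration only.
headers = auth_headers("grlm_your_api_token_here")
```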

Upload a File

Endpoint Overview

HTTP Method: POST
Endpoint URL: https://sources.graphorlm.com/upload
Authentication: Uses the same API token authentication described in the Authentication section above.

Request Format

Headers

| Header | Value | Required |
| --- | --- | --- |
| Authorization | Bearer YOUR_API_TOKEN | ✅ Yes |
| Content-Type | multipart/form-data | ✅ Yes |

Request Body

The request must be sent as multipart/form-data with the following fields:
| Field | Type | Description | Required |
| --- | --- | --- | --- |
| file | File | The document file to upload | ✅ Yes |
| partition_method | string | Processing method to use (see Partition Methods below) | No |

Partition Methods

The optional partition_method parameter specifies how the document is parsed during upload. If omitted, the system uses the default method for the file type.
| Value | Name | Description |
| --- | --- | --- |
| basic | Fast | Fast processing with heuristic classification. No OCR. |
| hi_res | Balanced | OCR-based extraction with AI-powered structure classification. |
| hi_res_ft | Accurate | Fine-tuned AI model for highest accuracy (Premium). |
| mai | VLM | Best text-first parsing for manuscripts and handwritten documents. |
| graphorlm | Agentic | Highest parsing setting for complex layouts, multi-page tables, and diagrams. |
For more details about processing methods, see the Process Source documentation.
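
The optional field is sent as ordinary multipart form data alongside the file. A sketch in Python; the upload_with_method helper and its client-side validation are illustrative, not part of the API:

```python
import requests

# The five method values documented in the Partition Methods table above.
ALLOWED_METHODS = {"basic", "hi_res", "hi_res_ft", "mai", "graphorlm"}

def upload_with_method(api_token, file_path, partition_method):
    """Upload a file and request a specific partition method in a single call."""
    if partition_method not in ALLOWED_METHODS:
        raise ValueError(f"Unknown partition method: {partition_method}")
    with open(file_path, "rb") as fh:
        response = requests.post(
            "https://sources.graphorlm.com/upload",
            headers={"Authorization": f"Bearer {api_token}"},
            files={"file": (file_path, fh)},
            data={"partition_method": partition_method},  # optional form field
            timeout=300,
        )
    response.raise_for_status()
    return response.json()
```

The equivalent cURL form field is `-F "partition_method=hi_res"`, as shown in the cURL examples below.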

File Requirements

Graphor supports a wide range of document formats:
  • Documents: PDF, DOC, DOCX, TXT, TEXT, MD, HTML, HTM
  • Presentations: PPT, PPTX
  • Spreadsheets: CSV, TSV, XLS, XLSX
  • Images: PNG, JPG, JPEG, TIFF, BMP, HEIC
  • Audio: MP3, WAV, M4A, OGG, FLAC
  • Video: MP4, MOV, AVI, MKV, WEBM
Maximum file size: 100MB per file. For larger files, consider:
  • Compressing the file if possible
  • Splitting large documents into smaller sections
  • Using file optimization tools before upload

Filename requirements:
  • File must have a valid filename with extension
  • Extension determines the processing method
  • File names should be descriptive for easy identification
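
These requirements can be checked client-side before making a request. A minimal sketch; the extension set mirrors the formats listed above:

```python
from pathlib import Path

MAX_BYTES = 100 * 1024 * 1024  # 100MB limit enforced by the API
ALLOWED_EXTENSIONS = {
    ".pdf", ".doc", ".docx", ".txt", ".text", ".md", ".html", ".htm",
    ".ppt", ".pptx", ".csv", ".tsv", ".xls", ".xlsx",
    ".png", ".jpg", ".jpeg", ".tiff", ".bmp", ".heic",
    ".mp3", ".wav", ".m4a", ".ogg", ".flac",
    ".mp4", ".mov", ".avi", ".mkv", ".webm",
}

def validate_file(path: str) -> None:
    """Raise ValueError if the file would be rejected by the upload endpoint."""
    p = Path(path)
    if p.suffix.lower() not in ALLOWED_EXTENSIONS:
        raise ValueError(f"Unsupported file type: {p.suffix}")
    if p.stat().st_size > MAX_BYTES:
        raise ValueError("File exceeds the 100MB limit")
```

Validating locally avoids a round trip that would end in a 400 or 413 error.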

Response Format

Success Response (200 OK)

{
  "status": "New",
  "message": "File document.pdf processed successfully",
  "file_id": "file_abc123",
  "file_name": "document.pdf",
  "file_size": 2048576,
  "file_type": "pdf",
  "file_source": "local file",
  "project_id": "550e8400-e29b-41d4-a716-446655440000",
  "project_name": "My Project",
  "partition_method": "basic"
}

Response Fields

| Field | Type | Description |
| --- | --- | --- |
| status | string | Processing status (New, Processing, Completed, Failed) |
| message | string | Human-readable success message |
| file_id | string | Unique identifier for the source (use this for subsequent API calls) |
| file_name | string | Name of the uploaded file |
| file_size | integer | Size of the file in bytes |
| file_type | string | File extension/type |
| file_source | string | Source type (always "local file" for uploads) |
| project_id | string | UUID of the target project |
| project_name | string | Name of the target project |
| partition_method | string | Document processing method used |

Code Examples

JavaScript/Node.js

import fs from 'fs';
import FormData from 'form-data';
import fetch from 'node-fetch';

const uploadDocument = async (apiToken, filePath) => {
  const formData = new FormData();
  formData.append('file', fs.createReadStream(filePath));

  const response = await fetch('https://sources.graphorlm.com/upload', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${apiToken}`
    },
    body: formData
  });

  if (response.ok) {
    const result = await response.json();
    console.log('Upload successful:', result);
    return result;
  } else {
    throw new Error(`Upload failed: ${response.status} ${response.statusText}`);
  }
};

// Usage
uploadDocument('grlm_your_api_token_here', './document.pdf')
  .then(result => console.log('Document uploaded:', result.file_name))
  .catch(error => console.error('Error:', error));

Python

import os
import requests

def upload_document(api_token, file_path):
    url = "https://sources.graphorlm.com/upload"

    headers = {
        "Authorization": f"Bearer {api_token}"
    }

    with open(file_path, "rb") as file:
        files = {"file": (os.path.basename(file_path), file)}
        
        response = requests.post(url, headers=headers, files=files, timeout=300)
        
        if response.status_code == 200:
            result = response.json()
            print(f"Upload successful: {result['file_name']}")
            return result
        else:
            response.raise_for_status()

# Usage
try:
    result = upload_document("grlm_your_api_token_here", "document.pdf")
    print(f"Document uploaded: {result['file_name']}")
except requests.exceptions.RequestException as e:
    print(f"Error uploading document: {e}")

cURL

curl -X POST https://sources.graphorlm.com/upload \
  -H "Authorization: Bearer grlm_your_api_token_here" \
  -F "[email protected]"

cURL with Partition Method

curl -X POST https://sources.graphorlm.com/upload \
  -H "Authorization: Bearer grlm_your_api_token_here" \
  -F "[email protected]" \
  -F "partition_method=hi_res"

Error Responses

Common Error Codes

| Status Code | Error Type | Description |
| --- | --- | --- |
| 400 | Bad Request | Invalid file type, missing filename, or malformed request |
| 401 | Unauthorized | Invalid or missing API token |
| 403 | Forbidden | Access denied to the specified project |
| 404 | Not Found | Project not found |
| 413 | Payload Too Large | File exceeds 100MB limit |
| 500 | Internal Server Error | Server-side processing error |

Error Response Format

{
  "detail": "File type '.xyz' is not supported. Allowed types: .pdf, .md, .png, .jpg, ..."
}

Error Examples

{
  "detail": "File type '.xyz' is not supported. Allowed types: .pdf, .md, .png, .jpg, .jpeg, .tiff, .bmp, .heic, .csv, .tsv, .xls, .xlsx, .ppt, .pptx, .doc, .docx, .html, .htm, .txt, .text, .mp3, .wav, .m4a, .ogg, .flac, .mp4, .mov, .avi, .mkv, .webm"
}
{
  "detail": "Invalid authentication credentials"
}
{
  "detail": "File size exceeds the maximum allowed limit of 100MB"
}

Document Processing

After a successful upload, Graphor automatically processes your document:

Processing Stages

  1. Upload Complete - File is securely stored in your project
  2. Text Extraction - Content is extracted using advanced OCR and parsing
  3. Structure Recognition - Document elements are identified and classified
  4. Ready for Use - Document is available for chunking and retrieval

Processing Methods

The system automatically selects the optimal processing method based on file type:
| File Type | Default Method | Description |
| --- | --- | --- |
| PDF, Documents | Basic | Fast processing with heuristic classification |
| Images | OCR | Optical character recognition for text extraction |
| Text files | Basic | Direct text processing |
| Spreadsheets | Basic | Table structure preservation |
You can reprocess documents with different methods using the Process Source endpoint after upload.

Best Practices

File Preparation

  • Optimize file size: Compress large files when possible while maintaining quality
  • Use descriptive names: Include relevant keywords in filenames for easy identification
  • Check file integrity: Ensure files are not corrupted before upload

Error Handling

  • Implement retry logic: Handle temporary network issues with exponential backoff
  • Validate before upload: Check file types and sizes client-side before making requests
  • Monitor upload status: Use the response to track processing progress
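
The retry advice above can be sketched as a small generic wrapper. with_retries is an illustrative helper, not part of any Graphor SDK; pass whatever exception types your HTTP client raises for transient failures (e.g. requests.exceptions.ConnectionError):

```python
import time

def backoff_delays(max_attempts: int) -> list:
    """Exponential backoff schedule in seconds: 1, 2, 4, 8, ..."""
    return [2 ** attempt for attempt in range(max_attempts)]

def with_retries(operation, max_attempts=4, base_delay=1.0, retryable=(ConnectionError,)):
    """Run `operation` (e.g. an upload call), retrying transient errors with backoff."""
    last_error = None
    for attempt, delay in enumerate(backoff_delays(max_attempts)):
        try:
            return operation()
        except retryable as exc:
            last_error = exc
            if attempt < max_attempts - 1:
                time.sleep(base_delay * delay)  # wait before the next attempt
    raise RuntimeError(f"Operation failed after {max_attempts} attempts") from last_error
```

Note that 4xx responses (bad file type, invalid token) are not transient and should not be retried.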

Security

  • Protect API tokens: Never expose tokens in client-side code or public repositories
  • Use HTTPS only: All API requests are automatically secured with TLS encryption
  • Rotate tokens regularly: Update API tokens periodically for enhanced security

Integration Examples

Batch Upload Script

import os
import requests
from pathlib import Path

def batch_upload_documents(api_token, directory_path):
    """Upload all supported documents from a directory."""
    supported_extensions = {'.pdf', '.doc', '.docx', '.txt', '.md', '.html'}
    uploaded_files = []
    failed_files = []
    
    for file_path in Path(directory_path).iterdir():
        if file_path.suffix.lower() in supported_extensions:
            try:
                result = upload_document(api_token, str(file_path))
                uploaded_files.append(result['file_name'])
                print(f"✅ Uploaded: {file_path.name}")
            except Exception as e:
                failed_files.append((file_path.name, str(e)))
                print(f"❌ Failed: {file_path.name} - {e}")
    
    print(f"\nSummary: {len(uploaded_files)} uploaded, {len(failed_files)} failed")
    return uploaded_files, failed_files

# Usage
uploaded, failed = batch_upload_documents("grlm_your_token", "./documents/")

Upload with Progress Tracking

const uploadWithProgress = async (apiToken, file, onProgress) => {
  return new Promise((resolve, reject) => {
    const xhr = new XMLHttpRequest();
    const formData = new FormData();
    formData.append('file', file);

    xhr.upload.addEventListener('progress', (event) => {
      if (event.lengthComputable) {
        const percentComplete = (event.loaded / event.total) * 100;
        onProgress(percentComplete);
      }
    });

    xhr.addEventListener('load', () => {
      if (xhr.status === 200) {
        resolve(JSON.parse(xhr.responseText));
      } else {
        reject(new Error(`Upload failed: ${xhr.status}`));
      }
    });

    xhr.addEventListener('error', () => reject(new Error('Upload failed')));

    xhr.open('POST', 'https://sources.graphorlm.com/upload');
    xhr.setRequestHeader('Authorization', `Bearer ${apiToken}`);
    xhr.send(formData);
  });
};

Troubleshooting

Upload Timeouts
Causes: Large files, slow connection, or server load
Solutions:
  • Increase request timeout (recommend 5+ minutes for large files)
  • Retry failed uploads with exponential backoff
  • Consider compressing large files before upload

Processing Failures
Causes: Corrupted files, unsupported formats, or complex layouts
Solutions:
  • Verify file integrity before upload
  • Try converting to a more standard format
  • Use the Process Source endpoint with different methods

Authentication Errors
Causes: Invalid tokens, expired tokens, or incorrect headers
Solutions:
  • Verify token format starts with "grlm_"
  • Check token hasn't been revoked in the dashboard
  • Ensure correct Authorization header format

Connection Issues
Causes: DNS issues, firewall restrictions, or network timeouts
Solutions:
  • Test connectivity to sources.graphorlm.com
  • Check firewall allows outbound HTTPS traffic
  • Use appropriate timeout values for your network

Upload from URL

Use this endpoint to ingest content by scraping a public web page. It fetches the page, extracts text, and creates a source in your project for downstream processing.

Endpoint Overview

HTTP Method: POST
Endpoint URL: https://sources.graphorlm.com/upload-url-source
Authentication: Uses the same API token authentication described in the Authentication section above.

Request Format

Headers

| Header | Value | Required |
| --- | --- | --- |
| Authorization | Bearer YOUR_API_TOKEN | ✅ Yes |
| Content-Type | application/json | ✅ Yes |

Request Body

Send a JSON body with the following fields:
| Field | Type | Description | Required |
| --- | --- | --- | --- |
| url | string | The URL of the web page to scrape | ✅ Yes |
| crawlUrls | boolean | Whether to crawl and ingest links from the given URL | No (default: false) |
| partition_method | string | Processing method to use (see Partition Methods above) | No |

URL Requirements

  • Public web pages, reachable over HTTPS
  • Pages that render primary content server-side, without requiring interaction
  • Authentication-protected pages are not supported by this endpoint
This endpoint scrapes web pages. To ingest files (PDF, DOCX, etc.), use Upload a File.
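
Public reachability can only be confirmed by the server, but the HTTPS constraint can be checked client-side before submitting. A minimal sketch using the standard library (validate_source_url is an illustrative helper, not the server's validation):

```python
from urllib.parse import urlparse

def validate_source_url(url: str) -> None:
    """Basic client-side checks before calling the upload-url-source endpoint."""
    parsed = urlparse(url)
    if parsed.scheme != "https":
        raise ValueError("URL must be reachable over HTTPS")
    if not parsed.netloc:
        raise ValueError("URL must include a hostname")
```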

Response Format

Success Response (200 OK)

{
  "status": "Processing",
  "message": "Source processed successfully",
  "file_id": "file_abc123",
  "file_name": "https://example.com/",
  "file_size": 0,
  "file_type": "",
  "file_source": "url",
  "project_id": "550e8400-e29b-41d4-a716-446655440000",
  "project_name": "My Project",
  "partition_method": "basic"
}

Response Fields

| Field | Type | Description |
| --- | --- | --- |
| status | string | Processing status (New, Processing, Completed, Failed, etc.) |
| message | string | Human-readable status message |
| file_id | string | Unique identifier for the source (use this for subsequent API calls) |
| file_name | string | Name or URL for the ingested source |
| file_size | integer | Size in bytes (0 for URL-based initial record) |
| file_type | string | Detected type when applicable |
| file_source | string | Source type (url) |
| project_id | string | UUID of the project |
| project_name | string | Name of the project |
| partition_method | string | Document processing method used |

Code Examples

JavaScript/Node.js

import fetch from 'node-fetch';

const uploadUrlSource = async (apiToken, url, crawlUrls = false) => {
  const response = await fetch('https://sources.graphorlm.com/upload-url-source', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${apiToken}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({ url, crawlUrls })
  });

  if (!response.ok) {
    throw new Error(`Upload from URL failed: ${response.status} ${response.statusText}`);
  }

  const result = await response.json();
  console.log('URL upload accepted:', result);
  return result;
};

// Usage (scrapes the page content)
uploadUrlSource('grlm_your_api_token_here', 'https://example.com/');

Python

import requests

def upload_url_source(api_token, url, crawl_urls=False):
    endpoint = "https://sources.graphorlm.com/upload-url-source"
    headers = {"Authorization": f"Bearer {api_token}", "Content-Type": "application/json"}
    payload = {"url": url, "crawlUrls": crawl_urls}

    response = requests.post(endpoint, headers=headers, json=payload, timeout=300)
    response.raise_for_status()
    return response.json()

# Usage (scrapes the page content)
result = upload_url_source("grlm_your_api_token_here", "https://example.com")
print("URL scraping accepted:", result["file_name"])  # typically echoes the URL

cURL

curl -X POST https://sources.graphorlm.com/upload-url-source \
  -H "Authorization: Bearer grlm_your_api_token_here" \
  -H "Content-Type: application/json" \
  -d '{"url":"https://example.com/","crawlUrls":false}'

Error Responses

Common Error Codes

| Status Code | Error Type | Description |
| --- | --- | --- |
| 400 | Bad Request | Invalid or missing URL, malformed JSON |
| 401 | Unauthorized | Invalid or missing API token |
| 403 | Forbidden | Access denied to the specified project |
| 404 | Not Found | Project or source not found |
| 500 | Internal Server Error | Error during URL processing |

Error Response Format

{
  "detail": "Invalid input: URL is required"
}

Error Examples

{ "detail": "Invalid input: URL is required" }
{ "detail": "Invalid authentication credentials" }

Document Processing

After a successful request, Graphor begins fetching and scraping the web page in the background.

Processing Stages

  1. URL Accepted - The request is validated and scheduled
  2. Content Retrieval - The page is fetched over HTTPS
  3. Text Extraction - Visible text is extracted and normalized
  4. Structure Recognition - Document elements are identified and classified
  5. Ready for Use - Document is available for chunking and retrieval

Processing Methods

The system selects the optimal processing method based on the detected content. You can reprocess sources with a different method using the Process Source endpoint after ingestion.

Best Practices

  • Provide reachable URLs: Ensure the page is publicly accessible over HTTPS
  • Disable crawling when unneeded: Set crawlUrls to false to ingest only the provided URL
  • Respect site policies: Only scrape pages you are permitted to and consider website rate limits
  • Retry logic: Implement retries for transient network issues

Upload from GitHub

Use this endpoint to ingest content directly from a public GitHub repository into your Graphor project.

Endpoint Overview

HTTP Method: POST
Endpoint URL: https://sources.graphorlm.com/upload-github-source
Authentication: Uses the same API token authentication described in the Authentication section above.

Request Format

Headers

| Header | Value | Required |
| --- | --- | --- |
| Authorization | Bearer YOUR_API_TOKEN | ✅ Yes |
| Content-Type | application/json | ✅ Yes |

Request Body

Send a JSON body with the following field:
| Field | Type | Description | Required |
| --- | --- | --- | --- |
| url | string | The GitHub repository URL to ingest (e.g., https://github.com/org/repo) | ✅ Yes |

Repository Requirements

  • Public GitHub repositories only; private repository ingestion is not supported
  • Canonical HTTPS URLs (https://github.com/...)
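
A cheap client-side sanity check can catch malformed repository URLs before the request is made. The regex below is a heuristic sketch of the canonical form, not the server's actual validation:

```python
import re

# Matches canonical public repository URLs like https://github.com/org/repo
GITHUB_REPO_RE = re.compile(r"^https://github\.com/[\w.-]+/[\w.-]+/?$")

def is_valid_repo_url(url: str) -> bool:
    """Heuristic check that the URL looks like a canonical GitHub repository URL."""
    return bool(GITHUB_REPO_RE.match(url))
```

Note that SSH-style URLs (git@github.com:org/repo.git) are rejected; use the HTTPS form.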

Response Format

Success Response (200 OK)

{
  "status": "Processing",
  "message": "Source processed successfully",
  "file_id": "file_abc123",
  "file_name": "https://github.com/org/repo",
  "file_size": 0,
  "file_type": "",
  "file_source": "github",
  "project_id": "550e8400-e29b-41d4-a716-446655440000",
  "project_name": "My Project",
  "partition_method": "basic"
}

Response Fields

| Field | Type | Description |
| --- | --- | --- |
| status | string | Processing status (New, Processing, Completed, Failed, etc.) |
| message | string | Human-readable status message |
| file_id | string | Unique identifier for the source (use this for subsequent API calls) |
| file_name | string | The repository URL |
| file_size | integer | Size in bytes (0 for initial GitHub record) |
| file_type | string | Detected file type (when applicable) |
| file_source | string | Source type (github) |
| project_id | string | UUID of the project |
| project_name | string | Name of the project |
| partition_method | string | Document processing method used |

Code Examples

JavaScript/Node.js

import fetch from 'node-fetch';

const uploadGithubSource = async (apiToken, repoUrl) => {
  const response = await fetch('https://sources.graphorlm.com/upload-github-source', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${apiToken}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({ url: repoUrl })
  });

  if (!response.ok) {
    throw new Error(`GitHub upload failed: ${response.status} ${response.statusText}`);
  }

  const result = await response.json();
  console.log('GitHub upload accepted:', result);
  return result;
};

// Usage
uploadGithubSource('grlm_your_api_token_here', 'https://github.com/org/repo');

Python

import requests

def upload_github_source(api_token, repo_url):
    endpoint = "https://sources.graphorlm.com/upload-github-source"
    headers = {"Authorization": f"Bearer {api_token}", "Content-Type": "application/json"}
    payload = {"url": repo_url}

    response = requests.post(endpoint, headers=headers, json=payload, timeout=300)
    response.raise_for_status()
    return response.json()

# Usage
result = upload_github_source("grlm_your_api_token_here", "https://github.com/org/repo")
print("GitHub upload accepted:", result["file_name"])  # echoes the repo URL

cURL

curl -X POST https://sources.graphorlm.com/upload-github-source \
  -H "Authorization: Bearer grlm_your_api_token_here" \
  -H "Content-Type: application/json" \
  -d '{"url":"https://github.com/org/repo"}'

Error Responses

Common Error Codes

| Status Code | Error Type | Description |
| --- | --- | --- |
| 400 | Bad Request | Invalid or missing URL, malformed JSON |
| 401 | Unauthorized | Invalid or missing API token |
| 403 | Forbidden | Access denied to the specified project |
| 404 | Not Found | Project or source not found |
| 500 | Internal Server Error | Error during repository processing |

Error Response Format

{
  "detail": "Invalid input: URL is required"
}

Error Examples

{ "detail": "Invalid input: URL is required" }
{ "detail": "Invalid authentication credentials" }

Document Processing

After a successful request, Graphor begins processing the GitHub source in the background.

Processing Stages

  1. Request Accepted - The request is validated and scheduled
  2. Repository Fetch - Repository content is retrieved
  3. Text Extraction - Content is extracted and normalized
  4. Structure Recognition - Document elements are identified and classified
  5. Ready for Use - Content is available for chunking and retrieval
You can reprocess sources using the Process Source endpoint after ingestion.

Best Practices

  • Provide valid repository URLs: Use the canonical HTTPS GitHub URL
  • Public repositories only: Private repositories are not supported
  • Retry logic: Implement retries for transient network issues

Upload from YouTube

Use this endpoint to ingest content from a public YouTube video URL into your Graphor project.

Endpoint Overview

HTTP Method: POST
Endpoint URL: https://sources.graphorlm.com/upload-youtube-source
Authentication: Uses the same API token authentication described in the Authentication section above.

Request Format

Headers

| Header | Value | Required |
| --- | --- | --- |
| Authorization | Bearer YOUR_API_TOKEN | ✅ Yes |
| Content-Type | application/json | ✅ Yes |

Request Body

Send a JSON body with the following field:
| Field | Type | Description | Required |
| --- | --- | --- | --- |
| url | string | The YouTube video URL to ingest (e.g., https://www.youtube.com/watch?v=...) | ✅ Yes |

Video Requirements

  • Publicly accessible YouTube video URLs (HTTPS)
  • Private or access-restricted videos are not supported
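
Since the Best Practices below recommend the canonical YouTube URL format, normalizing short links client-side before submission can help. An illustrative helper, assuming only the two common URL forms:

```python
from urllib.parse import urlparse, parse_qs

def canonical_youtube_url(url: str) -> str:
    """Normalize common YouTube URL forms to https://www.youtube.com/watch?v=ID."""
    parsed = urlparse(url)
    if parsed.netloc == "youtu.be":
        video_id = parsed.path.lstrip("/")  # short-link form
    else:
        video_id = parse_qs(parsed.query).get("v", [""])[0]  # watch?v= form
    if not video_id:
        raise ValueError(f"Could not extract a video ID from: {url}")
    return f"https://www.youtube.com/watch?v={video_id}"
```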

Response Format

Success Response (200 OK)

{
  "status": "Processing",
  "message": "Source processed successfully",
  "file_id": "file_abc123",
  "file_name": "https://www.youtube.com/watch?v=VIDEO_ID",
  "file_size": 0,
  "file_type": "",
  "file_source": "youtube",
  "project_id": "550e8400-e29b-41d4-a716-446655440000",
  "project_name": "My Project",
  "partition_method": "basic"
}

Response Fields

| Field | Type | Description |
| --- | --- | --- |
| status | string | Processing status (New, Processing, Completed, Failed, etc.) |
| message | string | Human-readable status message |
| file_id | string | Unique identifier for the source (use this for subsequent API calls) |
| file_name | string | The video URL (echoed back) |
| file_size | integer | Size in bytes (0 for URL-based initial record) |
| file_type | string | Detected type (when applicable) |
| file_source | string | Source type (youtube) |
| project_id | string | UUID of the project |
| project_name | string | Name of the project |
| partition_method | string | Document processing method used |

Code Examples

JavaScript/Node.js

import fetch from 'node-fetch';

const uploadYoutubeSource = async (apiToken, videoUrl) => {
  const response = await fetch('https://sources.graphorlm.com/upload-youtube-source', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${apiToken}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({ url: videoUrl })
  });

  if (!response.ok) {
    throw new Error(`YouTube upload failed: ${response.status} ${response.statusText}`);
  }

  return response.json();
};

// Usage
uploadYoutubeSource('grlm_your_api_token_here', 'https://www.youtube.com/watch?v=VIDEO_ID');

Python

import requests

def upload_youtube_source(api_token, video_url):
    endpoint = "https://sources.graphorlm.com/upload-youtube-source"
    headers = {"Authorization": f"Bearer {api_token}", "Content-Type": "application/json"}
    payload = {"url": video_url}

    response = requests.post(endpoint, headers=headers, json=payload, timeout=300)
    response.raise_for_status()
    return response.json()

# Usage
result = upload_youtube_source("grlm_your_api_token_here", "https://www.youtube.com/watch?v=VIDEO_ID")
print("YouTube ingestion accepted:", result["file_name"])

cURL

curl -X POST https://sources.graphorlm.com/upload-youtube-source \
  -H "Authorization: Bearer grlm_your_api_token_here" \
  -H "Content-Type: application/json" \
  -d '{"url":"https://www.youtube.com/watch?v=VIDEO_ID"}'

Error Responses

Common Error Codes

| Status Code | Error Type | Description |
| --- | --- | --- |
| 400 | Bad Request | Invalid or missing URL, malformed JSON |
| 401 | Unauthorized | Invalid or missing API token |
| 403 | Forbidden | Access denied to the specified project |
| 404 | Not Found | Project or source not found |
| 500 | Internal Server Error | Error during video processing |

Error Response Format

{
  "detail": "Invalid input: URL is required"
}

Document Processing

After a successful request, Graphor begins processing the YouTube source.

Processing Stages

  1. Request Accepted - The request is validated and scheduled
  2. Content Retrieval - The video is fetched
  3. Transcription / Text Extraction - Audio is transcribed and normalized
  4. Structure Recognition - Content is segmented and classified
  5. Ready for Use - Content is available for chunking and retrieval
You can reprocess sources using the Process Source endpoint after ingestion.

Best Practices

  • Prefer clear audio: Better audio quality improves transcription accuracy
  • Keep URLs stable: Use the canonical YouTube URL format when possible
  • Retry logic: Implement retries for transient network issues

Next Steps

After successfully uploading your documents: