This page provides a comprehensive overview of the Graphor public REST APIs. It covers the full lifecycle:
  1. Data Ingestion (Sources API): Ingest data (async), poll build status, then list sources and retrieve elements.
  2. Document Chat (Chat API): Ask questions about your ingested documents using natural language.
  3. Data Extraction (Extract API): Extract specific structured data from documents using custom schemas.

Data Ingestion (Sources API)

The Sources API covers the full ingestion lifecycle. Ingestion is asynchronous: ingest endpoints return a build_id; you poll Get build status until the job completes, then use the returned file_id for list, elements, chat, and extraction.

Upload & ingest

Ingest files, URLs, GitHub, or YouTube (async). Poll build status for completion.

Get build status

Poll the status (and optionally the parsed elements) of an async ingestion or reprocess job

Reprocess source

Re-process an existing source with a different partition method (async)

List sources

List all sources with status and metadata (optional filter by file_ids)

Get elements

Retrieve parsed elements (chunks) of a source (same format as build status elements)

Delete source

Permanently remove a source from your project

Document Chat (Chat API)

Once your data is ingested, use the Chat API to ask questions:

Chat with Documents

Ask natural language questions about your documents with conversational memory and structured outputs

Data Extraction (Extract API)

Extract specific structured data from your documents using schemas:

Extract Structured Data

Extract structured information from documents using custom schemas and natural language instructions

What “Data Ingestion” includes

  • Ingest: create a new source via ingest-file, ingest-url, ingest-github, or ingest-youtube (async; returns build_id)
  • Build status: poll GET /builds/{build_id} until the job completes; then use file_id for other calls
  • Reprocess: re-run the pipeline on an existing source with a different partition method (async; returns build_id)
  • List: list all sources (optionally filter by file_ids) and monitor status
  • Get elements: retrieve parsed elements (chunks) for a source by file_id (GET with query params)
  • Delete: remove a source by file_id (required)
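The poll-until-complete step above can be factored into a small helper that classifies a build-status response into done, failed, or pending. The status strings checked here ("Processing failed", "not_found") mirror those used in the workflow example later on this page; treat the exact values as an assumption about the API:

```javascript
// Classify a GET /builds/{build_id} response body into a simple state.
// The field names (success, file_id, status, error, message) follow the
// build-status responses shown elsewhere on this page.
function classifyBuild(data) {
  if (data.success) return { state: 'done', fileId: data.file_id };
  const failed = data.status === 'Processing failed' ||
    (data.error && data.status !== 'not_found');
  if (failed) return { state: 'failed', error: data.error || data.message };
  return { state: 'pending' };
}
```

A polling loop then only has to switch on `state`, which keeps the failure heuristics in one place.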

Authentication

All API endpoints require authentication using API tokens. Include your token in the Authorization header:
Authorization: Bearer YOUR_API_TOKEN
Learn how to generate and manage API tokens in the API Tokens guide.

Token Security

  • Never expose tokens in client-side code or public repositories
  • Use environment variables to store tokens securely
  • Rotate tokens regularly for enhanced security
  • Use different tokens for different environments (dev/staging/prod)
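The environment-variable practice above looks like this in Node.js; `GRAPHORLM_API_TOKEN` is an assumed variable name, so use whatever your deployment defines:

```javascript
// Load the API token from the environment rather than hard-coding it.
// GRAPHORLM_API_TOKEN is an assumed variable name.
const apiToken = process.env.GRAPHORLM_API_TOKEN ?? '';
if (!apiToken) console.warn('GRAPHORLM_API_TOKEN is not set');

// Build the Authorization header once and reuse it for every request.
const headers = { 'Authorization': `Bearer ${apiToken}` };
```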

URL Structure

All endpoints use the base URL https://sources.graphorlm.com.

Sources (Data Ingestion)

  • GET /builds/{build_id} — Poll build status (and optional elements)
  • POST /ingest-file — Upload a file (async)
  • POST /ingest-url — Ingest a web page (async)
  • POST /ingest-github — Ingest a GitHub repo (async)
  • POST /ingest-youtube — Ingest a YouTube video (async)
  • POST /reprocess — Re-process an existing source (async)
  • GET / — List all sources (optional ?file_ids=...)
  • GET /get-elements — Get parsed elements of a source (file_id required)
  • DELETE /delete — Delete a source (JSON body: file_id required)

Chat & Extraction

  • POST /ask-sources — Ask questions about documents (optional file_ids / file_names)
  • POST /run-extraction — Extract structured data (file_ids, user_instruction, output_schema)
  • POST /prebuilt-rag — Retrieve chunks (RAG) without generating an answer
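POST /prebuilt-rag is the one endpoint in this list without a worked example below. A request sketch, assuming it accepts the same question/file_ids payload as /ask-sources (the exact request and response fields are not documented on this page, so treat them as assumptions):

```javascript
// Build a /prebuilt-rag request object for fetch(url, options).
// The payload fields mirror /ask-sources (question, file_ids) — an
// assumption, since this page does not document the endpoint's schema.
function buildPrebuiltRagRequest(apiToken, question, fileIds) {
  return {
    url: 'https://sources.graphorlm.com/prebuilt-rag',
    options: {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${apiToken}`,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({ question, file_ids: fileIds })
    }
  };
}
```

Pass the result to `fetch(req.url, req.options)` and inspect the returned chunks.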

Response Formats

All API responses follow consistent JSON structures with appropriate HTTP status codes:

Success Response Pattern

Many endpoints return resource-specific JSON. Async ingestion endpoints (ingest-file, ingest-url, ingest-github, ingest-youtube, reprocess) return immediately with:
{ "build_id": "uuid", "success": true, "error": null }
Use build_id with GET /builds/{build_id} to poll until the job completes. Other endpoints return their own shapes (e.g. paginated items + total, or answer + conversation_id).

Error Response Pattern

{
  "detail": "Descriptive error message explaining what went wrong"
}

Common Status Codes

Code  Meaning                 Usage
----  ----------------------  -------------------------------------------
200   OK                      Successful GET, POST, and DELETE operations
400   Bad Request             Invalid parameters or malformed requests
401   Unauthorized            Invalid or missing API token
404   Not Found               Resource doesn't exist
413   Payload Too Large       File size exceeds limits
500   Internal Server Error   Server-side processing errors
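The table above translates directly into retry/fail decisions in client code. A minimal sketch (the message wording is illustrative, not part of the API):

```javascript
// 5xx errors are typically transient and worth retrying;
// 4xx mean the request itself is wrong and will fail again as-is.
function isRetryable(status) {
  return status >= 500;
}

// Turn a status code plus the API's `detail` field into an actionable message.
function describeFailure(status, detail) {
  if (status === 401) return `Check your API token: ${detail}`;
  if (status === 413) return `Reduce the file size: ${detail}`;
  return `HTTP ${status}: ${detail}`;
}
```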

Complete Workflow Example

Here’s the full “happy path”: ingest (async) → poll build status → list / get elements → chat / extract. Use the file_id returned once the build completes for all subsequent calls.

1. Ingest a file (async)

const formData = new FormData();
formData.append('file', fileInput.files[0]);
const ingestResponse = await fetch('https://sources.graphorlm.com/ingest-file', {
  method: 'POST',
  headers: { 'Authorization': 'Bearer YOUR_API_TOKEN' },
  body: formData
});
const { build_id } = await ingestResponse.json();
console.log('Build ID:', build_id);

2. Poll build status until complete

async function waitForBuild(apiToken, buildId, { intervalMs = 2000, maxAttempts = 150 } = {}) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const res = await fetch(`https://sources.graphorlm.com/builds/${buildId}`, {
      headers: { 'Authorization': `Bearer ${apiToken}` }
    });
    const data = await res.json();
    if (data.success) return data;
    const failed = data.status === 'Processing failed' ||
      (data.error && data.status !== 'not_found');
    if (failed) throw new Error(data.error || data.message);
    await new Promise(r => setTimeout(r, intervalMs));
  }
  throw new Error(`Build ${buildId} did not complete after ${maxAttempts} polls`);
}
const build = await waitForBuild('YOUR_API_TOKEN', build_id);
const file_id = build.file_id;
console.log('Ready. file_id:', file_id);

3. List sources (optional)

const listResponse = await fetch('https://sources.graphorlm.com', {
  headers: { 'Authorization': 'Bearer YOUR_API_TOKEN' }
});
const sources = await listResponse.json();
const source = sources.find(s => s.file_id === file_id);
console.log('Status:', source?.status);

4. Get parsed elements

const elementsUrl = `https://sources.graphorlm.com/get-elements?file_id=${encodeURIComponent(file_id)}&page=1&page_size=10`;
const elementsResponse = await fetch(elementsUrl, { headers: { 'Authorization': 'Bearer YOUR_API_TOKEN' } });
const { items, total } = await elementsResponse.json();
console.log('Elements:', total);

5. Ask questions (Chat)

const chatResponse = await fetch('https://sources.graphorlm.com/ask-sources', {
  method: 'POST',
  headers: { 'Authorization': 'Bearer YOUR_API_TOKEN', 'Content-Type': 'application/json' },
  body: JSON.stringify({
    question: "What are the main topics in this document?",
    file_ids: [file_id]
  })
});
const { answer } = await chatResponse.json();
console.log('Answer:', answer);

6. Extract structured data

const extractResponse = await fetch('https://sources.graphorlm.com/run-extraction', {
  method: 'POST',
  headers: { 'Authorization': 'Bearer YOUR_API_TOKEN', 'Content-Type': 'application/json' },
  body: JSON.stringify({
    file_ids: [file_id],
    user_instruction: "Extract the invoice number and total amount.",
    output_schema: {
      type: "object",
      properties: {
        invoice_number: { type: "string", description: "Invoice ID" },
        total_amount: { type: "number", description: "Total due" }
      },
      required: ["invoice_number", "total_amount"]
    }
  })
});
const { structured_output } = await extractResponse.json();
console.log('Extracted:', structured_output);

Integration Patterns

Minimal Sources (Ingestion) Client

class GraphorClient {
  constructor(apiToken) {
    this.apiToken = apiToken;
    this.headers = {
      'Authorization': `Bearer ${apiToken}`,
      'Content-Type': 'application/json'
    };
  }

  async ingestFile(file, partitionMethod) {
    const formData = new FormData();
    formData.append('file', file);
    if (partitionMethod) formData.append('method', partitionMethod);
    const response = await fetch('https://sources.graphorlm.com/ingest-file', {
      method: 'POST',
      headers: { 'Authorization': `Bearer ${this.apiToken}` },
      body: formData
    });
    const data = await response.json();
    return data.build_id;
  }

  async getBuildStatus(buildId) {
    const response = await fetch(`https://sources.graphorlm.com/builds/${buildId}`, { headers: this.headers });
    return await response.json();
  }

  async listSources(fileIds = null) {
    const base = 'https://sources.graphorlm.com';
    const query = fileIds?.length
      ? `?${fileIds.map(id => `file_ids=${encodeURIComponent(id)}`).join('&')}`
      : '';
    const response = await fetch(`${base}${query}`, { headers: this.headers });
    return await response.json();
  }

  async reprocessSource(fileId, partitionMethod = 'fast') {
    const response = await fetch('https://sources.graphorlm.com/reprocess', {
      method: 'POST',
      headers: this.headers,
      body: JSON.stringify({ file_id: fileId, method: partitionMethod })
    });
    const data = await response.json();
    return data.build_id;
  }

  async getElements(fileId, options = {}) {
    const params = new URLSearchParams({ file_id: fileId });
    if (options.page != null) params.set('page', options.page);
    if (options.pageSize != null) params.set('page_size', options.pageSize);
    const response = await fetch(`https://sources.graphorlm.com/get-elements?${params}`, { headers: this.headers });
    return await response.json();
  }

  async deleteSource(fileId) {
    const response = await fetch('https://sources.graphorlm.com/delete', {
      method: 'DELETE',
      headers: this.headers,
      body: JSON.stringify({ file_id: fileId })
    });
    return await response.json();
  }

  async ask(question, fileIds = []) {
    const response = await fetch('https://sources.graphorlm.com/ask-sources', {
      method: 'POST',
      headers: this.headers,
      body: JSON.stringify({ question, file_ids: fileIds })
    });
    return await response.json();
  }

  async extract(fileIds, instruction, outputSchema) {
    const response = await fetch('https://sources.graphorlm.com/run-extraction', {
      method: 'POST',
      headers: this.headers,
      body: JSON.stringify({
        file_ids: fileIds,
        user_instruction: instruction,
        output_schema: outputSchema
      })
    });
    return await response.json();
  }
}

// Usage example
const client = new GraphorClient('YOUR_API_TOKEN');

// Ingestion (async) + Chat + Extraction workflow
async function ingestAndAnalyze() {
  try {
    // 1. Ingest file (async), then poll build status for file_id
    const file = document.getElementById('fileInput').files[0];
    const buildId = await client.ingestFile(file, 'balanced');
    let build;
    while (true) {
      build = await client.getBuildStatus(buildId);
      if (build.success) break;
      if (build.error && build.status !== 'not_found') throw new Error(build.error);
      await new Promise(r => setTimeout(r, 2000));
    }
    const fileId = build.file_id;

    // 2. Chat with the document
    const chatResult = await client.ask("Summarize this document", [fileId]);
    console.log('Chat Answer:', chatResult.answer);

    // 3. Extract structured data
    const extractResult = await client.extract(
      [fileId],
      "Extract key values",
      { type: "object", properties: { summary: { type: "string" } }, required: ["summary"] }
    );
    console.log('Extraction:', extractResult.structured_output);
  } catch (error) {
    console.error('Error:', error);
  }
}

Python Integration

import time
import requests
from typing import Dict, List, Any, Optional
import os

class GraphorLMAPI:
    def __init__(self, api_token: str):
        self.api_token = api_token
        self.headers = {
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json"
        }
    
    # Sources API
    def ingest_file(self, file_path: str, partition_method: Optional[str] = None) -> str:
        """Ingest a file (async). Returns build_id."""
        url = "https://sources.graphorlm.com/ingest-file"
        with open(file_path, "rb") as f:
            files = {"file": (os.path.basename(file_path), f)}
            data = {"method": partition_method} if partition_method else {}
            headers = {"Authorization": f"Bearer {self.api_token}"}
            response = requests.post(url, headers=headers, files=files, data=data)
        response.raise_for_status()
        return response.json()["build_id"]

    def get_build_status(self, build_id: str) -> Dict[str, Any]:
        """Poll build status. Returns dict with success, file_id when done."""
        response = requests.get(f"https://sources.graphorlm.com/builds/{build_id}", headers=self.headers)
        response.raise_for_status()
        return response.json()

    def list_sources(self, file_ids: Optional[List[str]] = None) -> List[Dict[str, Any]]:
        """List all sources (optionally filter by file_ids)."""
        url = "https://sources.graphorlm.com"
        params = {"file_ids": file_ids} if file_ids else {}
        response = requests.get(url, headers=self.headers, params=params)
        response.raise_for_status()
        return response.json()

    def reprocess_source(self, file_id: str, partition_method: str = "fast") -> str:
        """Re-process a source (async). Returns build_id."""
        response = requests.post(
            "https://sources.graphorlm.com/reprocess",
            headers=self.headers,
            json={"file_id": file_id, "method": partition_method},
        )
        response.raise_for_status()
        return response.json()["build_id"]

    def get_elements(self, file_id: str, page: Optional[int] = None, page_size: Optional[int] = None) -> Dict[str, Any]:
        """Get parsed elements for a source."""
        params = {"file_id": file_id}
        if page is not None: params["page"] = page
        if page_size is not None: params["page_size"] = page_size
        response = requests.get("https://sources.graphorlm.com/get-elements", headers=self.headers, params=params)
        response.raise_for_status()
        return response.json()

    def delete_source(self, file_id: str) -> Dict[str, Any]:
        """Delete a source by file_id."""
        response = requests.delete(
            "https://sources.graphorlm.com/delete",
            headers=self.headers,
            json={"file_id": file_id},
        )
        response.raise_for_status()
        return response.json()
    
    # Chat API
    def ask(self, question: str, file_ids: Optional[List[str]] = None) -> Dict[str, Any]:
        """Ask a question about documents."""
        payload = {"question": question}
        if file_ids:
            payload["file_ids"] = file_ids
        response = requests.post("https://sources.graphorlm.com/ask-sources", headers=self.headers, json=payload)
        response.raise_for_status()
        return response.json()

    # Extraction API
    def extract(self, file_ids: List[str], instruction: str, output_schema: Dict[str, Any]) -> Dict[str, Any]:
        """Extract structured data from documents. output_schema is a JSON Schema dict."""
        response = requests.post(
            "https://sources.graphorlm.com/run-extraction",
            headers=self.headers,
            json={"file_ids": file_ids, "user_instruction": instruction, "output_schema": output_schema},
        )
        response.raise_for_status()
        return response.json()

# Usage
api = GraphorLMAPI(os.getenv("GRAPHORLM_API_TOKEN"))

# Ingestion (async), Chat, and Extraction workflow
def analyze_documents(documents: List[str]):
    try:
        for doc_path in documents:
            build_id = api.ingest_file(doc_path, "balanced")
            while True:
                status = api.get_build_status(build_id)
                if status.get("success"):
                    file_id = status["file_id"]
                    break
                if status.get("error") and status.get("status") != "not_found":
                    raise RuntimeError(status.get("error"))
                time.sleep(2)

            chat_answer = api.ask("What is this document about?", [file_id])
            print(f"Chat Answer: {chat_answer['answer']}")

            schema = {"type": "object", "properties": {"summary": {"type": "string"}}, "required": ["summary"]}
            extract_result = api.extract([file_id], "Summarize the document", schema)
            print(f"Extraction: {extract_result['structured_output']}")
        return True
    except Exception as e:
        print(f"Failed: {e}")
        return False

Rate Limits and Best Practices

Performance Guidelines

  • Batch Operations: Group multiple related requests when possible
  • Asynchronous Processing: Run independent requests concurrently (e.g. with Promise.all) rather than awaiting them one at a time
  • Retry Logic: Implement exponential backoff for transient failures
  • Caching: Cache frequently accessed data like flow configurations
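The batching and concurrency advice above can be sketched as a small pool that runs tasks with a fixed parallelism limit; the limit of 4 is an arbitrary example, and each task would be a closure over one of the API calls on this page (e.g. `() => client.ingestFile(file)`):

```javascript
// Run async tasks with a fixed concurrency limit.
// `tasks` is an array of zero-argument functions returning promises.
async function runWithLimit(tasks, limit = 4) {
  const results = [];
  let next = 0;
  async function worker() {
    while (next < tasks.length) {
      const i = next++;          // claim the next task index
      results[i] = await tasks[i]();
    }
  }
  // Spawn up to `limit` workers that drain the task list.
  const workers = Array.from({ length: Math.min(limit, tasks.length) }, worker);
  await Promise.all(workers);
  return results;               // results keep the original task order
}
```

This caps the load you put on the API while still overlapping the network round-trips.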

Error Handling Best Practices

async function robustAPICall(url, options, maxRetries = 3) {
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      const response = await fetch(url, options);

      if (!response.ok) {
        const errorData = await response.json();
        const error = new Error(`HTTP ${response.status}: ${errorData.detail}`);
        // Client errors (4xx) won't succeed on retry; fail fast.
        if (response.status < 500) error.fatal = true;
        throw error;
      }

      return await response.json();
    } catch (error) {
      if (error.fatal || attempt === maxRetries) {
        throw error;
      }
      console.warn(`Attempt ${attempt} failed:`, error.message);

      // Exponential backoff: 2s, 4s, 8s, ...
      await new Promise(resolve => setTimeout(resolve, Math.pow(2, attempt) * 1000));
    }
  }
}

Testing and Development

API Testing Tools

You can test Graphor API endpoints using:
  • cURL: Command-line testing and scripting
  • Postman: Interactive API testing and documentation
  • Bruno/Insomnia: Alternative API clients
  • Custom Scripts: Automated testing suites

Example cURL Commands

# List all sources
curl -X GET "https://sources.graphorlm.com" \
  -H "Authorization: Bearer YOUR_API_TOKEN"

# Ingest a file (async; returns build_id)
curl -X POST "https://sources.graphorlm.com/ingest-file" \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -F "file=@document.pdf" \
  -F "method=balanced"

# Get build status (poll until success)
curl -X GET "https://sources.graphorlm.com/builds/BUILD_ID" \
  -H "Authorization: Bearer YOUR_API_TOKEN"

# Reprocess a source (async)
curl -X POST "https://sources.graphorlm.com/reprocess" \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"file_id":"FILE_ID","method":"balanced"}'

# Get parsed elements of a source
curl -X GET "https://sources.graphorlm.com/get-elements?file_id=FILE_ID&page=1&page_size=10" \
  -H "Authorization: Bearer YOUR_API_TOKEN"

# Delete a source
curl -X DELETE "https://sources.graphorlm.com/delete" \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"file_id":"FILE_ID"}'

Common Use Cases

Content Management System Integration

Ingest documents as they’re created/updated in your CMS:
class ContentSearchAPI {
  constructor(apiToken) {
    this.client = new GraphorClient(apiToken);
  }

  async ingestDocument(file, partitionMethod = 'balanced') {
    const buildId = await this.client.ingestFile(file, partitionMethod);
    return { success: true, build_id: buildId };
  }
}

Automated Ingestion Pipeline

Batch ingest research documents (async; each returns a build_id to poll):
class ResearchPipeline:
    def __init__(self, api_token: str):
        self.api = GraphorLMAPI(api_token)

    def ingest_papers(self, paper_paths: List[str]) -> List[str]:
        """Ingest multiple research papers. Returns list of build_ids."""
        build_ids = []
        for paper_path in paper_paths:
            build_id = self.api.ingest_file(paper_path, "balanced")
            build_ids.append(build_id)
        return build_ids

Migration and Versioning

API Versioning

The Graphor API follows semantic versioning principles:
  • Current Version: v1 (stable)
  • Endpoint Paths: Include version in URL structure where applicable
  • Backward Compatibility: Breaking changes will increment major version

Migration Best Practices

  • Monitor API Updates: Subscribe to API changelog notifications
  • Version Pinning: Specify API versions in your integrations
  • Gradual Migration: Test new versions in staging before production deployment
  • Fallback Strategies: Implement graceful degradation for API changes

Support and Resources

Getting Help

Contact Support

Direct support for technical questions and issues

API Tokens Guide

Learn how to generate and manage authentication tokens

Data Ingestion

Best practices for document upload and processing

Flows Overview

Master comprehensive RAG pipeline and node management

Community and Updates

  • Documentation Updates: This documentation is continuously updated with new features
  • API Changelog: Monitor changes and new endpoint releases
  • Best Practices: Learn from community implementations and use cases

Next Steps

Ready to start building with Graphor APIs? Choose your path:

For Beginners

Upload & ingest

Ingest documents from files, URLs, GitHub, or YouTube (async); poll build status for completion

Chat with Documents

Ask natural language questions about your documents with conversational memory and structured outputs

API Tokens

Set up authentication for API access

For Advanced Users

Flows Overview

Master comprehensive RAG pipeline and node management

LLM Integration

Advanced language model configuration and optimization

Advanced RAG

Explore Smart RAG, Graph RAG, and RAPTOR capabilities

Integration Patterns

Build production-ready integrations

The Graphor REST API provides the foundation for building intelligent, document-driven applications. With comprehensive support for advanced RAG implementations, multiple node types (chunking, retrieval, reranking, Smart RAG, Graph RAG, RAPTOR RAG, and LLM), and flexible pipeline management, these APIs give you the power and flexibility to build sophisticated AI workflows that scale from simple document search to complex research analysis systems.