This page documents Graphor’s source ingestion endpoints. All ingestion is asynchronous: you send a request, receive a build ID immediately, and then poll the build status endpoint until processing completes. Use these endpoints to add content to your project from a local file, a public web page URL, a public GitHub repository, or a public YouTube video.

Endpoints

Get build status

GET https://sources.graphorlm.com/builds/{build_id}
Poll status and optional parsed elements for an async ingestion

Ingest file

POST https://sources.graphorlm.com/ingest-file
Upload a local file; processing runs in the background

Ingest URL

POST https://sources.graphorlm.com/ingest-url
Ingest a public web page by URL (async)

Ingest GitHub

POST https://sources.graphorlm.com/ingest-github
Ingest a public GitHub repository (async)

Ingest YouTube

POST https://sources.graphorlm.com/ingest-youtube
Ingest a public YouTube video (async)

Authentication

All endpoints on this page require authentication using an API token. Include your API token as a Bearer token in the Authorization header.
Learn how to create and manage API tokens in the API Tokens guide.

Async ingestion flow

  1. Call one of the ingest endpoints (file, URL, GitHub, or YouTube). The request is validated and the job is scheduled; the response returns immediately with a build_id.
  2. Poll GET /builds/{build_id} to check status. When status is Completed, the source is ready; when status indicates failure, check the error field.
  3. Use the returned file_id (once the build has completed) for subsequent API calls (ask, extract, retrieve, delete, etc.).
The Get build status endpoint can also return paginated parsed elements (chunks) for a completed build when you do not set suppress_elements=true.
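
The three steps above can be sketched as a single helper. This is a minimal illustration, not an official client: `ingest_and_wait`, `start_ingest`, and `get_build` are hypothetical names, and the two callables stand in for whatever HTTP code you use to call an ingest endpoint and GET /builds/{build_id}.

```python
import time
from typing import Any, Callable, Dict


def ingest_and_wait(
    start_ingest: Callable[[], Dict[str, Any]],
    get_build: Callable[[str], Dict[str, Any]],
    interval_seconds: float = 2.0,
    max_attempts: int = 120,
) -> str:
    """Schedule a job, poll its build, and return the file_id (hypothetical helper)."""
    # Step 1: the ingest call returns immediately with a build_id.
    scheduled = start_ingest()
    if not scheduled.get("success"):
        raise RuntimeError(scheduled.get("error") or "Ingestion was not scheduled")
    build_id = scheduled["build_id"]

    # Step 2: poll the build until it completes or fails.
    for _ in range(max_attempts):
        build = get_build(build_id)
        if build.get("status") == "Completed":
            # Step 3: file_id identifies the source in later API calls.
            return build["file_id"]
        if build.get("status") == "Processing failed":
            raise RuntimeError(build.get("error") or "Build failed")
        time.sleep(interval_seconds)

    raise TimeoutError(f"Build {build_id} did not complete in time")
```

Passing the HTTP calls in as functions keeps the flow itself easy to test with canned responses before wiring it to the real endpoints.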

Get build status

Use this endpoint to poll the result of an async ingestion (or re-process). The build_id is returned by:
  • POST /ingest-file
  • POST /ingest-url
  • POST /ingest-github
  • POST /ingest-youtube
  • POST /reprocess (re-process)

Endpoint overview

Path parameter

| Parameter | Type | Description |
| --- | --- | --- |
| build_id | string | The build identifier returned when the job was scheduled |

Query parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| suppress_elements | boolean | false | When true, elements are omitted from the response |
| suppress_img_base64 | boolean | false | When true, img_base64 is omitted from each element |
| page | integer |  | 1-based page number (use with page_size for pagination) |
| page_size | integer |  | Number of elements per page (max 100) |
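
As a rough sketch of how these query parameters combine, the helper below builds a status URL and rejects values outside the documented ranges before any request is sent. `build_status_url` and `BASE_URL` are illustrative names, and sending booleans as the lowercase string "true" is an assumption based on the suppress_elements=true form used on this page.

```python
from typing import Dict, Optional, Union
from urllib.parse import quote, urlencode

BASE_URL = "https://sources.graphorlm.com"


def build_status_url(
    build_id: str,
    page: Optional[int] = None,
    page_size: Optional[int] = None,
    suppress_elements: bool = False,
    suppress_img_base64: bool = False,
) -> str:
    """Build a Get build status URL (hypothetical helper, not part of the API)."""
    if page is not None and page < 1:
        raise ValueError("page is 1-based")
    if page_size is not None and not 1 <= page_size <= 100:
        raise ValueError("page_size must be between 1 and 100")

    # Only send parameters that differ from the documented defaults.
    params: Dict[str, Union[str, int]] = {}
    if suppress_elements:
        params["suppress_elements"] = "true"
    if suppress_img_base64:
        params["suppress_img_base64"] = "true"
    if page is not None:
        params["page"] = page
    if page_size is not None:
        params["page_size"] = page_size

    url = f"{BASE_URL}/builds/{quote(build_id, safe='')}"
    return f"{url}?{urlencode(params)}" if params else url
```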

Success response (200 OK)

When the build has been persisted (history exists), the response includes status and optional metadata:
{
  "build_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "status": "Completed",
  "success": true,
  "file_id": "f47ac10b-58cc-4372-a567-0e02b2c3d479",
  "file_name": "report.pdf",
  "error": null,
  "method": "balanced",
  "total_partitions": 42,
  "total_pages": 10,
  "created_at": "2025-03-07T12:00:00Z",
  "updated_at": "2025-03-07T12:01:30Z",
  "message": null,
  "elements": null,
  "total_elements": null,
  "page": null,
  "page_size": null,
  "total_pages_elements": null
}
When the build is pending (request received but build has not started yet):
{
  "build_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "status": "Pending",
  "success": false,
  "file_id": null,
  "file_name": null,
  "error": null,
  "message": "Build is pending; processing has not started yet"
}
When the build is still in progress (running):
{
  "build_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "status": "Processing",
  "success": false,
  "file_id": null,
  "file_name": null,
  "error": null,
  "message": "Build not found or not yet persisted"
}

Response fields

| Field | Type | Description |
| --- | --- | --- |
| build_id | string | The requested build identifier |
| status | string | Pending, Processing, Completed, Processing failed, or not_found when no history exists |
| success | boolean | true only when status is Completed |
| file_id | string \| null | Source file ID; present when the build has been persisted |
| file_name | string \| null | Display name of the source; present when persisted |
| error | string \| null | Error message from the pipeline when the build failed |
| method | string \| null | Strategy used (e.g. fast, balanced, accurate, vlm, agentic) |
| total_partitions | integer \| null | Number of partitions; present when history exists |
| total_pages | integer \| null | Total pages in the source; present when history exists |
| created_at | string \| null | ISO8601 timestamp when the build was created |
| updated_at | string \| null | ISO8601 timestamp when the build was last updated |
| message | string \| null | Human-readable message (e.g. when status is not_found) |
| elements | array \| null | Parsed elements (chunks) when suppress_elements=false and build completed |
| total_elements | integer \| null | Total number of elements (when elements are returned) |
| page | integer \| null | Current page of elements (1-based) when pagination is used |
| page_size | integer \| null | Elements per page when pagination is used |
| total_pages_elements | integer \| null | Total pages of elements when pagination is used |
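
The pagination fields can be used to walk through every element of a completed build. The sketch below is one possible approach under stated assumptions: `iter_elements` and `fetch_page` are hypothetical names, and `fetch_page(page, page_size)` is expected to return the parsed JSON of one Get build status response. The structure of an individual element is not described on this page, so elements are passed through untouched.

```python
from typing import Any, Callable, Dict, Iterator


def iter_elements(
    fetch_page: Callable[[int, int], Dict[str, Any]],
    page_size: int = 100,
) -> Iterator[Any]:
    """Yield elements page by page until total_pages_elements is reached."""
    page = 1  # pages are 1-based
    while True:
        data = fetch_page(page, page_size)
        # elements is null when suppressed or when the build has not completed.
        for element in data.get("elements") or []:
            yield element
        total_pages = data.get("total_pages_elements") or 0
        if page >= total_pages:
            break
        page += 1
```

Because the function is a generator, callers can stop early without fetching the remaining pages.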

Code example: poll until complete

Poll until success is true. While status is Pending (request received, build not started) or Processing, keep polling. Only treat Processing failed or a non-null error (when status is not not_found) as failure.
JavaScript/Node.js

const pollBuildStatus = async (apiToken, buildId, options = {}) => {
  const { intervalMs = 2000, maxAttempts = 120 } = options;
  const url = `https://sources.graphorlm.com/builds/${buildId}`;

  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const response = await fetch(url, {
      headers: { Authorization: `Bearer ${apiToken}` },
    });
    // Stop on HTTP errors (e.g. an invalid token) instead of polling until timeout.
    if (!response.ok) throw new Error(`Status check failed: ${response.status}`);
    const data = await response.json();

    // success is true only when status is Completed.
    if (data.success) return data;
    if (data.status === "Processing failed" || (data.error && data.status !== "not_found"))
      throw new Error(data.error || data.message);

    // Pending or Processing: wait, then poll again.
    await new Promise((r) => setTimeout(r, intervalMs));
  }

  throw new Error("Polling timed out");
};

Python

import time
import requests

def poll_build_status(api_token, build_id, interval_seconds=2, max_attempts=120):
    url = f"https://sources.graphorlm.com/builds/{build_id}"
    headers = {"Authorization": f"Bearer {api_token}"}

    for _ in range(max_attempts):
        response = requests.get(url, headers=headers, timeout=30)
        # Stop on HTTP errors (e.g. an invalid token) instead of polling until timeout.
        response.raise_for_status()
        data = response.json()

        # success is true only when status is Completed.
        if data.get("success"):
            return data
        if data.get("status") == "Processing failed" or (data.get("error") and data.get("status") != "not_found"):
            raise RuntimeError(data.get("error") or data.get("message") or "Build failed")

        # Pending or Processing: wait, then poll again.
        time.sleep(interval_seconds)

    raise TimeoutError("Polling timed out")

Ingest file

Upload a local file and schedule ingestion in the background. The API validates size (max 100 MB) and extension, stores the file, then runs the full pipeline (partitioning, chunking, embedding) asynchronously.

Endpoint overview

Request format

Headers

| Header | Value | Required |
| --- | --- | --- |
| Authorization | Bearer YOUR_API_TOKEN | Yes |
| Content-Type | multipart/form-data | Yes |

Request body (multipart/form-data)

| Field | Type | Description | Required |
| --- | --- | --- | --- |
| file | File | The document file to upload | Yes |
| method | string | Processing method: fast, balanced, accurate, vlm, or agentic (see Partition methods) | No |

Partition methods

When provided, the method field controls how the document is parsed. If omitted, the system default is used.

| Value | Name | Description |
| --- | --- | --- |
| fast | Fast | Fast processing with heuristic classification. No OCR. |
| balanced | Balanced | OCR-based extraction with structure classification. |
| accurate | Accurate | Fine-tuned model for highest accuracy (Premium). |
| vlm | VLM | Best for manuscripts and handwritten content. |
| agentic | Agentic | Highest accuracy for complex layouts, tables, and diagrams. |
For more details, see the Process Source documentation.

File requirements

Documents: PDF, DOC, DOCX, ODT, TXT, TEXT, MD, HTML, HTM
Presentations: PPT, PPTX
Spreadsheets: CSV, TSV, XLS, XLSX
Images: PNG, JPG, JPEG, TIFF, BMP, HEIC
Audio: MP3, WAV, M4A, OGG, FLAC
Video: MP4, MOV, AVI, MKV, WEBM
Maximum file size: 100 MB per file.
The request must include a Content-Length header so the server can enforce the limit.
The file must have a valid filename with extension; the extension determines allowed processing.
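
A client-side check along these lines can catch most rejected uploads before any bytes are sent. This is a sketch only: `validate_upload` is a hypothetical helper, the extension list is copied from the requirements above, and treating 100 MB as 100 × 1024 × 1024 bytes is an assumption, since the exact byte limit is not stated here.

```python
import os

# Assumption: "100 MB" is interpreted as 100 MiB; the server's exact limit may differ.
MAX_FILE_SIZE_BYTES = 100 * 1024 * 1024

# Extensions listed under File requirements.
ALLOWED_EXTENSIONS = {
    "pdf", "doc", "docx", "odt", "txt", "text", "md", "html", "htm",
    "ppt", "pptx",
    "csv", "tsv", "xls", "xlsx",
    "png", "jpg", "jpeg", "tiff", "bmp", "heic",
    "mp3", "wav", "m4a", "ogg", "flac",
    "mp4", "mov", "avi", "mkv", "webm",
}


def validate_upload(file_name: str, size_bytes: int) -> None:
    """Raise ValueError if the file would likely be rejected by Ingest file."""
    extension = os.path.splitext(file_name)[1].lstrip(".").lower()
    if not extension:
        raise ValueError("File name must include an extension")
    if extension not in ALLOWED_EXTENSIONS:
        raise ValueError(f"File type '{extension}' is not supported")
    if size_bytes > MAX_FILE_SIZE_BYTES:
        raise ValueError("File exceeds the 100 MB limit")
```

For a file on disk, this could be called as validate_upload(path, os.path.getsize(path)) just before the upload request.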

Success response (200 OK)

{
  "build_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "success": true,
  "error": null
}

Response fields

| Field | Type | Description |
| --- | --- | --- |
| build_id | string | Use this ID to poll Get build status |
| success | boolean | Whether the request was successfully scheduled |
| error | string \| null | Error message if the request was not scheduled successfully |

Code examples

JavaScript/Node.js

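// Requires Node.js 18+ (global fetch, FormData, and Blob), run as an ES module.
import { readFile } from "node:fs/promises";
import path from "node:path";

const ingestFile = async (apiToken, filePath) => {
  // Native fetch cannot send an fs stream as a multipart field, so read the file into a Blob.
  const formData = new FormData();
  const fileBlob = new Blob([await readFile(filePath)]);
  // The third argument sets the filename, including the extension the API validates.
  formData.append("file", fileBlob, path.basename(filePath));

  // Do not set Content-Type manually; fetch adds the multipart boundary itself.
  const response = await fetch("https://sources.graphorlm.com/ingest-file", {
    method: "POST",
    headers: { Authorization: `Bearer ${apiToken}` },
    body: formData,
  });

  if (!response.ok) throw new Error(`Ingest failed: ${response.status} ${response.statusText}`);
  const { build_id } = await response.json();
  return build_id;
};

// Usage: get build_id, then poll Get build status until success
const buildId = await ingestFile("grlm_your_api_token_here", "./document.pdf");
console.log("Build ID:", buildId);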

Python

import os
import requests

def ingest_file(api_token, file_path, partition_method=None):
    url = "https://sources.graphorlm.com/ingest-file"
    headers = {"Authorization": f"Bearer {api_token}"}
    with open(file_path, "rb") as f:
        # Send only the base name (with extension) as the multipart filename, not the full path.
        files = {"file": (os.path.basename(file_path), f)}
        data = {}
        if partition_method:
            # The form field is named "method".
            data["method"] = partition_method
        response = requests.post(url, headers=headers, files=files, data=data or None, timeout=300)
    response.raise_for_status()
    return response.json()["build_id"]

# Usage
build_id = ingest_file("grlm_your_api_token_here", "document.pdf")
print("Build ID:", build_id)

cURL

curl -X POST https://sources.graphorlm.com/ingest-file \
  -H "Authorization: Bearer grlm_your_api_token_here" \
  -F "file=@document.pdf"

cURL with partition method

curl -X POST https://sources.graphorlm.com/ingest-file \
  -H "Authorization: Bearer grlm_your_api_token_here" \
  -F "file=@document.pdf" \
  -F "method=balanced"

Error responses

| Status Code | Description |
| --- | --- |
| 400 | Unsupported file type or missing file name |
| 411 | Missing Content-Length header |
| 413 | File exceeds 100 MB limit |
| 500 | Internal server error |
Example error body:
{
  "detail": "File type 'exe' is not supported. Allowed types: csv, doc, docx, pdf, txt, ..."
}
{
  "detail": "File size exceeds the maximum allowed limit of 100MB"
}
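
One way to surface these errors to users is to prefer the detail field from the response body and fall back to the documented description for the status code. The sketch below assumes error bodies always have the {"detail": ...} shape shown above; `describe_ingest_error` and `INGEST_FILE_ERRORS` are hypothetical names.

```python
from typing import Any, Mapping, Optional

# Descriptions copied from the Ingest file error table.
INGEST_FILE_ERRORS = {
    400: "Unsupported file type or missing file name",
    411: "Missing Content-Length header",
    413: "File exceeds 100 MB limit",
    500: "Internal server error",
}


def describe_ingest_error(status_code: int, body: Optional[Mapping[str, Any]]) -> str:
    """Return a readable message for a failed Ingest file response."""
    # Prefer the server-provided detail when the body includes one.
    detail = (body or {}).get("detail")
    if detail:
        return f"{status_code}: {detail}"
    fallback = INGEST_FILE_ERRORS.get(status_code, "Unexpected error")
    return f"{status_code}: {fallback}"
```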

Ingest URL

Ingest a web page (or multiple pages via crawling) as a source. The job runs in the background; use the returned build_id to poll Get build status. If the URL points to a downloadable file (by extension or Content-Type), the file is downloaded and then processed in the background.

Endpoint overview

HTTP Method

POST

Request format

Headers

| Header | Value | Required |
| --- | --- | --- |
| Authorization | Bearer YOUR_API_TOKEN | Yes |
| Content-Type | application/json | Yes |

Request body (JSON)

| Field | Type | Description | Required |
| --- | --- | --- | --- |
| url | string | The web page URL to ingest | Yes |
| crawlUrls | boolean | When true, follow and ingest links found on the page (ignored when URL resolves to a file) | No (default: false) |
| method | string | One of: fast, balanced, accurate, vlm, agentic | No |
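
The request body can be assembled and checked before sending, as in the sketch below. `build_ingest_url_payload` is a hypothetical helper; the method values come from the table above, while the requirement that the URL start with http:// or https:// is an assumption made for this example rather than a documented rule.

```python
from typing import Any, Dict, Optional

PARTITION_METHODS = {"fast", "balanced", "accurate", "vlm", "agentic"}


def build_ingest_url_payload(
    url: str, crawl_urls: bool = False, method: Optional[str] = None
) -> Dict[str, Any]:
    """Build the JSON body for POST /ingest-url."""
    # Assumption: only absolute http(s) URLs are meaningful for this endpoint.
    if not url.startswith(("http://", "https://")):
        raise ValueError("url must be an absolute http(s) URL")

    # Note the camelCase field name expected by the API.
    payload: Dict[str, Any] = {"url": url, "crawlUrls": crawl_urls}
    if method is not None:
        if method not in PARTITION_METHODS:
            raise ValueError(f"Unknown method: {method}")
        payload["method"] = method
    return payload
```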

Success response (200 OK)

{
  "build_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "success": true,
  "error": null
}

Code examples

JavaScript/Node.js

const ingestUrl = async (apiToken, url, crawlUrls = false) => {
  const response = await fetch("https://sources.graphorlm.com/ingest-url", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiToken}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ url, crawlUrls }),
  });
  if (!response.ok) throw new Error(`Ingest URL failed: ${response.status}`);
  const { build_id } = await response.json();
  return build_id;
};

Python

import requests

def ingest_url(api_token, url, crawl_urls=False, partition_method=None):
    payload = {"url": url, "crawlUrls": crawl_urls}
    if partition_method:
        payload["method"] = partition_method
    response = requests.post(
        "https://sources.graphorlm.com/ingest-url",
        headers={"Authorization": f"Bearer {api_token}", "Content-Type": "application/json"},
        json=payload,
        timeout=300,
    )
    response.raise_for_status()
    return response.json()["build_id"]

cURL

curl -X POST https://sources.graphorlm.com/ingest-url \
  -H "Authorization: Bearer grlm_your_api_token_here" \
  -H "Content-Type: application/json" \
  -d '{"url":"https://example.com/","crawlUrls":false}'

Error responses

| Status Code | Description |
| --- | --- |
| 400 | Unsupported file type detected from a file URL |
| 500 | Internal server error during URL processing |
To ingest local files (PDF, DOCX, etc.), use Ingest file.

Ingest GitHub

Ingest a public GitHub repository as a source. Processing runs in the background; use the returned build_id with Get build status.

Endpoint overview

Request format

Headers

| Header | Value | Required |
| --- | --- | --- |
| Authorization | Bearer YOUR_API_TOKEN | Yes |
| Content-Type | application/json | Yes |

Request body (JSON)

| Field | Type | Description | Required |
| --- | --- | --- | --- |
| url | string | GitHub repository URL (e.g. https://github.com/owner/repo) | Yes |

Success response (200 OK)

{
  "build_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "success": true,
  "error": null
}

Code examples

JavaScript/Node.js

const ingestGithub = async (apiToken, repoUrl) => {
  const response = await fetch("https://sources.graphorlm.com/ingest-github", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiToken}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ url: repoUrl }),
  });
  if (!response.ok) throw new Error(`GitHub ingest failed: ${response.status}`);
  return (await response.json()).build_id;
};

Python

import requests

def ingest_github(api_token, repo_url):
    response = requests.post(
        "https://sources.graphorlm.com/ingest-github",
        headers={"Authorization": f"Bearer {api_token}", "Content-Type": "application/json"},
        json={"url": repo_url},
        timeout=300,
    )
    response.raise_for_status()
    return response.json()["build_id"]

cURL

curl -X POST https://sources.graphorlm.com/ingest-github \
  -H "Authorization: Bearer grlm_your_api_token_here" \
  -H "Content-Type: application/json" \
  -d '{"url":"https://github.com/owner/repo"}'

Error responses

| Status Code | Description |
| --- | --- |
| 500 | Internal server error during GitHub processing |
Only public repositories are supported.

Ingest YouTube

Ingest a public YouTube video (transcript/captions) as a source. Processing runs in the background; use the returned build_id with Get build status.

Endpoint overview

Request format

Headers

| Header | Value | Required |
| --- | --- | --- |
| Authorization | Bearer YOUR_API_TOKEN | Yes |
| Content-Type | application/json | Yes |

Request body (JSON)

| Field | Type | Description | Required |
| --- | --- | --- | --- |
| url | string | YouTube video URL (e.g. https://www.youtube.com/watch?v=...) | Yes |

Success response (200 OK)

{
  "build_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "success": true,
  "error": null
}

Code examples

JavaScript/Node.js

const ingestYoutube = async (apiToken, videoUrl) => {
  const response = await fetch("https://sources.graphorlm.com/ingest-youtube", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiToken}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ url: videoUrl }),
  });
  if (!response.ok) throw new Error(`YouTube ingest failed: ${response.status}`);
  return (await response.json()).build_id;
};

Python

import requests

def ingest_youtube(api_token, video_url):
    response = requests.post(
        "https://sources.graphorlm.com/ingest-youtube",
        headers={"Authorization": f"Bearer {api_token}", "Content-Type": "application/json"},
        json={"url": video_url},
        timeout=300,
    )
    response.raise_for_status()
    return response.json()["build_id"]

cURL

curl -X POST https://sources.graphorlm.com/ingest-youtube \
  -H "Authorization: Bearer grlm_your_api_token_here" \
  -H "Content-Type: application/json" \
  -d '{"url":"https://www.youtube.com/watch?v=VIDEO_ID"}'

Error responses

| Status Code | Description |
| --- | --- |
| 500 | Internal server error during YouTube processing |
The video must be public; transcripts/captions are downloaded and processed in the background.

Best practices

  • Poll with backoff: When polling Get build status, use a reasonable interval (e.g. 2–5 seconds) and a timeout to avoid tight loops.
  • Store file_id: Once the build completes (success: true), store file_id for use with ask, extract, retrieve, delete, and list elements.
  • Validate before upload: Check file type and size client-side before calling Ingest file.
  • Protect API tokens: Never expose tokens in client-side code or public repositories; use HTTPS only.
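
To illustrate the polling advice, the generator below produces a capped, growing sequence of wait times that stays within the suggested 2–5 second range. It is a sketch of one possible schedule, not a prescribed one; `backoff_delays`, the growth factor of 1.5, and the default attempt count are all assumptions chosen for the example.

```python
from typing import Iterator


def backoff_delays(
    initial_seconds: float = 2.0,
    max_seconds: float = 5.0,
    factor: float = 1.5,
    max_attempts: int = 120,
) -> Iterator[float]:
    """Yield one delay per polling attempt, growing by factor up to max_seconds."""
    delay = initial_seconds
    for _ in range(max_attempts):
        yield min(delay, max_seconds)
        delay *= factor
```

A polling loop can iterate over backoff_delays() and sleep for each yielded value between status checks; when the generator is exhausted, the loop has effectively timed out.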

Next steps

After ingestion completes (build status Completed):

  • Parse source: Re-process a source with a different partition method (async; returns a new build_id)
  • List sources: List all sources in your project
  • Get elements: Retrieve parsed elements (chunks) for a source
  • Delete source: Remove a source from your project