This page documents how to upload content to your Graphor project using the Python SDK, whether that content is a local file, a public web page URL, a public GitHub repository, or a public YouTube video.

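Available Methods

  • client.sources.upload(): upload a local file
  • client.sources.upload_url(): ingest a public web page
  • client.sources.upload_github(): ingest a public GitHub repository
  • client.sources.upload_youtube(): ingest a public YouTube video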

Installation

Install the Graphor SDK from PyPI:
pip install graphor
Python 3.9 or higher is required.

Authentication

All SDK methods require authentication using an API key. You can provide your API key in two ways:

Environment Variable

Set the GRAPHOR_API_KEY environment variable:
export GRAPHOR_API_KEY="grlm_your_api_key_here"
Then initialize the client without any arguments:
from graphor import Graphor

client = Graphor()

Direct Initialization

Pass the API key directly to the client:
from graphor import Graphor

client = Graphor(api_key="grlm_your_api_key_here")
Never hardcode API keys in your source code. Use environment variables or a secrets manager.
Learn how to create and manage API tokens in the API Tokens guide.
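
To fail fast when the key is missing, a small startup check can help. The sketch below is a hypothetical helper (`load_api_key` is not part of the SDK). It assumes keys begin with `grlm_`, which is inferred from the examples above rather than a documented guarantee.

```python
import os


def load_api_key(env_var: str = "GRAPHOR_API_KEY") -> str:
    """Return the API key from the environment or raise a clear error.

    Hypothetical helper, not part of the Graphor SDK.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"{env_var} is not set; export it before creating the client")
    # Assumption: keys look like "grlm_..." as in the examples on this page.
    if not key.startswith("grlm_"):
        raise RuntimeError(f"{env_var} does not look like a Graphor key (expected 'grlm_' prefix)")
    return key
```

The result can then be passed explicitly, for example `Graphor(api_key=load_api_key())`.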

Upload a File

Upload a local file to your Graphor project for processing.

Method Signature

client.sources.upload(
    file: FileTypes,                              # Required
    partition_method: PartitionMethod | None = None,  # Optional
    timeout: float | None = None
) -> PublicSource

Parameters

| Parameter | Type | Description | Required |
| --- | --- | --- | --- |
| file | FileTypes | The file to upload. Accepts bytes, Path, or tuple (filename, contents, media_type) | ✅ Yes |
| partition_method | PartitionMethod | Processing method to use (see Partition Methods below) | No |
| timeout | float | Request timeout in seconds (default: 60) | No |

Partition Methods

When provided, the partition_method parameter selects how the document is parsed, and parsing starts immediately during upload. If not provided, the system uses the default method.
| Value | Name | Description |
| --- | --- | --- |
| "basic" | Fast | Fast processing with heuristic classification. No OCR. |
| "hi_res" | Balanced | OCR-based extraction with AI-powered structure classification. |
| "hi_res_ft" | Accurate | Fine-tuned AI model for highest accuracy (Premium). |
| "mai" | VLM | Best text-first parsing for manuscripts and handwritten documents. |
| "graphorlm" | Agentic | Highest parsing setting for complex layouts, multi-page tables, and diagrams. |
For more details about processing methods, see the Parse Source documentation.
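
A typo in partition_method would otherwise only surface as an API error, so it can be worth checking the value locally first. This is a minimal sketch: `PARTITION_METHODS` and `check_partition_method` are hypothetical names, and the values are copied from the table above.

```python
from typing import Optional

# Values and display names copied from the Partition Methods table.
PARTITION_METHODS = {
    "basic": "Fast",
    "hi_res": "Balanced",
    "hi_res_ft": "Accurate",
    "mai": "VLM",
    "graphorlm": "Agentic",
}


def check_partition_method(value: Optional[str]) -> Optional[str]:
    """Return value unchanged if it is None or a known method, else raise ValueError."""
    if value is not None and value not in PARTITION_METHODS:
        allowed = ", ".join(sorted(PARTITION_METHODS))
        raise ValueError(f"Unknown partition_method {value!r}; expected one of: {allowed}")
    return value
```

A call such as `check_partition_method("hi_res")` passes the value through, while a display name like `"Balanced"` is rejected because the API expects the value column.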

File Requirements

Graphor supports a wide range of document formats:
  • Documents: PDF, DOC, DOCX, TXT, TEXT, MD, HTML, HTM
  • Presentations: PPT, PPTX
  • Spreadsheets: CSV, TSV, XLS, XLSX
  • Images: PNG, JPG, JPEG, TIFF, BMP, HEIC
  • Audio: MP3, WAV, M4A, OGG, FLAC
  • Video: MP4, MOV, AVI, MKV, WEBM

Maximum file size: 100MB per file

For larger files, consider:
  • Compressing the file if possible
  • Splitting large documents into smaller sections
  • Using file optimization tools before upload
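
These requirements can be checked locally before any bytes leave the machine. The sketch below is an illustration, not SDK functionality: `validate_for_upload` is a hypothetical helper, the extension set mirrors the list above, and it assumes "100MB" means 100 × 1024 × 1024 bytes.

```python
from pathlib import Path

MAX_UPLOAD_BYTES = 100 * 1024 * 1024  # Assumption: "100MB" read as 100 MiB

# Extensions taken from the supported-format list above.
SUPPORTED_EXTENSIONS = {
    ".pdf", ".doc", ".docx", ".txt", ".text", ".md", ".html", ".htm",
    ".ppt", ".pptx",
    ".csv", ".tsv", ".xls", ".xlsx",
    ".png", ".jpg", ".jpeg", ".tiff", ".bmp", ".heic",
    ".mp3", ".wav", ".m4a", ".ogg", ".flac",
    ".mp4", ".mov", ".avi", ".mkv", ".webm",
}


def validate_for_upload(path: Path) -> list[str]:
    """Return a list of problems; an empty list means the file looks uploadable."""
    if not path.is_file():
        return [f"{path} is not a file"]
    problems = []
    if path.suffix.lower() not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension {path.suffix!r}")
    size = path.stat().st_size
    if size > MAX_UPLOAD_BYTES:
        problems.append(f"file is {size} bytes, over the {MAX_UPLOAD_BYTES}-byte limit")
    return problems
```

Returning a list rather than raising lets a batch job report every problem for a file at once.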

Response Object

The method returns a PublicSource object with the following properties:
| Property | Type | Description |
| --- | --- | --- |
| status | str | Processing status (New, Processing, Completed, Failed) |
| message | str | Human-readable success message |
| file_id | str \| None | Unique identifier for the source (use this for subsequent API calls) |
| file_name | str | Name of the uploaded file |
| file_size | int | Size of the file in bytes |
| file_type | str | File extension/type |
| file_source | str | Source type (local file) |
| project_id | str | UUID of the target project |
| project_name | str | Name of the target project |
| partition_method | str \| None | Document processing method used |
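
Because file_id may be None and status can still be in progress, downstream code may want to guard both before continuing. The helpers below are hypothetical and operate on any object with the documented property names. Treating Completed and Failed as the only final states is an assumption based on the status values listed above.

```python
from types import SimpleNamespace

# Assumption: "Completed" and "Failed" are final; "New" and "Processing" are not.
FINAL_STATUSES = {"Completed", "Failed"}


def is_finished(source) -> bool:
    """True once the source has reached a final processing status."""
    return source.status in FINAL_STATUSES


def require_file_id(source) -> str:
    """Return source.file_id, raising if the response did not include one."""
    if source.file_id is None:
        raise ValueError(
            f"{source.file_name!r} has no file_id (status: {source.status})"
        )
    return source.file_id


# Stand-in with the documented property names; a real upload returns PublicSource.
example = SimpleNamespace(status="Processing", file_id="abc123", file_name="document.pdf")
print(is_finished(example), require_file_id(example))
```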

Code Examples

Upload from File Path

from pathlib import Path
from graphor import Graphor

client = Graphor()

# Upload using a Path object
source = client.sources.upload(
    file=Path("./document.pdf")
)

print(f"Uploaded: {source.file_name}")
print(f"File ID: {source.file_id}")
print(f"Status: {source.status}")
print(f"Project ID: {source.project_id}")

Upload with Partition Method

from pathlib import Path
from graphor import Graphor

client = Graphor()

# Upload and process with a specific partition method
source = client.sources.upload(
    file=Path("./document.pdf"),
    partition_method="hi_res"  # Use Balanced processing
)

print(f"Uploaded: {source.file_name}")
print(f"File ID: {source.file_id}")
print(f"Partition Method: {source.partition_method}")

Upload from Bytes

from graphor import Graphor

client = Graphor()

# Read file content as bytes
with open("document.pdf", "rb") as f:
    file_content = f.read()

# Upload raw bytes with filename tuple
source = client.sources.upload(
    file=("document.pdf", file_content, "application/pdf")
)

print(f"Uploaded: {source.file_name}")
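
When uploading many files as bytes, hardcoding the media type for each becomes tedious. One option is to guess it from the filename with the standard library. `as_file_tuple` is a hypothetical helper, and the fallback to `application/octet-stream` is a choice made for this sketch, not documented SDK behavior.

```python
import mimetypes
from pathlib import Path


def as_file_tuple(path: Path) -> tuple[str, bytes, str]:
    """Build a (filename, contents, media_type) tuple for the `file=` argument."""
    media_type, _encoding = mimetypes.guess_type(path.name)
    # Fall back to a generic binary type when the extension is unknown.
    return (path.name, path.read_bytes(), media_type or "application/octet-stream")
```

The result can be passed directly, for example `client.sources.upload(file=as_file_tuple(Path("document.pdf")))`.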

Async Upload

import asyncio
from pathlib import Path
from graphor import AsyncGraphor

async def upload_document():
    client = AsyncGraphor()
    
    source = await client.sources.upload(
        file=Path("./document.pdf")
    )
    
    print(f"Uploaded: {source.file_name}")
    print(f"Status: {source.status}")
    
    return source

# Run the async function
asyncio.run(upload_document())

Batch Upload

from pathlib import Path
from graphor import Graphor

client = Graphor()

def batch_upload(directory: str):
    """Upload all supported documents from a directory."""
    supported_extensions = {'.pdf', '.doc', '.docx', '.txt', '.md', '.html'}
    uploaded_files = []
    failed_files = []
    
    for file_path in Path(directory).iterdir():
        if file_path.suffix.lower() in supported_extensions:
            try:
                source = client.sources.upload(file=file_path)
                uploaded_files.append(source.file_name)
                print(f"✅ Uploaded: {file_path.name}")
            except Exception as e:
                failed_files.append((file_path.name, str(e)))
                print(f"❌ Failed: {file_path.name} - {e}")
    
    print(f"\nSummary: {len(uploaded_files)} uploaded, {len(failed_files)} failed")
    return uploaded_files, failed_files

# Usage
uploaded, failed = batch_upload("./documents/")

Error Handling

from pathlib import Path

import graphor
from graphor import Graphor

client = Graphor()

try:
    source = client.sources.upload(
        file=Path("./document.pdf")
    )
    print(f"Upload successful: {source.file_name}")
except graphor.BadRequestError as e:
    print(f"Invalid file type or request: {e}")
except graphor.AuthenticationError as e:
    print(f"Invalid API key: {e}")
except graphor.RateLimitError as e:
    print(f"Rate limit exceeded. Please wait and retry: {e}")
except graphor.APIConnectionError as e:
    print(f"Connection error: {e}")
except graphor.APIStatusError as e:
    print(f"API error (status {e.status_code}): {e}")

Upload Web Page

Ingest content by scraping a public web page.

Method Signature

client.sources.upload_url(
    url: str,                                     # Required
    crawl_urls: bool = False,
    partition_method: PartitionMethod | None = None,  # Optional
    timeout: float | None = None
) -> PublicSource

Parameters

| Parameter | Type | Description | Required |
| --- | --- | --- | --- |
| url | str | The URL of the web page to scrape | ✅ Yes |
| crawl_urls | bool | Whether to crawl and ingest links from the given URL (default: False) | No |
| partition_method | PartitionMethod | Processing method to use (see Partition Methods above) | No |
| timeout | float | Request timeout in seconds | No |

URL Requirements

  • Public web pages
  • Pages that render primary content server-side and are reachable without interaction
  • The URL must be publicly reachable over HTTPS
  • Authentication-protected pages are not supported
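
Some of these requirements can be screened locally before calling the API. The sketch below uses a hypothetical helper, `is_ingestable_url`, and only checks the HTTPS scheme and the presence of a hostname. It cannot tell whether a page is actually public or server-rendered.

```python
from urllib.parse import urlparse


def is_ingestable_url(url: str) -> bool:
    """Cheap local check: HTTPS scheme plus a hostname. Does not test reachability."""
    parts = urlparse(url)
    return parts.scheme == "https" and bool(parts.hostname)
```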

Code Examples

Basic Web Page Upload

from graphor import Graphor

client = Graphor()

# Scrape a single web page
source = client.sources.upload_url(
    url="https://example.com/article"
)

print(f"Ingested: {source.file_name}")
print(f"Status: {source.status}")

Upload with Web Page Crawling

from graphor import Graphor

client = Graphor()

# Scrape a page and follow links
source = client.sources.upload_url(
    url="https://example.com/documentation",
    crawl_urls=True
)

print(f"Ingested: {source.file_name}")

Upload with Partition Method

from graphor import Graphor

client = Graphor()

# Scrape a web page with a specific processing method
source = client.sources.upload_url(
    url="https://example.com/article",
    partition_method="hi_res"  # Use Balanced processing
)

print(f"Ingested: {source.file_name}")
print(f"File ID: {source.file_id}")
print(f"Partition Method: {source.partition_method}")

Async Web Page Upload

import asyncio
from graphor import AsyncGraphor

async def ingest_webpage(url: str):
    client = AsyncGraphor()
    
    source = await client.sources.upload_url(
        url=url,
        crawl_urls=False
    )
    
    print(f"Ingested: {source.file_name}")
    return source

asyncio.run(ingest_webpage("https://example.com"))

Error Handling

import graphor
from graphor import Graphor

client = Graphor()

try:
    source = client.sources.upload_url(
        url="https://example.com/article"
    )
    print(f"URL ingested: {source.file_name}")
except graphor.BadRequestError as e:
    print(f"Invalid URL: {e}")
except graphor.APIStatusError as e:
    print(f"Failed to process URL (status {e.status_code}): {e}")

Upload from GitHub

Ingest content from a public GitHub repository.

Method Signature

client.sources.upload_github(
    url: str,              # Required
    timeout: float | None = None
) -> PublicSource

Parameters

| Parameter | Type | Description | Required |
| --- | --- | --- | --- |
| url | str | The GitHub repository URL (e.g., https://github.com/org/repo) | ✅ Yes |
| timeout | float | Request timeout in seconds | No |

Repository Requirements

  • Public GitHub repositories
  • HTTPS URLs (https://github.com/...)
  • Only public repositories are supported
  • Private repository ingestion is not supported
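
A quick shape check on the URL can catch SSH-style or incomplete links before a request is made. This is a sketch with a hypothetical helper, `parse_github_repo`. It verifies only the URL form, not whether the repository exists or is public.

```python
from urllib.parse import urlparse


def parse_github_repo(url: str) -> tuple[str, str]:
    """Return (owner, repo) for an https://github.com/owner/repo URL, else raise ValueError."""
    parts = urlparse(url)
    segments = [s for s in parts.path.split("/") if s]
    if (
        parts.scheme != "https"
        or parts.hostname not in {"github.com", "www.github.com"}
        or len(segments) < 2
    ):
        raise ValueError(f"Not an HTTPS GitHub repository URL: {url!r}")
    return segments[0], segments[1]
```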

Code Examples

Basic GitHub Upload

from graphor import Graphor

client = Graphor()

# Ingest a public GitHub repository
source = client.sources.upload_github(
    url="https://github.com/organization/repository"
)

print(f"Ingested: {source.file_name}")
print(f"Status: {source.status}")
print(f"Project: {source.project_name}")

Async GitHub Upload

import asyncio
from graphor import AsyncGraphor

async def ingest_github_repo(repo_url: str):
    client = AsyncGraphor()
    
    source = await client.sources.upload_github(url=repo_url)
    
    print(f"Repository ingested: {source.file_name}")
    return source

asyncio.run(ingest_github_repo("https://github.com/org/repo"))

Error Handling

import graphor
from graphor import Graphor

client = Graphor()

try:
    source = client.sources.upload_github(
        url="https://github.com/org/repo"
    )
    print(f"GitHub repo ingested: {source.file_name}")
except graphor.BadRequestError as e:
    print(f"Invalid GitHub URL: {e}")
except graphor.NotFoundError as e:
    print(f"Repository not found or not accessible: {e}")
except graphor.APIStatusError as e:
    print(f"Failed to process repository (status {e.status_code}): {e}")

Upload from YouTube

Ingest content from a public YouTube video.

Method Signature

client.sources.upload_youtube(
    url: str,              # Required
    timeout: float | None = None
) -> PublicSource

Parameters

| Parameter | Type | Description | Required |
| --- | --- | --- | --- |
| url | str | The YouTube video URL (e.g., https://www.youtube.com/watch?v=...) | ✅ Yes |
| timeout | float | Request timeout in seconds | No |

Video Requirements

  • Public YouTube video URLs (HTTPS)
  • Standard watch URLs (https://www.youtube.com/watch?v=VIDEO_ID)
  • The video must be publicly accessible
  • Private or access-restricted videos are not supported
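
To confirm that a link is a standard watch URL before submitting it, the query string can be parsed locally. `extract_video_id` is a hypothetical helper. It accepts only the watch-URL form named above and says nothing about whether the video is publicly accessible.

```python
from urllib.parse import parse_qs, urlparse


def extract_video_id(url: str) -> str:
    """Return the `v` parameter of a standard HTTPS watch URL, else raise ValueError."""
    parts = urlparse(url)
    video_ids = parse_qs(parts.query).get("v", [])
    if (
        parts.scheme != "https"
        or parts.hostname not in {"www.youtube.com", "youtube.com"}
        or parts.path != "/watch"
        or not video_ids
    ):
        raise ValueError(f"Not a standard YouTube watch URL: {url!r}")
    return video_ids[0]
```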

Code Examples

Basic YouTube Upload

from graphor import Graphor

client = Graphor()

# Ingest a YouTube video
source = client.sources.upload_youtube(
    url="https://www.youtube.com/watch?v=VIDEO_ID"
)

print(f"Ingested: {source.file_name}")
print(f"Status: {source.status}")

Async YouTube Upload

import asyncio
from graphor import AsyncGraphor

async def ingest_youtube_video(video_url: str):
    client = AsyncGraphor()
    
    source = await client.sources.upload_youtube(url=video_url)
    
    print(f"Video ingested: {source.file_name}")
    return source

asyncio.run(ingest_youtube_video("https://www.youtube.com/watch?v=VIDEO_ID"))

Error Handling

import graphor
from graphor import Graphor

client = Graphor()

try:
    source = client.sources.upload_youtube(
        url="https://www.youtube.com/watch?v=VIDEO_ID"
    )
    print(f"YouTube video ingested: {source.file_name}")
except graphor.BadRequestError as e:
    print(f"Invalid YouTube URL: {e}")
except graphor.NotFoundError as e:
    print(f"Video not found or not accessible: {e}")
except graphor.APIStatusError as e:
    print(f"Failed to process video (status {e.status_code}): {e}")

Advanced Configuration

Custom Timeout

For large files or slow connections, you can increase the timeout:
from pathlib import Path
from graphor import Graphor

client = Graphor(
    timeout=300.0  # 5 minutes
)

# Or per-request
source = client.with_options(timeout=300.0).sources.upload(
    file=Path("./large-document.pdf")
)

Retry Configuration

Configure automatic retries for transient errors:
from pathlib import Path
from graphor import Graphor

client = Graphor(
    max_retries=5  # Default is 2
)

# Or per-request
source = client.with_options(max_retries=5).sources.upload(
    file=Path("./document.pdf")
)
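
If an application needs control beyond max_retries, such as custom delays or logging between attempts, a wrapper around the call is one option. The sketch below is generic Python and not the SDK's internal retry logic. `call_with_backoff` and its parameters are hypothetical.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_backoff(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying on `retry_on` with delays of base_delay * 2**n seconds."""
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # Out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")
```

With the SDK, `retry_on` could be `(graphor.RateLimitError, graphor.APIConnectionError)` and `fn` a lambda wrapping the upload call. Since the client also retries on its own, stacking both multiplies the total number of attempts.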

Accessing Raw Response

Access headers and other response metadata:
from pathlib import Path
from graphor import Graphor

client = Graphor()

response = client.sources.with_raw_response.upload(
    file=Path("./document.pdf")
)

print(f"Headers: {response.headers}")
source = response.parse()  # Get the PublicSource object
print(f"Uploaded: {source.file_name}")

Using aiohttp for Better Concurrency

For high-concurrency async operations:
import asyncio
from graphor import AsyncGraphor, DefaultAioHttpClient

async def upload_many_files(file_paths: list):
    async with AsyncGraphor(
        http_client=DefaultAioHttpClient()
    ) as client:
        tasks = [
            client.sources.upload(file=path)
            for path in file_paths
        ]
        results = await asyncio.gather(*tasks, return_exceptions=True)
        return results

# Install aiohttp first: pip install graphor[aiohttp]
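
Launching every upload at once can run into rate limits on large batches. One way to cap the number of in-flight requests is an `asyncio.Semaphore`. This sketch is generic: `upload_bounded` is a hypothetical helper that takes any async callable, so it makes no assumptions about the SDK beyond what is shown above.

```python
import asyncio
from typing import Awaitable, Callable, Iterable


async def upload_bounded(
    upload: Callable[[str], Awaitable[object]],
    paths: Iterable[str],
    limit: int = 5,
):
    """Run upload(path) for every path with at most `limit` calls in flight.

    Returns (succeeded, failed), where failed holds (path, exception) pairs.
    """
    paths = list(paths)
    semaphore = asyncio.Semaphore(limit)

    async def guarded(path):
        async with semaphore:  # Blocks while `limit` uploads are already running
            return await upload(path)

    results = await asyncio.gather(
        *(guarded(p) for p in paths), return_exceptions=True
    )
    succeeded = [r for r in results if not isinstance(r, BaseException)]
    failed = [(p, r) for p, r in zip(paths, results) if isinstance(r, BaseException)]
    return succeeded, failed
```

With an AsyncGraphor client, the callable could be `lambda p: client.sources.upload(file=Path(p))`. Separating the exceptions returned by `gather` keeps one bad file from hiding the successful uploads.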

Error Reference

| Error Type | Status Code | Description |
| --- | --- | --- |
| BadRequestError | 400 | Invalid file type, missing filename, or malformed request |
| AuthenticationError | 401 | Invalid or missing API key |
| PermissionDeniedError | 403 | Access denied to the specified project |
| NotFoundError | 404 | Project or source not found |
| RateLimitError | 429 | Too many requests, please retry after waiting |
| InternalServerError | ≥500 | Server-side processing error |
| APIConnectionError | N/A | Network connectivity issues |
| APITimeoutError | N/A | Request timed out |
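
For logging or metrics, it can help to have this mapping available as data. The sketch below copies the status codes from the table. `error_name_for_status` and `is_probably_transient` are hypothetical helpers, and treating 429 and 5xx responses as retry-worthy is an assumption rather than documented behavior.

```python
# Status-code mapping copied from the Error Reference table.
ERROR_BY_STATUS = {
    400: "BadRequestError",
    401: "AuthenticationError",
    403: "PermissionDeniedError",
    404: "NotFoundError",
    429: "RateLimitError",
}


def error_name_for_status(status_code: int) -> str:
    """Map an HTTP status code to the error type name listed in the table."""
    if status_code in ERROR_BY_STATUS:
        return ERROR_BY_STATUS[status_code]
    if status_code >= 500:
        return "InternalServerError"
    return "APIStatusError"  # Assumption: generic fallback for unlisted codes


def is_probably_transient(status_code: int) -> bool:
    """Assumption: rate limits and server errors are worth retrying."""
    return status_code == 429 or status_code >= 500
```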

Next Steps

After successfully uploading your documents: