Upload Sources - Graphor Docs

This page documents how to upload content to your Graphor project using the Python SDK — whether that content is a local file, a public web page URL, a public GitHub repository, or a public YouTube video.

Available Methods

Upload a File

client.sources.upload(): Upload local files (PDF, DOCX, images, audio, video, etc.)

Upload Web Page

client.sources.upload_url(): Scrape and ingest a public web page by URL

Upload from GitHub

client.sources.upload_github(): Ingest content from a public GitHub repository

Upload from YouTube

client.sources.upload_youtube(): Ingest content from a public YouTube video URL

Installation

Install the Graphor SDK from PyPI:

pip install graphor

Python 3.9 or higher is required.

Authentication

All SDK methods require authentication using an API key. You can provide your API key in two ways:

Environment Variable (Recommended)

Set the GRAPHOR_API_KEY environment variable:

export GRAPHOR_API_KEY="grlm_your_api_key_here"

Then initialize the client without any arguments:

from graphor import Graphor

client = Graphor()

Direct Initialization

Pass the API key directly to the client:

from graphor import Graphor

client = Graphor(api_key="grlm_your_api_key_here")

Never hardcode API keys in your source code. Use environment variables or a secrets manager.

Learn how to create and manage API tokens in the API Tokens guide.
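The client reads GRAPHOR_API_KEY on its own. If you want a clear error before any request is sent (for example in a deployment script), a small check like the one below can help. resolve_api_key is a hypothetical helper, and the "grlm_" prefix check is an assumption based only on the example key format shown above, so it warns instead of failing.

```python
import os
import warnings
from typing import Mapping, Optional


def resolve_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the Graphor API key from the environment, failing fast if it is absent."""
    source = os.environ if env is None else env
    key = source.get("GRAPHOR_API_KEY", "").strip()
    if not key:
        raise RuntimeError("GRAPHOR_API_KEY is not set; export it before creating the client.")
    # Assumption: keys start with "grlm_", as in the examples on this page.
    if not key.startswith("grlm_"):
        warnings.warn("GRAPHOR_API_KEY does not start with 'grlm_'; check that it is a Graphor key.")
    return key
```

Usage: `client = Graphor(api_key=resolve_api_key())`.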

Upload a File

Upload a local file to your Graphor project for processing.

Method Signature

client.sources.upload(
    file: FileTypes,                              # Required
    partition_method: PartitionMethod | None = None,  # Optional
    timeout: float | None = None
) -> PublicSource

Parameters

Parameter	Type	Description	Required
`file`	`FileTypes`	The file to upload. Accepts `bytes`, `Path`, or tuple `(filename, contents, media_type)`	✅ Yes
`partition_method`	`PartitionMethod`	Processing method to use (see Partition Methods below)	No
`timeout`	`float`	Request timeout in seconds (default: 60)	No
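To use the tuple form from a file on disk while letting Python pick the media type, you can build it with the standard mimetypes module. file_tuple is a hypothetical helper, and falling back to application/octet-stream for unknown extensions is an assumption, not documented Graphor behavior.

```python
import mimetypes
from pathlib import Path
from typing import Tuple


def file_tuple(path: Path) -> Tuple[str, bytes, str]:
    """Build the (filename, contents, media_type) tuple accepted by the `file` parameter."""
    media_type, _ = mimetypes.guess_type(path.name)
    # Fall back to a generic binary type when the extension is not recognized.
    return path.name, path.read_bytes(), media_type or "application/octet-stream"
```

Usage: `client.sources.upload(file=file_tuple(Path("./document.pdf")))`.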

Partition Methods

When provided, the partition_method parameter parses the document with the chosen method as part of the upload. If it is omitted, the system uses its default method.

Value	Name	Description
`"basic"`	Fast	Fast processing with heuristic classification. No OCR.
`"hi_res"`	Balanced	OCR-based extraction with AI-powered structure classification.
`"hi_res_ft"`	Accurate	Fine-tuned AI model for highest accuracy (Premium).
`"mai"`	VLM	Best text-first parsing for manuscripts and handwritten documents.
`"graphorlm"`	Agentic	Highest parsing setting for complex layouts, multi-page tables, and diagrams.

For more details about processing methods, see the Parse Source documentation.
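Because partition_method is a plain string at runtime, a typo is only caught by the API. A local check like the following rejects unknown values before the request, assuming the table above is the complete list for your SDK version. check_partition_method is a hypothetical helper.

```python
from typing import Optional

# Values and display names from the Partition Methods table above.
PARTITION_METHODS = {
    "basic": "Fast",
    "hi_res": "Balanced",
    "hi_res_ft": "Accurate",
    "mai": "VLM",
    "graphorlm": "Agentic",
}


def check_partition_method(value: Optional[str]) -> Optional[str]:
    """Return value unchanged if it is None or a documented method; otherwise raise."""
    if value is None or value in PARTITION_METHODS:
        return value
    raise ValueError(
        f"Unknown partition_method {value!r}; expected one of {sorted(PARTITION_METHODS)}"
    )
```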

File Requirements

Supported File Types

Graphor supports a wide range of document formats:

Documents: PDF, DOC, DOCX, TXT, TEXT, MD, HTML, HTM
Presentations: PPT, PPTX
Spreadsheets: CSV, TSV, XLS, XLSX
Images: PNG, JPG, JPEG, TIFF, BMP, HEIC
Audio: MP3, WAV, M4A, OGG, FLAC
Video: MP4, MOV, AVI, MKV, WEBM

File Size Limits

Maximum file size: 100 MB per file.

For larger files, consider:

Compressing the file if possible
Splitting large documents into smaller sections
Using file optimization tools before upload
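Checking type and size locally avoids spending a request on a file that will be rejected. The sketch below, a hypothetical precheck helper, uses the extensions listed above. It is an assumption how the service counts "100 MB"; the code reads it conservatively as 100,000,000 bytes.

```python
from pathlib import Path

# Extensions from the Supported File Types list above.
SUPPORTED_EXTENSIONS = {
    ".pdf", ".doc", ".docx", ".txt", ".text", ".md", ".html", ".htm",
    ".ppt", ".pptx",
    ".csv", ".tsv", ".xls", ".xlsx",
    ".png", ".jpg", ".jpeg", ".tiff", ".bmp", ".heic",
    ".mp3", ".wav", ".m4a", ".ogg", ".flac",
    ".mp4", ".mov", ".avi", ".mkv", ".webm",
}
# Assumption: "100 MB" is read conservatively as 100,000,000 bytes.
MAX_FILE_BYTES = 100_000_000


def precheck(path: Path) -> None:
    """Raise ValueError if the file clearly violates the documented upload limits."""
    if path.suffix.lower() not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"{path.name}: unsupported file type {path.suffix or '(none)'}")
    size = path.stat().st_size
    if size > MAX_FILE_BYTES:
        raise ValueError(f"{path.name}: {size} bytes exceeds the {MAX_FILE_BYTES}-byte limit")
```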

Response Object

The method returns a PublicSource object with the following properties:

Property	Type	Description
`status`	`str`	Processing status (`New`, `Processing`, `Completed`, `Failed`)
`message`	`str`	Human-readable success message
`file_id`	`str \| None`	Unique identifier for the source (use this for subsequent API calls)
`file_name`	`str`	Name of the uploaded file
`file_size`	`int`	Size of the file in bytes
`file_type`	`str`	File extension/type
`file_source`	`str`	Source type (`local file`)
`project_id`	`str`	UUID of the target project
`project_name`	`str`	Name of the target project
`partition_method`	`str \| None`	Document processing method used
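When unit-testing code that consumes upload results, a stand-in with the documented fields lets you avoid network calls. SourceStub and describe are hypothetical, and treating Completed and Failed as the only final states is an assumption drawn from the status list above.

```python
from dataclasses import dataclass
from typing import Optional

# Documented status values that are assumed to be final.
TERMINAL_STATUSES = {"Completed", "Failed"}


@dataclass
class SourceStub:
    """Hypothetical test double mirroring the documented PublicSource fields."""
    status: str
    message: str
    file_id: Optional[str]
    file_name: str
    file_size: int
    file_type: str
    file_source: str
    project_id: str
    project_name: str
    partition_method: Optional[str] = None


def describe(source) -> str:
    """One-line summary that works for a real PublicSource or a SourceStub."""
    state = "done" if source.status in TERMINAL_STATUSES else "in progress"
    return f"{source.file_name} [{source.status}, {state}] id={source.file_id}"
```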

Code Examples

Upload from File Path

from pathlib import Path
from graphor import Graphor

client = Graphor()

# Upload using a Path object
source = client.sources.upload(
    file=Path("./document.pdf")
)

print(f"Uploaded: {source.file_name}")
print(f"File ID: {source.file_id}")
print(f"Status: {source.status}")
print(f"Project ID: {source.project_id}")

Upload with Partition Method

from pathlib import Path
from graphor import Graphor

client = Graphor()

# Upload and process with a specific partition method
source = client.sources.upload(
    file=Path("./document.pdf"),
    partition_method="hi_res"  # Use Balanced processing
)

print(f"Uploaded: {source.file_name}")
print(f"File ID: {source.file_id}")
print(f"Partition Method: {source.partition_method}")

Upload from Bytes

from graphor import Graphor

client = Graphor()

# Read file content as bytes
with open("document.pdf", "rb") as f:
    file_content = f.read()

# Upload raw bytes with filename tuple
source = client.sources.upload(
    file=("document.pdf", file_content, "application/pdf")
)

print(f"Uploaded: {source.file_name}")

Async Upload

import asyncio
from pathlib import Path
from graphor import AsyncGraphor

async def upload_document():
    client = AsyncGraphor()
    
    source = await client.sources.upload(
        file=Path("./document.pdf")
    )
    
    print(f"Uploaded: {source.file_name}")
    print(f"Status: {source.status}")
    
    return source

# Run the async function
asyncio.run(upload_document())

Batch Upload

from pathlib import Path
from graphor import Graphor

client = Graphor()

def batch_upload(directory: str):
    """Upload all supported documents from a directory."""
    # A subset of the supported file types; extend this set as needed.
    supported_extensions = {'.pdf', '.doc', '.docx', '.txt', '.md', '.html'}
    uploaded_files = []
    failed_files = []
    
    for file_path in Path(directory).iterdir():
        if file_path.suffix.lower() in supported_extensions:
            try:
                source = client.sources.upload(file=file_path)
                uploaded_files.append(source.file_name)
                print(f"✅ Uploaded: {file_path.name}")
            except Exception as e:
                failed_files.append((file_path.name, str(e)))
                print(f"❌ Failed: {file_path.name} - {e}")
    
    print(f"\nSummary: {len(uploaded_files)} uploaded, {len(failed_files)} failed")
    return uploaded_files, failed_files

# Usage
uploaded, failed = batch_upload("./documents/")
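The sequential loop above waits for each upload before starting the next. A thread pool can overlap them; the sketch below takes the upload call as a parameter so it can be tested without the API. upload_concurrently is hypothetical, and it assumes one synchronous Graphor client can be shared across threads, which this page does not state; if in doubt, create one client per worker. Keep max_workers small to stay clear of rate limits.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path
from typing import Callable, Iterable, List, Tuple


def upload_concurrently(
    paths: Iterable[Path],
    upload_fn: Callable[[Path], object],
    max_workers: int = 4,
) -> Tuple[List[Path], List[Tuple[Path, str]]]:
    """Run upload_fn over paths in a thread pool and return (succeeded, failed)."""
    succeeded: List[Path] = []
    failed: List[Tuple[Path, str]] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(upload_fn, p): p for p in paths}
        for future in as_completed(futures):
            path = futures[future]
            try:
                future.result()
                succeeded.append(path)
            except Exception as exc:  # record the failure and keep going
                failed.append((path, str(exc)))
    return succeeded, failed
```

Usage: `upload_concurrently(files, lambda p: client.sources.upload(file=p))`. Results arrive in completion order, not input order.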

Error Handling

from pathlib import Path

import graphor
from graphor import Graphor

client = Graphor()

try:
    source = client.sources.upload(
        file=Path("./document.pdf")
    )
    print(f"Upload successful: {source.file_name}")
except graphor.BadRequestError as e:
    print(f"Invalid file type or request: {e}")
except graphor.AuthenticationError as e:
    print(f"Invalid API key: {e}")
except graphor.RateLimitError as e:
    print(f"Rate limit exceeded. Please wait and retry: {e}")
except graphor.APIConnectionError as e:
    print(f"Connection error: {e}")
except graphor.APIStatusError as e:
    print(f"API error (status {e.status_code}): {e}")

Upload Web Page

Ingest content by scraping a public web page.

Method Signature

client.sources.upload_url(
    url: str,                                     # Required
    crawl_urls: bool = False,
    partition_method: PartitionMethod | None = None,  # Optional
    timeout: float | None = None
) -> PublicSource

Parameters

Parameter	Type	Description	Required
`url`	`str`	The URL of the web page to scrape	✅ Yes
`crawl_urls`	`bool`	Whether to crawl and ingest links from the given URL (default: `False`)	No
`partition_method`	`PartitionMethod`	Processing method to use (see Partition Methods above)	No
`timeout`	`float`	Request timeout in seconds	No

URL Requirements

Supported URL types

Public web pages
Pages that render primary content server-side and are reachable without interaction

Access requirements

The URL must be publicly reachable over HTTPS
Authentication-protected pages are not supported
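Some of these requirements can be checked before calling upload_url. The hypothetical check_public_url helper below enforces HTTPS, a host, and no embedded credentials; it cannot tell whether a page requires a login or renders its content client-side.

```python
from urllib.parse import urlparse


def check_public_url(url: str) -> str:
    """Reject URLs that visibly fail the documented requirements; return the URL otherwise."""
    parsed = urlparse(url)
    if parsed.scheme != "https":
        raise ValueError(f"{url!r}: the URL must use HTTPS")
    if not parsed.hostname:
        raise ValueError(f"{url!r}: the URL has no host")
    if parsed.username or parsed.password:
        raise ValueError(f"{url!r}: credentials in the URL are not supported")
    return url
```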

Code Examples

Basic Web Page Upload

from graphor import Graphor

client = Graphor()

# Scrape a single web page
source = client.sources.upload_url(
    url="https://example.com/article"
)

print(f"Ingested: {source.file_name}")
print(f"Status: {source.status}")

Upload with Web Page Crawling

from graphor import Graphor

client = Graphor()

# Scrape a page and follow links
source = client.sources.upload_url(
    url="https://example.com/documentation",
    crawl_urls=True
)

print(f"Ingested: {source.file_name}")

Upload with Partition Method

from graphor import Graphor

client = Graphor()

# Scrape a web page with a specific processing method
source = client.sources.upload_url(
    url="https://example.com/article",
    partition_method="hi_res"  # Use Balanced processing
)

print(f"Ingested: {source.file_name}")
print(f"File ID: {source.file_id}")
print(f"Partition Method: {source.partition_method}")

Async Web Page Upload

import asyncio
from graphor import AsyncGraphor

async def ingest_webpage(url: str):
    client = AsyncGraphor()
    
    source = await client.sources.upload_url(
        url=url,
        crawl_urls=False
    )
    
    print(f"Ingested: {source.file_name}")
    return source

asyncio.run(ingest_webpage("https://example.com"))

Error Handling

import graphor
from graphor import Graphor

client = Graphor()

try:
    source = client.sources.upload_url(
        url="https://example.com/article"
    )
    print(f"URL ingested: {source.file_name}")
except graphor.BadRequestError as e:
    print(f"Invalid URL: {e}")
except graphor.APIStatusError as e:
    print(f"Failed to process URL (status {e.status_code}): {e}")

Upload from GitHub

Ingest content from a public GitHub repository.

Method Signature

client.sources.upload_github(
    url: str,              # Required
    timeout: float | None = None
) -> PublicSource

Parameters

Parameter	Type	Description	Required
`url`	`str`	The GitHub repository URL (e.g., `https://github.com/org/repo`)	✅ Yes
`timeout`	`float`	Request timeout in seconds	No

Repository Requirements

Supported URLs

Public GitHub repositories
HTTPS URLs (https://github.com/...)

Access requirements

Only public repositories are supported
Private repository ingestion is not supported
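Repository links are often copied as clone URLs or deep links. The hypothetical parse_github_repo_url helper below extracts the owner and repository name so you can pass the canonical https://github.com/owner/repo form; whether upload_github itself accepts other shapes is not documented here.

```python
from typing import Tuple
from urllib.parse import urlparse


def parse_github_repo_url(url: str) -> Tuple[str, str]:
    """Return (owner, repo) for an https://github.com/<owner>/<repo>[/...] URL."""
    parsed = urlparse(url)
    if parsed.scheme != "https" or parsed.hostname != "github.com":
        raise ValueError(f"{url!r}: expected an HTTPS github.com URL")
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) < 2:
        raise ValueError(f"{url!r}: expected https://github.com/<owner>/<repo>")
    owner, repo = parts[0], parts[1]
    # Tolerate clone-style URLs ending in ".git".
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    return owner, repo
```

Usage: `owner, repo = parse_github_repo_url(link)` then `client.sources.upload_github(url=f"https://github.com/{owner}/{repo}")`.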

Code Examples

Basic GitHub Upload

from graphor import Graphor

client = Graphor()

# Ingest a public GitHub repository
source = client.sources.upload_github(
    url="https://github.com/organization/repository"
)

print(f"Ingested: {source.file_name}")
print(f"Status: {source.status}")
print(f"Project: {source.project_name}")

Async GitHub Upload

import asyncio
from graphor import AsyncGraphor

async def ingest_github_repo(repo_url: str):
    client = AsyncGraphor()
    
    source = await client.sources.upload_github(url=repo_url)
    
    print(f"Repository ingested: {source.file_name}")
    return source

asyncio.run(ingest_github_repo("https://github.com/org/repo"))

Error Handling

import graphor
from graphor import Graphor

client = Graphor()

try:
    source = client.sources.upload_github(
        url="https://github.com/org/repo"
    )
    print(f"GitHub repo ingested: {source.file_name}")
except graphor.BadRequestError as e:
    print(f"Invalid GitHub URL: {e}")
except graphor.NotFoundError as e:
    print(f"Repository not found or not accessible: {e}")
except graphor.APIStatusError as e:
    print(f"Failed to process repository (status {e.status_code}): {e}")

Upload from YouTube

Ingest content from a public YouTube video.

Method Signature

client.sources.upload_youtube(
    url: str,              # Required
    timeout: float | None = None
) -> PublicSource

Parameters

Parameter	Type	Description	Required
`url`	`str`	The YouTube video URL (e.g., `https://www.youtube.com/watch?v=...`)	✅ Yes
`timeout`	`float`	Request timeout in seconds	No

Video Requirements

Supported URLs

Public YouTube video URLs (HTTPS)
Standard watch URLs (https://www.youtube.com/watch?v=VIDEO_ID)

Access requirements

The video must be publicly accessible
Private or access-restricted videos are not supported
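Since only standard watch URLs are listed, it can help to normalize links before upload. The hypothetical youtube_video_id helper below extracts VIDEO_ID and rejects other shapes, such as short youtu.be links, which this page does not list as supported. Accepting youtube.com without "www" is an assumption.

```python
from urllib.parse import parse_qs, urlparse


def youtube_video_id(url: str) -> str:
    """Extract VIDEO_ID from https://www.youtube.com/watch?v=VIDEO_ID."""
    parsed = urlparse(url)
    if parsed.scheme != "https" or parsed.hostname not in ("www.youtube.com", "youtube.com"):
        raise ValueError(f"{url!r}: expected an HTTPS youtube.com URL")
    if parsed.path != "/watch":
        raise ValueError(f"{url!r}: expected a /watch?v=... URL")
    ids = parse_qs(parsed.query).get("v")
    if not ids or not ids[0]:
        raise ValueError(f"{url!r}: missing the v= parameter")
    return ids[0]
```

Usage: rebuild a clean URL with `f"https://www.youtube.com/watch?v={youtube_video_id(link)}"`, which drops extra parameters such as `&t=`.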

Code Examples

Basic YouTube Upload

from graphor import Graphor

client = Graphor()

# Ingest a YouTube video
source = client.sources.upload_youtube(
    url="https://www.youtube.com/watch?v=VIDEO_ID"
)

print(f"Ingested: {source.file_name}")
print(f"Status: {source.status}")

Async YouTube Upload

import asyncio
from graphor import AsyncGraphor

async def ingest_youtube_video(video_url: str):
    client = AsyncGraphor()
    
    source = await client.sources.upload_youtube(url=video_url)
    
    print(f"Video ingested: {source.file_name}")
    return source

asyncio.run(ingest_youtube_video("https://www.youtube.com/watch?v=VIDEO_ID"))

Error Handling

import graphor
from graphor import Graphor

client = Graphor()

try:
    source = client.sources.upload_youtube(
        url="https://www.youtube.com/watch?v=VIDEO_ID"
    )
    print(f"YouTube video ingested: {source.file_name}")
except graphor.BadRequestError as e:
    print(f"Invalid YouTube URL: {e}")
except graphor.NotFoundError as e:
    print(f"Video not found or not accessible: {e}")
except graphor.APIStatusError as e:
    print(f"Failed to process video (status {e.status_code}): {e}")

Advanced Configuration

Custom Timeout

For large files or slow connections, you can increase the timeout:

from pathlib import Path
from graphor import Graphor

client = Graphor(
    timeout=300.0  # 5 minutes
)

# Or per-request
source = client.with_options(timeout=300.0).sources.upload(
    file=Path("./large-document.pdf")
)

Retry Configuration

Configure automatic retries for transient errors:

from pathlib import Path
from graphor import Graphor

client = Graphor(
    max_retries=5  # Default is 2
)

# Or per-request
source = client.with_options(max_retries=5).sources.upload(
    file=Path("./document.pdf")
)

Accessing Raw Response

Access headers and other response metadata:

from pathlib import Path
from graphor import Graphor

client = Graphor()

response = client.sources.with_raw_response.upload(
    file=Path("./document.pdf")
)

print(f"Headers: {response.headers}")
source = response.parse()  # Get the PublicSource object
print(f"Uploaded: {source.file_name}")

Using aiohttp for Better Concurrency

For high-concurrency async operations:

import asyncio
from graphor import AsyncGraphor, DefaultAioHttpClient

async def upload_many_files(file_paths: list):
    async with AsyncGraphor(
        http_client=DefaultAioHttpClient()
    ) as client:
        tasks = [
            client.sources.upload(file=path)
            for path in file_paths
        ]
        results = await asyncio.gather(*tasks, return_exceptions=True)
        return results

# Install aiohttp first: pip install graphor[aiohttp]
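The example above starts every upload at once, which can trigger RateLimitError for large lists. A semaphore caps how many run concurrently. gather_limited is a hypothetical helper that takes the upload coroutine as a parameter; the limit of 5 is an arbitrary starting point, not a documented quota.

```python
import asyncio
from typing import Any, Awaitable, Callable, Iterable, List, TypeVar

T = TypeVar("T")


async def gather_limited(
    items: Iterable[T],
    worker: Callable[[T], Awaitable[Any]],
    limit: int = 5,
) -> List[Any]:
    """Like asyncio.gather(..., return_exceptions=True) with at most `limit` calls in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def run(item: T) -> Any:
        async with semaphore:
            return await worker(item)

    return await asyncio.gather(*(run(i) for i in items), return_exceptions=True)
```

Usage inside the `async with AsyncGraphor(...) as client:` block: `results = await gather_limited(file_paths, lambda p: client.sources.upload(file=p), limit=5)`. Results keep input order, with exceptions returned in place of failed uploads.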

Error Reference

Error Type	Status Code	Description
`BadRequestError`	400	Invalid file type, missing filename, or malformed request
`AuthenticationError`	401	Invalid or missing API key
`PermissionDeniedError`	403	Access denied to the specified project
`NotFoundError`	404	Project or source not found
`RateLimitError`	429	Too many requests, please retry after waiting
`InternalServerError`	≥500	Server-side processing error
`APIConnectionError`	N/A	Network connectivity issues
`APITimeoutError`	N/A	Request timed out
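The SDK already retries transient errors automatically (see Retry Configuration). If your application also decides whether to re-queue an upload after those retries are exhausted, the table suggests a split like the one below. is_transient is a hypothetical helper; pass `getattr(e, "status_code", None)` for Graphor exceptions, where None stands for a connection error or timeout with no HTTP response.

```python
from typing import Optional


def is_transient(status_code: Optional[int]) -> bool:
    """Return True for failures the Error Reference marks as temporary (429, >=500, no response)."""
    if status_code is None:
        return True  # APIConnectionError / APITimeoutError: no HTTP status
    return status_code == 429 or status_code >= 500
```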

Next Steps

After successfully uploading your documents:

Parse Source

Reprocess documents with different parsing methods for optimal results

List Sources

Retrieve information about all uploaded documents in your project

List Parse Results

Retrieve structured elements and partitions from processed documents

Delete Source

Remove documents that are no longer needed from your project
