This page documents how to upload content to your Graphor project using the Python SDK, whether that content is a local file, a public web page URL, a public GitHub repository, or a public YouTube video.
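Available Methods
client.sources.upload: upload a local file
client.sources.upload_url: scrape a public web page
client.sources.upload_github: ingest a public GitHub repository
client.sources.upload_youtube: ingest a public YouTube video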
Installation
Install the Graphor SDK from PyPI:
pip install graphor
Python 3.9 or higher is required.
Authentication
All SDK methods require authentication using an API key. You can provide your API key in two ways:
Environment Variable (Recommended)
Set the GRAPHOR_API_KEY environment variable:
export GRAPHOR_API_KEY="grlm_your_api_key_here"
Then initialize the client without any arguments:
from graphor import Graphor
client = Graphor()
Direct Initialization
Pass the API key directly to the client:
from graphor import Graphor
client = Graphor(api_key="grlm_your_api_key_here")
Never hardcode API keys in your source code. Use environment variables or a secrets manager.
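Following that advice, a small guard can read the key from the environment and stop early with a readable message, instead of failing on the first request. This is a sketch: `load_graphor_api_key` is a hypothetical helper, and the `grlm_` prefix check is an assumption inferred only from the example key on this page. Drop that check if your keys look different.

```python
import os
from typing import Mapping, Optional


def load_graphor_api_key(environ: Optional[Mapping[str, str]] = None) -> str:
    """Return GRAPHOR_API_KEY from the environment, or raise with a clear message.

    Hypothetical helper, not part of the Graphor SDK.
    """
    env = os.environ if environ is None else environ
    key = env.get("GRAPHOR_API_KEY", "").strip()
    if not key:
        raise RuntimeError("GRAPHOR_API_KEY is not set; export it before creating the client")
    # Assumption: keys start with "grlm_", inferred from the example key on this page.
    if not key.startswith("grlm_"):
        raise ValueError("GRAPHOR_API_KEY does not look like a Graphor key (expected 'grlm_' prefix)")
    return key
```

Pass the result straight to the client, for example `client = Graphor(api_key=load_graphor_api_key())`.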
Upload a File
Upload a local file to your Graphor project for processing.
Method Signature
client.sources.upload(
    file: FileTypes,  # Required
    partition_method: PartitionMethod | None = None,  # Optional
    timeout: float | None = None,  # Optional
) -> PublicSource
Parameters
| Parameter | Type | Description | Required |
|---|---|---|---|
| file | FileTypes | The file to upload. Accepts bytes, a Path, or a tuple (filename, contents, media_type) | ✅ Yes |
| partition_method | PartitionMethod | Processing method to use (see Partition Methods below) | No |
| timeout | float | Request timeout in seconds (default: 60) | No |
Partition Methods
When provided, the partition_method parameter allows you to process/parse the document immediately during upload. If not provided, the system uses the default method.
| Value | Name | Description |
|---|---|---|
| "basic" | Fast | Fast processing with heuristic classification. No OCR. |
| "hi_res" | Balanced | OCR-based extraction with AI-powered structure classification. |
| "hi_res_ft" | Accurate | Fine-tuned AI model for highest accuracy (Premium). |
| "mai" | VLM | Best text-first parsing for manuscripts and handwritten documents. |
| "graphorlm" | Agentic | Highest parsing setting for complex layouts, multi-page tables, and diagrams. |
For more details about processing methods, see the Parse Source documentation.
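Because `partition_method` is passed as a plain string, a typo such as `"hires"` would likely only surface once the request reaches the server. A minimal client-side check follows, assuming the five values in the table above are the complete set. That list may grow, so treat the check as a convenience rather than a contract. `check_partition_method` is a hypothetical helper.

```python
from typing import Optional

# Documented values from the Partition Methods table, mapped to their display names.
PARTITION_METHODS = {
    "basic": "Fast",
    "hi_res": "Balanced",
    "hi_res_ft": "Accurate",
    "mai": "VLM",
    "graphorlm": "Agentic",
}


def check_partition_method(value: Optional[str]) -> Optional[str]:
    """Return `value` if it is None or a documented method; otherwise raise ValueError.

    Hypothetical helper, not part of the Graphor SDK.
    """
    if value is None:
        return None  # omitted: the server applies its default method
    if value not in PARTITION_METHODS:
        allowed = ", ".join(sorted(PARTITION_METHODS))
        raise ValueError(f"unknown partition_method {value!r}; expected one of: {allowed}")
    return value
```

Usage: `client.sources.upload(file=Path("./document.pdf"), partition_method=check_partition_method("hi_res"))`.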
File Requirements
Graphor supports a wide range of file formats:
Documents: PDF, DOC, DOCX, TXT, TEXT, MD, HTML, HTM
Presentations: PPT, PPTX
Spreadsheets: CSV, TSV, XLS, XLSX
Images: PNG, JPG, JPEG, TIFF, BMP, HEIC
Audio: MP3, WAV, M4A, OGG, FLAC
Video: MP4, MOV, AVI, MKV, WEBM
Maximum file size: 100MB per file. For larger files, consider:
Compressing the file if possible
Splitting large documents into smaller sections
Using file optimization tools before upload
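Checking the extension and size locally avoids spending an upload on a file that will be rejected. The sketch below makes two assumptions. First, it treats the extension list above as complete. Second, it reads "100MB" as 100 MiB (100 * 1024 * 1024 bytes); if the service uses decimal megabytes, pass `max_size=100_000_000` instead. `check_upload_candidate` is a hypothetical helper, not an SDK function.

```python
from pathlib import Path
from typing import Optional

# Extensions from the File Requirements list above (lowercase, with leading dot).
SUPPORTED_EXTENSIONS = {
    ".pdf", ".doc", ".docx", ".txt", ".text", ".md", ".html", ".htm",
    ".ppt", ".pptx",
    ".csv", ".tsv", ".xls", ".xlsx",
    ".png", ".jpg", ".jpeg", ".tiff", ".bmp", ".heic",
    ".mp3", ".wav", ".m4a", ".ogg", ".flac",
    ".mp4", ".mov", ".avi", ".mkv", ".webm",
}
# Assumption: "100MB" means 100 MiB.
MAX_FILE_SIZE = 100 * 1024 * 1024


def check_upload_candidate(path: Path, max_size: int = MAX_FILE_SIZE) -> Optional[str]:
    """Return the reason a file would likely be rejected, or None if it looks uploadable."""
    if not path.is_file():
        return "not a regular file"
    if path.suffix.lower() not in SUPPORTED_EXTENSIONS:
        return f"unsupported extension {path.suffix or '(none)'}"
    size = path.stat().st_size
    if size > max_size:
        return f"{size} bytes exceeds the {max_size}-byte limit"
    return None
```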
Response Object
The method returns a PublicSource object with the following properties:
| Property | Type | Description |
|---|---|---|
| status | str | Processing status (New, Processing, Completed, Failed) |
| message | str | Human-readable success message |
| file_id | str \| None | Unique identifier for the source (use this for subsequent API calls) |
| file_name | str | Name of the uploaded file |
| file_size | int | Size of the file in bytes |
| file_type | str | File extension/type |
| file_source | str | Source type (local file) |
| project_id | str | UUID of the target project |
| project_name | str | Name of the target project |
| partition_method | str \| None | Document processing method used |
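When you upload many files, counting the returned `status` values gives a quick overview. This sketch reads only the documented `status` attribute. It compares statuses case-insensitively as a precaution, because this page does not say whether the API returns `Completed` or `completed`. `summarize_statuses` is a hypothetical name.

```python
from collections import Counter
from typing import Iterable

# Status values listed in the PublicSource table above.
KNOWN_STATUSES = ("New", "Processing", "Completed", "Failed")


def summarize_statuses(sources: Iterable[object]) -> Counter:
    """Count sources per status; missing or unrecognized values go under "Unknown".

    Hypothetical helper; it only reads the documented `status` attribute.
    """
    lookup = {s.lower(): s for s in KNOWN_STATUSES}
    counts: Counter = Counter()
    for source in sources:
        status = getattr(source, "status", None)
        counts[lookup.get(str(status).lower(), "Unknown")] += 1
    return counts
```

Pass it the list of `PublicSource` objects collected from a batch of uploads, for example `print(summarize_statuses(sources))`.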
Code Examples
Upload from File Path
from pathlib import Path
from graphor import Graphor
client = Graphor()
# Upload using a Path object
source = client.sources.upload(
    file=Path("./document.pdf")
)
print(f"Uploaded: {source.file_name}")
print(f"File ID: {source.file_id}")
print(f"Status: {source.status}")
print(f"Project ID: {source.project_id}")
Upload with Partition Method
from pathlib import Path
from graphor import Graphor
client = Graphor()
# Upload and process with a specific partition method
source = client.sources.upload(
    file=Path("./document.pdf"),
    partition_method="hi_res",  # Use Balanced processing
)
print(f"Uploaded: {source.file_name}")
print(f"File ID: {source.file_id}")
print(f"Partition Method: {source.partition_method}")
Upload from Bytes
from graphor import Graphor
client = Graphor()
# Read file content as bytes
with open("document.pdf", "rb") as f:
    file_content = f.read()
# Upload raw bytes as a (filename, contents, media_type) tuple
source = client.sources.upload(
    file=("document.pdf", file_content, "application/pdf")
)
print(f"Uploaded: {source.file_name}")
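If you build the tuple form for many files, you can guess the media type from the filename with the standard `mimetypes` module instead of hardcoding it. This is a sketch: `as_upload_tuple` is a hypothetical helper. Falling back to `application/octet-stream` for unknown extensions is an assumption about what the API tolerates.

```python
import mimetypes
from pathlib import Path
from typing import Tuple


def as_upload_tuple(path: Path) -> Tuple[str, bytes, str]:
    """Build the (filename, contents, media_type) tuple accepted by `file=`."""
    media_type, _encoding = mimetypes.guess_type(path.name)
    # Fall back to a generic binary type when the extension is not recognized.
    return path.name, path.read_bytes(), media_type or "application/octet-stream"
```

Usage: `source = client.sources.upload(file=as_upload_tuple(Path("document.pdf")))`.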
Async Upload
import asyncio
from pathlib import Path
from graphor import AsyncGraphor
async def upload_document():
    client = AsyncGraphor()
    source = await client.sources.upload(
        file=Path("./document.pdf")
    )
    print(f"Uploaded: {source.file_name}")
    print(f"Status: {source.status}")
    return source
# Run the async function
asyncio.run(upload_document())
Batch Upload
from pathlib import Path
from graphor import Graphor
client = Graphor()
def batch_upload(directory: str):
    """Upload all supported documents from a directory."""
    supported_extensions = {".pdf", ".doc", ".docx", ".txt", ".md", ".html"}
    uploaded_files = []
    failed_files = []
    for file_path in Path(directory).iterdir():
        if file_path.suffix.lower() in supported_extensions:
            try:
                source = client.sources.upload(file=file_path)
                uploaded_files.append(source.file_name)
                print(f"✅ Uploaded: {file_path.name}")
            except Exception as e:
                failed_files.append((file_path.name, str(e)))
                print(f"❌ Failed: {file_path.name} - {e}")
    print(f"\nSummary: {len(uploaded_files)} uploaded, {len(failed_files)} failed")
    return uploaded_files, failed_files
# Usage
uploaded, failed = batch_upload("./documents/")
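`batch_upload` above uses `iterdir()`, so it only sees files at the top level of the directory. If your documents live in nested folders, `Path.rglob` can collect them first. `collect_documents` below is a hypothetical helper. You can pass each path it returns to `client.sources.upload(file=path)` inside the same try/except loop.

```python
from pathlib import Path
from typing import Iterable, List


def collect_documents(directory: str, extensions: Iterable[str]) -> List[Path]:
    """Return files under `directory` (recursively) whose suffix is in `extensions`.

    Hypothetical helper; suffixes are compared case-insensitively, so ".PDF" matches ".pdf".
    """
    wanted = {ext.lower() for ext in extensions}
    return sorted(
        path
        for path in Path(directory).rglob("*")
        if path.is_file() and path.suffix.lower() in wanted
    )
```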
Error Handling
from pathlib import Path
import graphor
from graphor import Graphor
client = Graphor()
try:
    source = client.sources.upload(
        file=Path("./document.pdf")
    )
    print(f"Upload successful: {source.file_name}")
except graphor.BadRequestError as e:
    print(f"Invalid file type or request: {e}")
except graphor.AuthenticationError as e:
    print(f"Invalid API key: {e}")
except graphor.RateLimitError as e:
    print(f"Rate limit exceeded. Please wait and retry: {e}")
except graphor.APIConnectionError as e:
    print(f"Connection error: {e}")
except graphor.APIStatusError as e:
    # Catch-all for any other non-success HTTP status
    print(f"API error (status {e.status_code}): {e}")
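The client already retries some failures on its own (see Retry Configuration below). For long-running batch jobs, you may still want an outer retry loop around `RateLimitError` and `APIConnectionError`. The following is a generic exponential-backoff sketch. `call_with_backoff` is hypothetical, and its delays (1 s, 2 s, 4 s, plus up to 10% jitter) are arbitrary defaults, not values recommended by Graphor.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_backoff(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying on `retry_on` errors with exponential backoff and jitter.

    Hypothetical helper, not part of the Graphor SDK.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: re-raise the last error
            delay = base_delay * (2 ** attempt)  # 1s, 2s, 4s, ... with the defaults
            sleep(delay + random.uniform(0, delay * 0.1))  # add up to 10% jitter
    raise RuntimeError("unreachable")
```

For example, wrap an upload as `call_with_backoff(lambda: client.sources.upload(file=Path("./document.pdf")), retry_on=(graphor.RateLimitError, graphor.APIConnectionError))`.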
Upload Web Page
Ingest content by scraping a public web page.
Method Signature
client.sources.upload_url(
    url: str,  # Required
    crawl_urls: bool = False,  # Optional
    partition_method: PartitionMethod | None = None,  # Optional
    timeout: float | None = None,  # Optional
) -> PublicSource
Parameters
| Parameter | Type | Description | Required |
|---|---|---|---|
| url | str | The URL of the web page to scrape | ✅ Yes |
| crawl_urls | bool | Whether to crawl and ingest links from the given URL (default: False) | No |
| partition_method | PartitionMethod | Processing method to use (see Partition Methods above) | No |
| timeout | float | Request timeout in seconds | No |
URL Requirements
Public web pages
Pages that render primary content server-side and are reachable without interaction
The URL must be publicly reachable over HTTPS
Authentication-protected pages are not supported
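Only some of these requirements can be checked before calling the API. You cannot tell locally whether a page renders server-side or sits behind a login; only the request itself will reveal that. A minimal local check for the HTTPS scheme and a host, using the standard `urllib.parse` module, might look like this. `is_https_url` is a hypothetical helper.

```python
from urllib.parse import urlparse


def is_https_url(url: str) -> bool:
    """True if `url` uses HTTPS and names a host; says nothing about reachability."""
    parsed = urlparse(url.strip())
    return parsed.scheme == "https" and bool(parsed.hostname)
```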
Code Examples
Basic Web Page Upload
from graphor import Graphor
client = Graphor()
# Scrape a single web page
source = client.sources.upload_url(
    url="https://example.com/article"
)
print(f"Ingested: {source.file_name}")
print(f"Status: {source.status}")
Upload with Web Page Crawling
from graphor import Graphor
client = Graphor()
# Scrape a page and follow links
source = client.sources.upload_url(
    url="https://example.com/documentation",
    crawl_urls=True,
)
print(f"Ingested: {source.file_name}")
Upload with Partition Method
from graphor import Graphor
client = Graphor()
# Scrape a web page with a specific processing method
source = client.sources.upload_url(
    url="https://example.com/article",
    partition_method="hi_res",  # Use Balanced processing
)
print(f"Ingested: {source.file_name}")
print(f"File ID: {source.file_id}")
print(f"Partition Method: {source.partition_method}")
Async Web Page Upload
import asyncio
from graphor import AsyncGraphor
async def ingest_webpage(url: str):
    client = AsyncGraphor()
    source = await client.sources.upload_url(
        url=url,
        crawl_urls=False,
    )
    print(f"Ingested: {source.file_name}")
    return source
asyncio.run(ingest_webpage("https://example.com"))
Error Handling
import graphor
from graphor import Graphor
client = Graphor()
try:
    source = client.sources.upload_url(
        url="https://example.com/article"
    )
    print(f"URL ingested: {source.file_name}")
except graphor.BadRequestError as e:
    print(f"Invalid URL: {e}")
except graphor.APIStatusError as e:
    print(f"Failed to process URL (status {e.status_code}): {e}")
Upload from GitHub
Ingest content from a public GitHub repository.
Method Signature
client.sources.upload_github(
    url: str,  # Required
    timeout: float | None = None,  # Optional
) -> PublicSource
Parameters
| Parameter | Type | Description | Required |
|---|---|---|---|
| url | str | The GitHub repository URL (e.g., https://github.com/org/repo) | ✅ Yes |
| timeout | float | Request timeout in seconds | No |
Repository Requirements
Public GitHub repositories
HTTPS URLs (https://github.com/...)
Only public repositories are supported
Private repository ingestion is not supported
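Before calling `upload_github`, you may want to normalize a pasted link into the documented `https://github.com/org/repo` form. The sketch below uses only `urllib.parse`, and `normalize_github_repo_url` is hypothetical. It trims deeper paths such as `/tree/main/docs`; that is an assumption, since this page does not say whether branch or subdirectory URLs are accepted. The helper cannot tell whether a repository is public. The API reports that, for example with `NotFoundError`.

```python
from typing import Optional
from urllib.parse import urlparse


def normalize_github_repo_url(url: str) -> Optional[str]:
    """Reduce a GitHub URL to https://github.com/<owner>/<repo>, or return None.

    Hypothetical helper. Extra path segments (e.g. /tree/main) and a trailing
    .git are dropped, assuming the repository root is what should be ingested.
    """
    parsed = urlparse(url.strip())
    if parsed.scheme != "https" or parsed.hostname not in ("github.com", "www.github.com"):
        return None
    parts = [segment for segment in parsed.path.split("/") if segment]
    if len(parts) < 2:
        return None  # an owner and a repository name are both required
    owner, repo = parts[0], parts[1]
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    if not repo:
        return None
    return f"https://github.com/{owner}/{repo}"
```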
Code Examples
Basic GitHub Upload
from graphor import Graphor
client = Graphor()
# Ingest a public GitHub repository
source = client.sources.upload_github(
    url="https://github.com/organization/repository"
)
print(f"Ingested: {source.file_name}")
print(f"Status: {source.status}")
print(f"Project: {source.project_name}")
Async GitHub Upload
import asyncio
from graphor import AsyncGraphor
async def ingest_github_repo(repo_url: str):
    client = AsyncGraphor()
    source = await client.sources.upload_github(url=repo_url)
    print(f"Repository ingested: {source.file_name}")
    return source
asyncio.run(ingest_github_repo("https://github.com/org/repo"))
Error Handling
import graphor
from graphor import Graphor
client = Graphor()
try:
    source = client.sources.upload_github(
        url="https://github.com/org/repo"
    )
    print(f"GitHub repo ingested: {source.file_name}")
except graphor.BadRequestError as e:
    print(f"Invalid GitHub URL: {e}")
except graphor.NotFoundError as e:
    print(f"Repository not found or not accessible: {e}")
except graphor.APIStatusError as e:
    print(f"Failed to process repository (status {e.status_code}): {e}")
Upload from YouTube
Ingest content from a public YouTube video.
Method Signature
client.sources.upload_youtube(
    url: str,  # Required
    timeout: float | None = None,  # Optional
) -> PublicSource
Parameters
| Parameter | Type | Description | Required |
|---|---|---|---|
| url | str | The YouTube video URL (e.g., https://www.youtube.com/watch?v=...) | ✅ Yes |
| timeout | float | Request timeout in seconds | No |
Video Requirements
Public YouTube video URLs (HTTPS)
Standard watch URLs (https://www.youtube.com/watch?v=VIDEO_ID)
The video must be publicly accessible
Private or access-restricted videos are not supported
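To reject unsupported links early, or to strip extra parameters such as `&t=30s` before ingesting, you can pull `VIDEO_ID` out of a standard watch URL. The sketch recognizes only the `https://www.youtube.com/watch?v=VIDEO_ID` form listed above, plus the bare `youtube.com` host. This page does not say whether Graphor also accepts short `youtu.be` links, so the sketch treats them as unsupported. `youtube_video_id` is a hypothetical helper.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def youtube_video_id(url: str) -> Optional[str]:
    """Extract VIDEO_ID from https://www.youtube.com/watch?v=VIDEO_ID, else return None."""
    parsed = urlparse(url.strip())
    if parsed.scheme != "https" or parsed.hostname not in ("www.youtube.com", "youtube.com"):
        return None
    if parsed.path != "/watch":
        return None  # only the standard watch form is handled here
    ids = parse_qs(parsed.query).get("v")
    return ids[0] if ids else None
```

You can then rebuild a canonical URL, such as `f"https://www.youtube.com/watch?v={video_id}"`, and pass it to `client.sources.upload_youtube(url=...)`.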
Code Examples
Basic YouTube Upload
from graphor import Graphor
client = Graphor()
# Ingest a YouTube video
source = client.sources.upload_youtube(
    url="https://www.youtube.com/watch?v=VIDEO_ID"
)
print(f"Ingested: {source.file_name}")
print(f"Status: {source.status}")
Async YouTube Upload
import asyncio
from graphor import AsyncGraphor
async def ingest_youtube_video(video_url: str):
    client = AsyncGraphor()
    source = await client.sources.upload_youtube(url=video_url)
    print(f"Video ingested: {source.file_name}")
    return source
asyncio.run(ingest_youtube_video("https://www.youtube.com/watch?v=VIDEO_ID"))
Error Handling
import graphor
from graphor import Graphor
client = Graphor()
try:
    source = client.sources.upload_youtube(
        url="https://www.youtube.com/watch?v=VIDEO_ID"
    )
    print(f"YouTube video ingested: {source.file_name}")
except graphor.BadRequestError as e:
    print(f"Invalid YouTube URL: {e}")
except graphor.NotFoundError as e:
    print(f"Video not found or not accessible: {e}")
except graphor.APIStatusError as e:
    print(f"Failed to process video (status {e.status_code}): {e}")
Advanced Configuration
Custom Timeout
For large files or slow connections, you can increase the timeout:
from pathlib import Path
from graphor import Graphor
# Default timeout for every request made by this client
client = Graphor(
    timeout=300.0  # 5 minutes
)
# Or per-request
source = client.with_options(timeout=300.0).sources.upload(
    file=Path("./large-document.pdf")
)
Retry Configuration
Configure automatic retries for transient errors:
from pathlib import Path
from graphor import Graphor
client = Graphor(
    max_retries=5  # Default is 2
)
# Or per-request
source = client.with_options(max_retries=5).sources.upload(
    file=Path("./document.pdf")
)
Accessing Raw Response
Access headers and other response metadata:
from pathlib import Path
from graphor import Graphor
client = Graphor()
response = client.sources.with_raw_response.upload(
    file=Path("./document.pdf")
)
print(f"Headers: {response.headers}")
source = response.parse()  # Get the PublicSource object
print(f"Uploaded: {source.file_name}")
Using aiohttp for Better Concurrency
For high-concurrency async operations:
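# Install aiohttp support first: pip install "graphor[aiohttp]"
import asyncio
from pathlib import Path
from graphor import AsyncGraphor, DefaultAioHttpClient
async def upload_many_files(file_paths: list):
    async with AsyncGraphor(
        http_client=DefaultAioHttpClient()
    ) as client:
        tasks = [
            client.sources.upload(file=path)
            for path in file_paths
        ]
        # return_exceptions=True returns failures as exception objects instead of raising the first one
        results = await asyncio.gather(*tasks, return_exceptions=True)
    return results
# Usage
results = asyncio.run(upload_many_files([Path("./a.pdf"), Path("./b.pdf")]))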
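`asyncio.gather` in the example above starts every upload at once, which can run into `RateLimitError` on a large folder. Capping the number of in-flight requests with an `asyncio.Semaphore` is a common remedy. The limit of 4 below is an arbitrary assumption. `gather_bounded` is a hypothetical helper that accepts any async worker, so it can wrap `client.sources.upload`.

```python
import asyncio
from typing import Any, Awaitable, Callable, Iterable, List


async def gather_bounded(
    items: Iterable[Any],
    worker: Callable[[Any], Awaitable[Any]],
    limit: int = 4,
) -> List[Any]:
    """Run worker(item) for every item with at most `limit` calls in flight.

    Results keep input order; exceptions are returned in place of results.
    """
    semaphore = asyncio.Semaphore(limit)

    async def run_one(item: Any) -> Any:
        async with semaphore:  # waits while `limit` workers are already running
            return await worker(item)

    return await asyncio.gather(*(run_one(item) for item in items), return_exceptions=True)
```

Inside the `async with AsyncGraphor(...) as client:` block, replace the `gather` call with `results = await gather_bounded(file_paths, lambda path: client.sources.upload(file=path), limit=4)`.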
Error Reference
| Error Type | Status Code | Description |
|---|---|---|
| BadRequestError | 400 | Invalid file type, missing filename, or malformed request |
| AuthenticationError | 401 | Invalid or missing API key |
| PermissionDeniedError | 403 | Access denied to the specified project |
| NotFoundError | 404 | Project or source not found |
| RateLimitError | 429 | Too many requests; retry after waiting |
| InternalServerError | ≥500 | Server-side processing error |
| APIConnectionError | N/A | Network connectivity issues |
| APITimeoutError | N/A | Request timed out |
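The table suggests a simple rule for automated pipelines. Retry rate limits, server errors, and network problems, but fix the request, key, or project before retrying the other 4xx errors. The sketch below encodes that rule. `should_retry` is a hypothetical helper, and using `None` for errors without a status code (`APIConnectionError`, `APITimeoutError`) is a convention chosen for this example.

```python
from typing import Optional


def should_retry(status_code: Optional[int]) -> bool:
    """Decide whether a failed upload is worth retrying, following the error table.

    `None` stands for errors without an HTTP status (connection problems, timeouts).
    """
    if status_code is None:
        return True  # APIConnectionError / APITimeoutError: usually transient
    if status_code == 429:
        return True  # RateLimitError: wait, then retry
    if status_code >= 500:
        return True  # InternalServerError: server-side, may succeed later
    return False  # 400/401/403/404: fix the request, key, or project first
```

For example, inside `except graphor.APIStatusError as e:` you can branch on `should_retry(e.status_code)`.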
Next Steps
After successfully uploading your documents: