- Data Ingestion (Sources): Upload, process, list, and manage documents
- Document Chat: Ask questions about your documents with conversational memory
- Data Extraction: Extract structured data using JSON Schema
- Prebuilt RAG: Retrieve relevant document chunks for custom RAG pipelines
GitHub Repository
View the source code, report issues, and contribute to the SDK.
Installation
Install the Graphor SDK from PyPI. Python 3.9 or higher is required.
Data Ingestion (Sources)
The Sources methods cover the full ingestion lifecycle:
Upload Source
Import documents from files, URLs, GitHub, and YouTube
Parse Source
Run OCR/parsing methods and reprocess existing sources
List Sources
Retrieve all sources with status and metadata
List Source Elements
Retrieve structured elements/partitions from processed sources
Delete Source
Permanently remove sources from your project
Document Chat
Once your data is ingested, use the Chat method to ask questions.
Data Extraction
Extract specific structured data from your documents using schemas.
Prebuilt RAG
Build custom RAG pipelines with semantic document retrieval.
What “Data Ingestion” includes
- Upload: Create a new source (file / web page / GitHub / YouTube)
- Parse: Choose OCR/parsing method; reprocess when needed
- List: Monitor status and metadata
- Elements: Retrieve structured elements/partitions after processing
- Delete: Remove a source permanently
Authentication
All SDK methods require authentication using API tokens. You can provide your API key in two ways:
Environment Variable (Recommended)
Set the GRAPHOR_API_KEY environment variable.
Direct Initialization
Alternatively, pass the API key directly when creating the client.
Learn how to generate and manage API tokens in the API Tokens guide.
Token Security
- Never expose tokens in client-side code or public repositories
- Use environment variables to store tokens securely
- Rotate tokens regularly for enhanced security
- Use different tokens for different environments (dev/staging/prod)
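One way to follow the environment-variable guidance above is to read the key once at startup and fail early if it is missing. This is a minimal sketch, not part of the SDK: `load_api_key` is a hypothetical helper, and only the variable name GRAPHOR_API_KEY comes from this page.

```python
import os
from typing import Mapping, Optional

ENV_VAR = "GRAPHOR_API_KEY"  # variable name taken from the Authentication section


def load_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the API key from the environment, or raise if it is missing.

    `env` defaults to os.environ; passing a plain dict makes the helper easy to test.
    """
    env = os.environ if env is None else env
    key = env.get(ENV_VAR, "").strip()
    if not key:
        raise RuntimeError(f"{ENV_VAR} is not set; export it before creating the client")
    return key
```

Running this check at startup surfaces a missing or empty variable immediately instead of at the first API call.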
Async Usage
Simply import AsyncGraphor instead of Graphor and use await with each API call.
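The await pattern can be sketched against a stand-in client so that it runs without network access. `StubAsyncClient` and the shape of the returned records are hypothetical; in real code the client would be an AsyncGraphor instance, and only the `client.sources.list()` call is taken from the method table on this page.

```python
import asyncio
from typing import Any, Dict, List


class _StubSources:
    """Hypothetical stand-in for the `sources` resource of an async client."""

    async def list(self) -> List[Dict[str, Any]]:
        # The record shape is an assumption made for illustration only.
        return [{"file_name": "example.pdf", "status": "Completed"}]


class StubAsyncClient:
    """Hypothetical stand-in for AsyncGraphor."""

    def __init__(self) -> None:
        self.sources = _StubSources()


async def count_sources(client: Any) -> int:
    """Await the async list call and return how many sources came back."""
    sources = await client.sources.list()
    return len(sources)


source_count = asyncio.run(count_sources(StubAsyncClient()))
```

Because `count_sources` only relies on `client.sources.list()` being awaitable, the same function should accept a real async client in place of the stub.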
Available Methods
Sources
| Method | Description |
|---|---|
| client.sources.upload() | Upload a local file |
| client.sources.upload_url() | Upload from a web URL |
| client.sources.upload_github() | Upload from GitHub |
| client.sources.upload_youtube() | Upload from YouTube |
| client.sources.parse() | Reprocess a source with a different parsing method |
| client.sources.list() | List all sources in the project |
| client.sources.delete() | Delete a source permanently |
| client.sources.load_elements() | Get parsed elements from a source |
Chat & Extraction
| Method | Description |
|---|---|
| client.sources.ask() | Ask questions about your documents |
| client.sources.extract() | Extract structured data using JSON Schema |
| client.sources.retrieve_chunks() | Retrieve relevant chunks for custom RAG |
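Since extraction is driven by a JSON Schema, it may help to see what such a schema looks like as a Python dictionary. The invoice fields below are hypothetical, and `missing_required` is an illustrative helper rather than an SDK function; how the schema is passed to `client.sources.extract()` is not shown because the parameter names are not given on this page.

```python
import json
from typing import Any, Dict, List

# Hypothetical schema: the field names are examples, not something the SDK requires.
invoice_schema: Dict[str, Any] = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "total_amount": {"type": "number"},
        "due_date": {"type": "string", "format": "date"},
    },
    "required": ["invoice_number", "total_amount"],
}


def missing_required(record: Dict[str, Any], schema: Dict[str, Any]) -> List[str]:
    """Return the required property names that are absent from `record`."""
    return [name for name in schema.get("required", []) if name not in record]


# The schema serializes cleanly, so it can be sent wherever JSON is expected.
schema_json = json.dumps(invoice_schema, indent=2)
```

A check like `missing_required` is a cheap way to spot incomplete extraction results before they reach downstream code.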
Complete Workflow Example
Here’s the full “happy path”: upload → parse → list → elements → chat/extract/RAG.
1. Upload a source
2. Parse (OCR/parsing)
3. Monitor status (List Sources)
4. Retrieve structured elements (after processing)
5. Ask Questions (Chat)
6. Extract Data (Extraction)
7. Retrieve Chunks (Prebuilt RAG)
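Step 3 above usually means polling until a source has finished processing. The sketch below shows one way to do that under stated assumptions: `wait_for_source` is a hypothetical helper, `list_sources` is any callable that returns the current sources, and the `file_name` and `status` keys and the "Completed" value are placeholders that should be checked against the actual List Sources response.

```python
import time
from typing import Any, Callable, Collection, Iterable, Mapping


def wait_for_source(
    list_sources: Callable[[], Iterable[Mapping[str, Any]]],
    file_name: str,
    done_statuses: Collection[str] = ("Completed",),  # assumed status value
    poll_seconds: float = 2.0,
    max_polls: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> Mapping[str, Any]:
    """Poll `list_sources` until `file_name` reaches a finished status."""
    for _ in range(max_polls):
        for source in list_sources():
            # Key names are assumptions; adapt them to the real response shape.
            if source.get("file_name") == file_name and source.get("status") in done_statuses:
                return source
        sleep(poll_seconds)
    raise TimeoutError(f"{file_name} was not ready after {max_polls} polls")
```

If the SDK returns objects rather than mappings, the two `.get(...)` lookups would become attribute accesses; the polling loop itself stays the same.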
Integration Patterns
Complete SDK Client Wrapper
Async Integration
Error Handling
The SDK provides typed exceptions for different error scenarios.
Error Types
| Status Code | Error Type | Description |
|---|---|---|
| 400 | BadRequestError | Invalid parameters or malformed request |
| 401 | AuthenticationError | Invalid or missing API key |
| 403 | PermissionDeniedError | Access denied to resource |
| 404 | NotFoundError | Resource doesn’t exist |
| 422 | UnprocessableEntityError | Validation error |
| 429 | RateLimitError | Too many requests |
| ≥500 | InternalServerError | Server-side error |
| N/A | APIConnectionError | Network connectivity issues |
| N/A | APITimeoutError | Request timed out |
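The table above can be restated as a small lookup, which is handy for logging or for tests that simulate HTTP failures. This is an illustrative sketch: `error_name_for_status` is a hypothetical helper that returns the exception names as strings and does not import or raise the SDK's own classes.

```python
from typing import Dict, Optional

# Status codes and names copied from the Error Types table.
ERROR_TYPES: Dict[int, str] = {
    400: "BadRequestError",
    401: "AuthenticationError",
    403: "PermissionDeniedError",
    404: "NotFoundError",
    422: "UnprocessableEntityError",
    429: "RateLimitError",
}


def error_name_for_status(status_code: int) -> Optional[str]:
    """Return the error type name listed for `status_code`, or None if unlisted."""
    if status_code >= 500:
        return "InternalServerError"
    return ERROR_TYPES.get(status_code)
```

APIConnectionError and APITimeoutError are left out because they have no status code; they describe failures that happen before any HTTP response arrives.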
Configuration
Retries
Certain errors are automatically retried 2 times by default with exponential backoff.
Timeouts
By default, requests time out after 1 minute.
Using aiohttp for Better Concurrency
For high-concurrency async operations, use the aiohttp client.
Rate Limits and Best Practices
Performance Guidelines
- Batch Operations: Process multiple files sequentially or with controlled concurrency
- Async Processing: Use AsyncGraphor for concurrent operations
- Retry Logic: The SDK handles retries automatically; configure max_retries as needed
- Timeout Handling: Increase timeouts for large documents or complex processing
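To make "retried 2 times with exponential backoff" concrete, here is a generic sketch of that pattern. It is not the SDK's implementation: `call_with_retries`, the 0.5 second base delay, and the choice of retryable exceptions are all assumptions made for illustration.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(
    fn: Callable[[], T],
    max_retries: int = 2,  # mirrors the documented default of 2 retries
    base_delay: float = 0.5,  # assumed value; the SDK's real schedule may differ
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `fn`, retrying on `retry_on` with delays that double each attempt."""
    attempt = 0
    while True:
        try:
            return fn()
        except retry_on:
            if attempt >= max_retries:
                raise  # retries exhausted: surface the last error
            sleep(base_delay * (2 ** attempt))  # 0.5s, 1.0s, 2.0s, ...
            attempt += 1
```

With the defaults, a call is attempted at most three times in total: the initial attempt plus two retries.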
Best Practices
Common Use Cases
Document Processing Pipeline
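A pipeline of this kind typically pushes many files through the same step while capping how many run at once. The sketch below shows that controlled-concurrency pattern with asyncio. `process_all` is a hypothetical helper, and `worker` stands for whatever coroutine performs the per-file work (for example, a thin wrapper around an async upload call).

```python
import asyncio
from typing import Any, Awaitable, Callable, List, Sequence


async def process_all(
    items: Sequence[str],
    worker: Callable[[str], Awaitable[Any]],
    max_concurrency: int = 3,
) -> List[Any]:
    """Run `worker` over `items`, with at most `max_concurrency` in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run_one(item: str) -> Any:
        # The semaphore blocks here once max_concurrency workers are active.
        async with semaphore:
            return await worker(item)

    # gather returns results in the same order as `items`.
    return await asyncio.gather(*(run_one(item) for item in items))
```

Keeping the limit small is a simple way to stay within rate limits while still overlapping network waits.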
Q&A System
Custom RAG with Your LLM
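In a custom RAG setup, the retrieved chunks are usually folded into a prompt that is then sent to the LLM of your choice. The sketch below covers only that prompt-assembly step. `build_prompt` is a hypothetical helper, and it assumes the chunk texts have already been pulled out of the retrieval response as plain strings.

```python
from typing import Sequence


def build_prompt(question: str, chunks: Sequence[str]) -> str:
    """Combine retrieved chunk texts and a question into a single LLM prompt."""
    # Number each chunk so the model (and the reader) can refer back to it.
    context = "\n\n".join(
        f"[{index}] {text.strip()}" for index, text in enumerate(chunks, start=1)
    )
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )
```

The resulting string can be passed to any LLM client; numbering the chunks also makes it easier to ask the model to cite which passage supports its answer.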
Support and Resources
Getting Help
Contact Support
Direct support for technical questions and issues
API Tokens Guide
Learn how to generate and manage authentication tokens
Data Ingestion Guide
Best practices for document upload and processing
REST API Reference
Full REST API documentation for advanced use cases
Next Steps
Ready to start building with the Graphor SDK? Choose your path:
For Beginners
Upload Sources
Start by uploading documents from files, URLs, GitHub, and YouTube
Chat with Documents
Ask natural language questions about your documents
API Tokens
Set up authentication for API access
For Advanced Users
Data Extraction
Extract structured data using JSON Schema
Prebuilt RAG
Build custom RAG pipelines with semantic search
Parse Source
Master OCR and parsing methods for optimal results
List Elements
Access structured document elements and metadata

