What are Sources?
Sources in Graphor represent documents that serve as the foundation of your knowledge base. These can include:
- Local files: PDFs, Word documents, text files, images, spreadsheets, presentations
- Web content: URLs, web pages, online articles
- Code repositories: GitHub repositories and documentation
- Media content: Audio and video files for transcription and analysis
API Endpoints Overview
The Sources API consists of seven main endpoints that provide complete document lifecycle management:
| Endpoint | Method | URL | Description |
|---|---|---|---|
| Upload Source | POST | https://sources.graphorlm.com/upload | Upload documents from your local system to Graphor for processing |
| Upload Source from URL | POST | https://sources.graphorlm.com/upload-url-source | Import documents by providing a publicly accessible URL |
| Upload Source from GitHub | POST | https://sources.graphorlm.com/upload-github-source | Ingest content directly from a GitHub repository |
| Process Source | POST | https://sources.graphorlm.com/process | Reprocess existing documents with different AI models and parsing methods |
| List Sources | GET | https://sources.graphorlm.com | Retrieve information about all documents in your project |
| List Source Elements | POST | https://sources.graphorlm.com/elements | Retrieve detailed elements and partitions from processed documents |
| Delete Source | DELETE | https://sources.graphorlm.com/delete | Permanently remove documents from your project |
Document Processing Pipeline
Understanding how Graphor processes your documents helps you make the most of the Sources API:
1. Upload Stage
When you upload a document using the Upload Source endpoint:
- File is validated for type and size (max 100MB)
- Document is securely stored in your project
- Initial metadata is extracted (filename, size, type)
- Processing begins automatically with the default method
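The checks in the upload stage can be mirrored on the client before any bytes are sent. The following is a minimal sketch, not Graphor's own validation: the function name `describe_upload` is hypothetical, the 100MB limit is read as 100 MiB (an assumption), and the type is guessed from the filename with Python's standard `mimetypes` module.

```python
import mimetypes
from pathlib import Path

# The 100MB limit stated above, interpreted here as 100 MiB (assumption).
MAX_UPLOAD_BYTES = 100 * 1024 * 1024


def describe_upload(path: str, size_bytes: int) -> dict:
    """Return filename, size and guessed type, or raise if the file is too large."""
    if size_bytes > MAX_UPLOAD_BYTES:
        raise ValueError(
            f"{path} is {size_bytes} bytes; the limit is {MAX_UPLOAD_BYTES} bytes"
        )
    guessed_type, _encoding = mimetypes.guess_type(path)
    return {
        "filename": Path(path).name,
        "size": size_bytes,
        # Fall back to a generic type when the extension is unknown.
        "type": guessed_type or "application/octet-stream",
    }
```

Running this check first avoids waiting for a large upload that would be rejected for its size.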
2. Processing Methods
Graphor offers multiple processing methods, selectable via the Process Source endpoint:
Basic Method
Speed: ⚡⚡⚡ Accuracy: ⭐⭐
- Fastest processing option
- Heuristic-based text extraction
- No OCR processing
- Ideal for plain text and simple documents
OCR Method
Speed: ⚡⚡ Accuracy: ⭐⭐⭐
- Optical Character Recognition for scanned documents
- Heuristic-based structure classification
- Perfect for images and scanned PDFs
- Balances speed and accuracy
Balanced Method
Speed: ⚡ Accuracy: ⭐⭐⭐⭐
- AI-powered document structure recognition
- Advanced table and figure detection
- Superior layout analysis
- Recommended for complex documents
Accurate Method
Speed: ⚡ Accuracy: ⭐⭐⭐⭐⭐
- Premium fine-tuned AI models
- Highest accuracy for specialized documents
- Advanced structure recognition
- Best-in-class text extraction
Agentic Method
Speed: ⚡ Accuracy: ⭐⭐⭐⭐⭐
- Our highest parsing setting for complex layouts
- Multi-page tables, diagrams, and images support
- Rich annotations for images and complex elements
- Uses agentic processing for enhanced understanding
VLM Method
Speed: ⚡⚡⚡ Accuracy: ⭐⭐⭐⭐⭐
- Our best text-first parsing with high-quality output
- No bounding boxes (bbox) or page layout metadata
- Best for manuscripts and handwritten documents
- Performs page annotation (page-level labels and context)
- Performs document annotation (document-level labels and summaries)
- Performs image annotation when images are present in the document
- Best-in-class text parsing; element classification quality is limited
- Recommended for multi-source RAG due to page and document annotations
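Selecting one of these methods means sending a request to the Process Source endpoint. The sketch below only builds that request and does not send it. Only the URL comes from this page. The JSON field names (`file_name`, `method`), the lowercase method identifiers, and the Bearer authorization scheme are all assumptions for illustration, so check the Process Source reference for the real contract.

```python
import json
import urllib.request

PROCESS_URL = "https://sources.graphorlm.com/process"  # from the endpoint overview

# Hypothetical identifiers for the six methods described above.
KNOWN_METHODS = ("basic", "ocr", "balanced", "accurate", "agentic", "vlm")


def build_process_request(file_name, method, api_token):
    """Build (but do not send) a POST request asking Graphor to reprocess a source."""
    if method not in KNOWN_METHODS:
        raise ValueError(f"unknown method {method!r}; expected one of {KNOWN_METHODS}")
    # Assumed JSON field names; the real API may use different ones.
    payload = {"file_name": file_name, "method": method}
    return urllib.request.Request(
        PROCESS_URL,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_token}",  # assumed Bearer scheme
            "Content-Type": "application/json",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. Rejecting unknown method names locally catches typos before a request is made.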
3. Document Status Lifecycle
Documents progress through various states that you can monitor using the List Sources endpoint:
| Status | Description | Next Steps |
|---|---|---|
| New | Document uploaded, awaiting processing | Processing will begin automatically |
| Processing | AI models are analyzing the document | Wait for completion |
| Completed | Document ready for use in RAG pipelines | Can be used in flows |
| Failed | Processing encountered an error | Try a different processing method |
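Because Completed and Failed are the only states with nothing further to wait for, monitoring reduces to polling until one of them appears. This is a minimal sketch under stated assumptions: `fetch_status` is a hypothetical callable you supply (for example, one that calls List Sources and extracts the status of a single document), and the status strings are taken verbatim from the table above.

```python
import time
from typing import Callable

# States from the table above after which no further change is expected.
TERMINAL_STATUSES = {"Completed", "Failed"}


def wait_for_processing(
    fetch_status: Callable[[], str],
    poll_seconds: float = 5.0,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the document is Completed or Failed, then return that status."""
    for _ in range(max_polls):
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        # Still New or Processing: wait before asking again.
        sleep(poll_seconds)
    raise TimeoutError(f"document still not finished after {max_polls} polls")
```

The `sleep` parameter exists so the loop can be exercised in tests without real delays.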
Authentication
All Sources API endpoints require authentication using API tokens. Learn how to create and manage API tokens in the API Tokens guide.
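One way to keep the token out of source code is to read it from the environment and build the request headers in a single place. This page does not show the header format, so the Bearer scheme below is an assumption, as are the variable name `GRAPHOR_API_TOKEN` and the helper name `auth_headers`.

```python
import os

TOKEN_ENV_VAR = "GRAPHOR_API_TOKEN"  # hypothetical environment variable name


def auth_headers(environ=os.environ) -> dict:
    """Build request headers from a token stored in the environment."""
    token = environ.get(TOKEN_ENV_VAR, "").strip()
    if not token:
        raise RuntimeError(f"Set {TOKEN_ENV_VAR} before calling the Sources API")
    # Assumption: the API expects the token as a Bearer credential.
    return {"Authorization": f"Bearer {token}"}
```

Failing early on a missing token gives a clearer message than the 401 response the API would otherwise return.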
Common Workflows
Basic Document Upload Workflow
- Upload: Use Upload Source to add your document
- Monitor: Check status with List Sources
- Optimize: Reprocess with Process Source if needed
- Use: Document is ready for your RAG workflows
Quality Optimization Workflow
- Start with the Fast method for speed
- Review extraction quality
- Upgrade to Balanced or Accurate if needed
- Use best results in your application
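The quality optimization steps above amount to an escalation loop: try the cheapest method, check the result, and move up only when needed. The sketch below assumes two hypothetical callables supplied by you: `reprocess`, which runs one method and returns its result, and `is_good_enough`, which applies your own quality check. The method identifiers are placeholders.

```python
from typing import Callable, Sequence, Tuple

# Order follows the workflow above; the identifiers are placeholders.
ESCALATION_ORDER = ("fast", "balanced", "accurate")


def process_until_acceptable(
    reprocess: Callable[[str], object],
    is_good_enough: Callable[[object], bool],
    methods: Sequence[str] = ESCALATION_ORDER,
) -> Tuple[str, object]:
    """Try each method in order and stop at the first acceptable result."""
    if not methods:
        raise ValueError("at least one method is required")
    result = None
    for method in methods:
        result = reprocess(method)
        if is_good_enough(result):
            return method, result
    # Nothing passed the check: hand back the last (most thorough) attempt.
    return methods[-1], result
```

Stopping at the first acceptable result keeps processing time low for documents that the faster methods already handle well.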
Document Lifecycle Management
Supported File Types
The Sources API supports a wide range of document formats:
Documents & Text
- PDF: Portable Document Format files
- Microsoft Office: DOC, DOCX, PPT, PPTX, XLS, XLSX
- OpenDocument: ODT (Text documents)
- Text Files: TXT, TEXT, MD (Markdown), HTML, HTM
- Data Files: CSV, TSV (Comma/Tab-separated values)
Images & Media
- Images: PNG, JPG, JPEG, TIFF, BMP, HEIC
- Audio: MP3, WAV, M4A, OGG, FLAC
- Video: MP4, MOV, AVI, MKV, WEBM
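The format lists above translate directly into a client-side check that can run before an upload. This is a minimal sketch that looks only at the file extension. The helper name `is_supported` is hypothetical, and the server may apply additional checks to file contents.

```python
from pathlib import PurePath

# Extensions copied from the lists above, lowercased.
SUPPORTED_EXTENSIONS = {
    # Documents & text
    "pdf", "doc", "docx", "ppt", "pptx", "xls", "xlsx", "odt",
    "txt", "text", "md", "html", "htm", "csv", "tsv",
    # Images
    "png", "jpg", "jpeg", "tiff", "bmp", "heic",
    # Audio
    "mp3", "wav", "m4a", "ogg", "flac",
    # Video
    "mp4", "mov", "avi", "mkv", "webm",
}


def is_supported(filename: str) -> bool:
    """Return True when the file's final extension is in the supported list."""
    extension = PurePath(filename).suffix.lower().lstrip(".")
    return extension in SUPPORTED_EXTENSIONS
```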
Processing Recommendations
| File Type | Recommended Method | Notes |
|---|---|---|
| Clean PDFs | Fast or VLM | Fast processing for digital PDFs; use VLM for text-only without layout |
| Scanned PDFs | Balanced | OCR needed for text extraction |
| Complex Documents | Balanced or Accurate | Better structure recognition; consider Agentic for semantic relationships |
| Images with Text | Balanced | Requires OCR for text extraction |
| Spreadsheets | Fast or Balanced | Balanced better for complex tables |
| Presentations | Balanced or Accurate | Better slide layout recognition |
| Multi-source RAG | VLM | Page and document annotations help unify heterogeneous sources |
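The recommendations table can be kept as a small lookup so that calling code does not hard-code method names. The keys below are hypothetical labels for the rows of the table, and the values repeat the method names exactly as the table lists them.

```python
# One entry per row of the recommendations table, in the order given there.
RECOMMENDED_METHODS = {
    "clean_pdf": ("Fast", "VLM"),
    "scanned_pdf": ("Balanced",),
    "complex_document": ("Balanced", "Accurate"),
    "image_with_text": ("Balanced",),
    "spreadsheet": ("Fast", "Balanced"),
    "presentation": ("Balanced", "Accurate"),
    "multi_source_rag": ("VLM",),
}


def recommended_methods(kind: str) -> tuple:
    """Return the recommended methods for a document kind, first choice first."""
    try:
        return RECOMMENDED_METHODS[kind]
    except KeyError:
        known = ", ".join(sorted(RECOMMENDED_METHODS))
        raise ValueError(f"unknown document kind {kind!r}; known kinds: {known}") from None
```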
Rate Limits and Best Practices
Rate Limits
- Upload: No strict limits, but large files may take longer
- Processing: Allow adequate time for complex methods
- List/Delete: Standard API rate limits apply
Best Practices
Upload Optimization
File Preparation:
- Compress large files when possible (max 100MB)
- Use descriptive filenames for easy identification
- Ensure files are not corrupted before upload
- Implement retry logic for network issues
- Validate file types client-side
- Monitor upload progress for large files
Processing Strategy
Method Selection:
- Start with Fast for testing and simple documents
- Choose Balanced for complex layouts and tables
- Reserve Accurate for premium accuracy needs
- Prefer VLM for manuscripts, handwritten documents, or when you need best text quality without bounding boxes
- Use Agentic for complex layouts, multi-page tables, diagrams, and images with rich annotations
- For multi-source RAG across heterogeneous sources, prefer VLM because it provides page and document annotations
- Review processing results in Graphor dashboard
- Test different methods for optimal results
- Monitor processing times and resource usage
Management & Monitoring
Regular Monitoring:
- Check document status regularly
- Monitor failed processing attempts
- Review processing quality periodically
- Remove outdated documents to save storage
- Reprocess with better methods as they become available
- Keep track of which methods work best for your document types
Error Handling
All Sources API endpoints use consistent error responses:
Common Error Codes
| Status Code | Meaning | Common Causes |
|---|---|---|
| 400 | Bad Request | Invalid file type, missing parameters, malformed request |
| 401 | Unauthorized | Invalid or missing API token |
| 403 | Forbidden | Insufficient permissions for the project |
| 404 | Not Found | File or project not found |
| 413 | Payload Too Large | File exceeds 100MB limit |
| 500 | Internal Server Error | Processing failure or server issues |
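Client code can turn the status codes in this table into a single exception type with a readable message. This is a sketch only: `SourcesApiError` and `raise_for_status` are hypothetical names, and no assumption is made about the body of the error response.

```python
# Meanings copied from the table above.
ERROR_MEANINGS = {
    400: "Bad Request",
    401: "Unauthorized",
    403: "Forbidden",
    404: "Not Found",
    413: "Payload Too Large",
    500: "Internal Server Error",
}


class SourcesApiError(Exception):
    """Raised for any error status returned by a Sources API call."""

    def __init__(self, status_code: int, detail: str = ""):
        self.status_code = status_code
        meaning = ERROR_MEANINGS.get(status_code, "Unexpected Error")
        message = f"{status_code} {meaning}"
        if detail:
            message = f"{message}: {detail}"
        super().__init__(message)


def raise_for_status(status_code: int, detail: str = "") -> None:
    """Do nothing for success codes; raise SourcesApiError for 4xx and 5xx."""
    if status_code >= 400:
        raise SourcesApiError(status_code, detail)
```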
Error Response Format
Retry Strategy
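A common approach, sketched here under assumptions rather than taken from Graphor's own guidance, is to retry only failures that may succeed on a second attempt (network errors or a 500 response) and to wait longer after each failure. `TransientError` is a hypothetical marker exception that your own request code would raise for such cases. Client errors such as 400 or 413 should not be retried, because resending the same request would fail again.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (network errors, HTTP 500)."""


def call_with_retry(
    operation: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation, retrying transient failures with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last failure
            # Delay doubles each time: base, 2x base, 4x base, ...
            sleep(base_delay * 2 ** (attempt - 1))
```

With the defaults, a call that keeps failing is attempted four times with waits of 1, 2 and 4 seconds between attempts.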
Integration Examples
Complete Document Management System
Document Quality Assessment Pipeline
Next Steps
Now that you understand the Sources API, explore these related topics:
- Data Ingestion Guide: Learn best practices for document processing and optimization
- API Tokens: Set up authentication for accessing the Sources API
- Chunking Guide: Optimize document segmentation after processing for better RAG performance
- Flows API: Build RAG pipelines using your processed documents
Support and Resources
Getting Help
- Documentation: Complete API reference and guides
- Email Support: [email protected]
- Dashboard: Monitor and manage documents at app.graphorlm.com
Code Examples
- GitHub: Sample integrations and SDKs (coming soon)
- Documentation: Code examples in multiple languages
- Community: Share integration patterns and best practices
Monitoring & Analytics
- Processing Status: Track document processing in real-time
- Quality Metrics: Evaluate extraction and processing quality
- Usage Analytics: Monitor API usage and performance

