What are Sources?
Sources in GraphorLM represent documents that serve as the foundation of your knowledge base. These can include:
- Local files: PDFs, Word documents, text files, images, spreadsheets, presentations
- Web content: URLs, web pages, online articles
- Code repositories: GitHub repositories and documentation
- Media content: Audio and video files for transcription and analysis
API Endpoints Overview
The Sources API consists of seven main endpoints that provide complete document lifecycle management:
Endpoint | Method | URL | Description |
---|---|---|---|
Upload Source | POST | https://sources.graphorlm.com/upload | Upload documents from your local system to GraphorLM for processing |
Upload Source from URL | POST | https://sources.graphorlm.com/upload-url-source | Import documents by providing a publicly accessible URL |
Upload Source from GitHub | POST | https://sources.graphorlm.com/upload-github-source | Ingest content directly from a GitHub repository |
Process Source | POST | https://sources.graphorlm.com/process | Reprocess existing documents with different AI models and parsing methods |
List Sources | GET | https://sources.graphorlm.com | Retrieve information about all documents in your project |
List Source Elements | POST | https://sources.graphorlm.com/elements | Retrieve detailed elements and partitions from processed documents |
Delete Source | DELETE | https://sources.graphorlm.com/delete | Permanently remove documents from your project |
Document Processing Pipeline
Understanding how GraphorLM processes your documents helps you make the most of the Sources API:
1. Upload Stage
When you upload a document using the Upload Source endpoint:
- File is validated for type and size (max 100MB)
- Document is securely stored in your project
- Initial metadata is extracted (filename, size, type)
- Processing begins automatically with the default method
2. Processing Methods
GraphorLM offers multiple processing methods, selectable via the Process Source endpoint:
Basic Method
Speed: ⚡⚡⚡ Accuracy: ⭐⭐
- Fastest processing option
- Heuristic-based text extraction
- No OCR processing
- Ideal for plain text and simple documents
OCR Method
Speed: ⚡⚡ Accuracy: ⭐⭐⭐
- Optical Character Recognition for scanned documents
- Heuristic-based structure classification
- Perfect for images and scanned PDFs
- Balances speed and accuracy
Hi-Res Method
Speed: ⚡ Accuracy: ⭐⭐⭐⭐
- AI-powered document structure recognition
- Advanced table and figure detection
- Superior layout analysis
- Recommended for complex documents
Hi-Res (fine-tuned) Method
Speed: ⚡ Accuracy: ⭐⭐⭐⭐⭐
- Premium fine-tuned AI models
- Highest accuracy for specialized documents
- Advanced structure recognition
- Best-in-class text extraction
GraphorLM (Beta) Method
Speed: ⚡ Accuracy: ⭐⭐⭐⭐
- Advanced graph-based RAG partitioning method
- Utilizes knowledge graph structures for content organization
- Optimized for complex document relationships and semantic understanding
MAI Method
Speed: ⚡⚡⚡ Accuracy: ⭐⭐⭐⭐⭐
- Model-assisted, text-focused partitioning
- No bounding boxes (bbox) or page layout metadata in the output
- Ideal when you only need clean text and element types
- Performs page annotation (page-level labels and context)
- Performs document annotation (document-level labels and summaries)
- Performs image annotation when images are present in the document
- Best-in-class text parsing; element classification quality is limited
- Recommended for multi-source RAG due to page and document annotations
3. Document Status Lifecycle
Documents progress through several states that you can monitor using the List Sources endpoint:
Status | Description | Next Steps |
---|---|---|
New | Document uploaded, awaiting processing | Processing will begin automatically |
Processing | AI models are analyzing the document | Wait for completion |
Completed | Document ready for use in RAG pipelines | Can be used in flows |
Failed | Processing encountered an error | Try a different processing method |
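The status table above suggests a simple polling loop: keep checking until the document reaches Completed or Failed. The following is a minimal sketch, not an official client. `wait_for_source` and `fetch_status` are hypothetical names; `fetch_status` stands in for whatever code calls List Sources and extracts one document's status, and the exact status strings returned by the API are assumed to match the table.

```python
import time
from typing import Callable

# Status names as listed in the table above; the strings the API actually
# returns are an assumption.
TERMINAL_STATUSES = {"Completed", "Failed"}


def wait_for_source(
    fetch_status: Callable[[], str],
    poll_interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll fetch_status() until it returns Completed or Failed.

    Raises TimeoutError if no terminal status is seen within `timeout` seconds
    of accumulated waiting.
    """
    waited = 0.0
    while True:
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        if waited >= timeout:
            raise TimeoutError(f"document still {status!r} after {waited:.0f}s")
        sleep(poll_interval)
        waited += poll_interval


# Simulated run: the status sequence is canned and sleeping is disabled.
simulated = iter(["New", "Processing", "Processing", "Completed"])
final_status = wait_for_source(lambda: next(simulated), sleep=lambda _: None)
print(final_status)
```

Injecting `fetch_status` and `sleep` keeps the loop independent of any HTTP library and makes it easy to test without waiting.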
Authentication
All Sources API endpoints require authentication using API tokens. Learn how to create and manage API tokens in the API Tokens guide.
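As an illustration, an authenticated request could be built as follows. This sketch assumes the token is sent as a Bearer token in the `Authorization` header, which this page does not state explicitly; check the API Tokens guide for the authoritative format. `build_list_sources_request` is a hypothetical helper, and the request is only constructed here, never sent.

```python
import urllib.request

# List Sources URL as given in the endpoint overview.
LIST_SOURCES_URL = "https://sources.graphorlm.com"


def build_list_sources_request(api_token: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated GET request for List Sources."""
    if not api_token:
        raise ValueError("api_token must be a non-empty string")
    # Assumption: Bearer scheme in the Authorization header.
    headers = {"Authorization": f"Bearer {api_token}"}
    return urllib.request.Request(LIST_SOURCES_URL, headers=headers, method="GET")


request = build_list_sources_request("YOUR_API_TOKEN")
print(request.get_method(), request.full_url)
```

Passing the resulting object to `urllib.request.urlopen` would perform the call; reading the token from an environment variable rather than hard-coding it is usually the safer choice.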
Common Workflows
Basic Document Upload Workflow
- Upload: Use Upload Source to add your document
- Monitor: Check status with List Sources
- Optimize: Reprocess with Process Source if needed
- Use: Document is ready for your RAG workflows
Quality Optimization Workflow
- Start with Basic method for speed
- Review extraction quality
- Upgrade to Hi-Res or Hi-Res (fine-tuned) if needed
- Use best results in your application
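The quality optimization steps above amount to trying methods from fastest to most accurate and stopping at the first acceptable result. A minimal sketch of that idea follows. `pick_method`, `process`, and `is_good_enough` are hypothetical: `process` stands in for a call to Process Source followed by waiting for completion, and `is_good_enough` for whatever quality check you apply. The method names are the labels used in this guide, not necessarily the identifiers the API expects.

```python
from typing import Callable, Optional, Sequence

# Labels from this guide, ordered from fastest to most accurate.
ESCALATION_ORDER = ("Basic", "Hi-Res", "Hi-Res (fine-tuned)")


def pick_method(
    process: Callable[[str], object],
    is_good_enough: Callable[[object], bool],
    methods: Sequence[str] = ESCALATION_ORDER,
) -> Optional[str]:
    """Return the first method whose result passes the quality check, else None."""
    for method in methods:
        result = process(method)
        if is_good_enough(result):
            return method
    return None


# Simulated run: pretend only the Hi-Res variants detect the document's tables.
fake_results = {"Basic": 0, "Hi-Res": 3, "Hi-Res (fine-tuned)": 3}
chosen = pick_method(
    process=lambda method: fake_results[method],
    is_good_enough=lambda tables_found: tables_found >= 3,
)
print(chosen)
```

Stopping at the first passing method avoids paying for the slower methods when a faster one is already sufficient.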
Document Lifecycle Management
Supported File Types
The Sources API supports a wide range of document formats:
Documents & Text
- PDF: Portable Document Format files
- Microsoft Office: DOC, DOCX, PPT, PPTX, XLS, XLSX
- OpenDocument: ODT (Text documents)
- Text Files: TXT, TEXT, MD (Markdown), HTML, HTM
- Data Files: CSV, TSV (Comma/Tab-separated values)
Images & Media
- Images: PNG, JPG, JPEG, TIFF, BMP, HEIC
- Audio: MP3, WAV, M4A, OGG, FLAC
- Video: MP4, MOV, AVI, MKV, WEBM
Processing Recommendations
File Type | Recommended Method | Notes |
---|---|---|
Clean PDFs | Basic, OCR, or MAI | Fast processing for digital PDFs; use MAI when you need text only, without layout metadata |
Scanned PDFs | OCR or Hi-Res | OCR needed for text extraction |
Complex Documents | Hi-Res or Hi-Res (fine-tuned) | Better structure recognition; consider GraphorLM (Beta) for semantic relationships |
Images with Text | OCR or Hi-Res | Requires OCR for text extraction |
Spreadsheets | Basic or Hi-Res | Hi-Res better for complex tables |
Presentations | Hi-Res or Hi-Res (fine-tuned) | Better slide layout recognition |
Multi-source RAG | MAI | Page and document annotations help unify heterogeneous sources |
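The recommendations table can be encoded as a small lookup so that application code picks a starting method consistently. This is only a sketch: the category keys and the `recommended_methods` helper are made up for illustration, and the method names are the labels from the table rather than confirmed API parameter values.

```python
from typing import Dict, List

# Rows of the table above; the first entry in each list is the first recommendation.
RECOMMENDED_METHODS: Dict[str, List[str]] = {
    "clean_pdf": ["Basic", "OCR", "MAI"],
    "scanned_pdf": ["OCR", "Hi-Res"],
    "complex_document": ["Hi-Res", "Hi-Res (fine-tuned)"],
    "image_with_text": ["OCR", "Hi-Res"],
    "spreadsheet": ["Basic", "Hi-Res"],
    "presentation": ["Hi-Res", "Hi-Res (fine-tuned)"],
    "multi_source_rag": ["MAI"],
}


def recommended_methods(category: str) -> List[str]:
    """Return the recommended methods for a document category, in table order."""
    try:
        # Return a copy so callers cannot mutate the shared table.
        return list(RECOMMENDED_METHODS[category])
    except KeyError:
        known = ", ".join(sorted(RECOMMENDED_METHODS))
        raise ValueError(f"unknown category {category!r}; expected one of: {known}") from None


print(recommended_methods("scanned_pdf"))
```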
Rate Limits and Best Practices
Rate Limits
- Upload: No strict limits, but large files may take longer
- Processing: Allow adequate time for complex methods
- List/Delete: Standard API rate limits apply
Best Practices
Upload Optimization
File Preparation:
- Compress large files when possible (max 100MB)
- Use descriptive filenames for easy identification
- Ensure files are not corrupted before upload
- Implement retry logic for network issues
- Validate file types client-side
- Monitor upload progress for large files
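Client-side validation of file type and size, as suggested above, might look like the sketch below. The extension list is taken from the Supported File Types section. `validate_upload` is a hypothetical helper, and the byte value chosen for "100MB" is an assumption: the decimal reading is used because it is the stricter of the two common interpretations.

```python
from pathlib import Path
from typing import List

# Assumption: "100MB" read conservatively as 100,000,000 bytes.
MAX_UPLOAD_BYTES = 100 * 1000 * 1000

# Extensions listed under Supported File Types.
SUPPORTED_EXTENSIONS = {
    "pdf", "doc", "docx", "ppt", "pptx", "xls", "xlsx", "odt",
    "txt", "text", "md", "html", "htm", "csv", "tsv",
    "png", "jpg", "jpeg", "tiff", "bmp", "heic",
    "mp3", "wav", "m4a", "ogg", "flac",
    "mp4", "mov", "avi", "mkv", "webm",
}


def validate_upload(filename: str, size_bytes: int) -> List[str]:
    """Return a list of problems; an empty list means the file looks uploadable."""
    problems = []
    extension = Path(filename).suffix.lower().lstrip(".")
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported file type: {extension or 'no extension'}")
    if size_bytes > MAX_UPLOAD_BYTES:
        problems.append(f"file is {size_bytes} bytes; limit is {MAX_UPLOAD_BYTES}")
    return problems


print(validate_upload("report.PDF", 2_000_000))
print(validate_upload("installer.exe", 150_000_000))
```

The server still performs its own validation, so this check only saves a round trip for files that would be rejected anyway.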
Processing Strategy
Method Selection:
- Start with Basic for testing and simple documents
- Use OCR for scanned documents and images
- Choose Hi-Res for complex layouts and tables
- Reserve Hi-Res (fine-tuned) for premium accuracy needs
- Prefer MAI when you don’t need bounding boxes/layout metadata and want speed with top text parsing
- Use GraphorLM (Beta) for enhanced semantic relationships and graph-based organization
- For multi-source RAG across heterogeneous sources, prefer MAI because it provides page and document annotations
- Review processing results in GraphorLM dashboard
- Test different methods for optimal results
- Monitor processing times and resource usage
Management & Monitoring
Regular Monitoring:
- Check document status regularly
- Monitor failed processing attempts
- Review processing quality periodically
- Remove outdated documents to save storage
- Reprocess with better methods as they become available
- Keep track of which methods work best for your document types
Error Handling
All Sources API endpoints use consistent error responses:
Common Error Codes
Status Code | Meaning | Common Causes |
---|---|---|
400 | Bad Request | Invalid file type, missing parameters, malformed request |
401 | Unauthorized | Invalid or missing API token |
403 | Forbidden | Insufficient permissions for the project |
404 | Not Found | File or project not found |
413 | Payload Too Large | File exceeds 100MB limit |
500 | Internal Server Error | Processing failure or server issues |
Error Response Format
Retry Strategy
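One common approach, sketched here under stated assumptions rather than taken from the GraphorLM documentation, is to retry only failures that may be transient (the 500 row in the table above, plus network errors) with exponential backoff and jitter, and to return 4xx responses immediately because repeating the same request will not fix them. `call_with_retry` and `send` are hypothetical names; `send` stands in for any function that performs one request and returns a `(status_code, body)` pair.

```python
import random
import time
from typing import Callable, Tuple

# Only the server-side code from the error table is treated as retryable.
RETRYABLE_STATUS_CODES = {500}


def call_with_retry(
    send: Callable[[], Tuple[int, object]],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, object]:
    """Call send() until it returns a non-retryable status or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        last_attempt = attempt == max_attempts
        try:
            status, body = send()
        except (ConnectionError, TimeoutError):
            if last_attempt:
                raise
        else:
            if status not in RETRYABLE_STATUS_CODES or last_attempt:
                return status, body
        # Exponential backoff (base, 2x base, 4x base, ...) plus up to 50% jitter.
        delay = base_delay * 2 ** (attempt - 1)
        sleep(delay + random.uniform(0, delay / 2))
    raise AssertionError("unreachable")


# Simulated run: two server errors followed by a success, with sleeping disabled.
responses = iter([(500, "error"), (500, "error"), (200, "ok")])
result = call_with_retry(lambda: next(responses), sleep=lambda _: None)
print(result)
```

The jitter spreads out retries from multiple clients so they do not all hit the server at the same moment.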
Integration Examples
Complete Document Management System
Document Quality Assessment Pipeline
Next Steps
Now that you understand the Sources API, explore these related topics:
- Data Ingestion Guide: Learn best practices for document processing and optimization
- API Tokens: Set up authentication for accessing the Sources API
- Chunking Guide: Optimize document segmentation after processing for better RAG performance
- Flows API: Build RAG pipelines using your processed documents
Support and Resources
Getting Help
- Documentation: Complete API reference and guides
- Email Support: lucas@graphorlm.com
- Dashboard: Monitor and manage documents at app.graphorlm.com
Code Examples
- GitHub: Sample integrations and SDKs (coming soon)
- Documentation: Code examples in multiple languages
- Community: Share integration patterns and best practices
Monitoring & Analytics
- Processing Status: Track document processing in real-time
- Quality Metrics: Evaluate extraction and processing quality
- Usage Analytics: Monitor API usage and performance