Sources API Overview
Complete guide to managing documents in GraphorLM via the REST API
The GraphorLM Sources API provides a comprehensive set of endpoints for managing documents in your projects. From uploading files to processing content with advanced AI models, these endpoints enable you to build powerful document ingestion pipelines and RAG applications.
What are Sources?
Sources in GraphorLM represent documents that serve as the foundation of your knowledge base. These can include:
- Local files: PDFs, Word documents, text files, images, spreadsheets, presentations
- Web content: URLs, web pages, online articles
- Code repositories: GitHub repositories and documentation
- Media content: Audio and video files for transcription and analysis
All sources are processed through GraphorLM’s advanced AI pipeline to extract text, recognize structure, and prepare content for retrieval-augmented generation (RAG) workflows.
API Endpoints Overview
The Sources API consists of four main endpoints that provide complete document lifecycle management:
Upload Source
POST https://sources.graphorlm.com/upload
Upload documents from your local system to GraphorLM for processing
Process Source
POST https://sources.graphorlm.com/process
Reprocess existing documents with different AI models and parsing methods
List Sources
GET https://sources.graphorlm.com
Retrieve information about all documents in your project
Delete Source
DELETE https://sources.graphorlm.com/delete
Permanently remove documents from your project
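As a rough illustration, the four endpoints above can be kept in one lookup table. The sketch below only builds requests with the Python standard library and never sends them; the helper name `build_request` and the short keys (`upload`, `process`, `list`, `delete`) are hypothetical, and request bodies and authentication are left out because this overview does not specify them.

```python
from urllib.request import Request

# HTTP method and URL for each endpoint, copied from the overview above.
ENDPOINTS = {
    "upload": ("POST", "https://sources.graphorlm.com/upload"),
    "process": ("POST", "https://sources.graphorlm.com/process"),
    "list": ("GET", "https://sources.graphorlm.com"),
    "delete": ("DELETE", "https://sources.graphorlm.com/delete"),
}


def build_request(name: str) -> Request:
    """Build an unsent request for the named endpoint (hypothetical helper)."""
    method, url = ENDPOINTS[name]
    return Request(url, method=method)


# Print the method and URL that would be used for each endpoint.
for name in ENDPOINTS:
    req = build_request(name)
    print(f"{name:8} {req.get_method():6} {req.full_url}")
```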
Document Processing Pipeline
Understanding how GraphorLM processes your documents helps you make the most of the Sources API:
1. Upload Stage
When you upload a document using the Upload Source endpoint:
- File is validated for type and size (max 100MB)
- Document is securely stored in your project
- Initial metadata is extracted (filename, size, type)
- Processing begins automatically with the default method
2. Processing Methods
GraphorLM offers multiple processing methods, selectable via the Process Source endpoint:
Basic Method
Speed: ⚡⚡⚡ Accuracy: ⭐⭐
- Fastest processing option
- Heuristic-based text extraction
- No OCR processing
- Ideal for plain text and simple documents
OCR Method
Speed: ⚡⚡ Accuracy: ⭐⭐⭐
- Optical Character Recognition for scanned documents
- Heuristic-based structure classification
- Perfect for images and scanned PDFs
- Balances speed and accuracy
YOLOX Method
Speed: ⚡ Accuracy: ⭐⭐⭐⭐
- AI-powered document structure recognition
- Advanced table and figure detection
- Superior layout analysis
- Recommended for complex documents
Advanced Method
Speed: ⚡ Accuracy: ⭐⭐⭐⭐⭐
- Premium fine-tuned AI models
- Highest accuracy for specialized documents
- Advanced structure recognition
- Best-in-class text extraction
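One way to reason about the speed and accuracy trade-off is to encode the ratings above as data. In the following sketch the integers are simply the number of ⚡ and ⭐ symbols shown for each method; the class and function names are hypothetical, and the strings are display names, not necessarily the identifiers the Process Source endpoint expects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessingMethod:
    name: str
    speed: int     # count of ⚡ in the ratings above (higher is faster)
    accuracy: int  # count of ⭐ in the ratings above (higher is better)


METHODS = [
    ProcessingMethod("Basic", speed=3, accuracy=2),
    ProcessingMethod("OCR", speed=2, accuracy=3),
    ProcessingMethod("YOLOX", speed=1, accuracy=4),
    ProcessingMethod("Advanced", speed=1, accuracy=5),
]


def fastest_with_accuracy(min_accuracy: int) -> ProcessingMethod:
    """Return the fastest method whose accuracy rating meets the floor.

    Ties on speed go to the lower accuracy rating, on the assumption
    (not stated on this page) that it is the lighter option.
    """
    candidates = [m for m in METHODS if m.accuracy >= min_accuracy]
    if not candidates:
        raise ValueError(f"no method has accuracy >= {min_accuracy}")
    return max(candidates, key=lambda m: (m.speed, -m.accuracy))


print(fastest_with_accuracy(3).name)
```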
3. Document Status Lifecycle
Documents progress through various states that you can monitor using the List Sources endpoint:
| Status | Description | Next Steps |
|---|---|---|
| New | Document uploaded, awaiting processing | Processing will begin automatically |
| Processing | AI models are analyzing the document | Wait for completion |
| Completed | Document ready for use in RAG pipelines | Can be used in flows |
| Failed | Processing encountered an error | Try a different processing method |
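The status table above suggests a simple polling loop: keep checking until the document reaches Completed or Failed. The sketch below is a minimal version under stated assumptions. `fetch_status` is a hypothetical callable that you would implement by calling List Sources and extracting the status of one document (the response shape is not described in this overview), and the interval and timeout defaults are arbitrary.

```python
import time
from typing import Callable

# Statuses from the table above after which no further change is expected.
TERMINAL_STATUSES = {"Completed", "Failed"}


def wait_for_terminal_status(
    fetch_status: Callable[[], str],
    poll_interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the status is Completed or Failed, else raise TimeoutError."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still {status!r} after {timeout} seconds")
        sleep(poll_interval)


# Demo with a fake status source and a no-op sleep, so nothing is contacted.
fake = iter(["New", "Processing", "Completed"])
print(wait_for_terminal_status(lambda: next(fake), sleep=lambda _: None))
```

Passing `sleep` as a parameter keeps the loop testable without real delays.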
Authentication
All Sources API endpoints require authentication with an API token.
Learn how to create and manage API tokens in the API Tokens guide.
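This overview does not show how the token is attached to a request. A common convention is an `Authorization: Bearer <token>` header, and the sketch below assumes that scheme; treat both the header format and the helper name `auth_headers` as assumptions and confirm them against the API Tokens guide.

```python
def auth_headers(api_token: str) -> dict[str, str]:
    """Return request headers carrying the API token.

    Assumption: the API accepts a Bearer token in the Authorization header.
    """
    token = api_token.strip()
    if not token:
        raise ValueError("API token must not be empty")
    return {"Authorization": f"Bearer {token}"}


# Placeholder value; in practice load the real token from a secret store
# or environment variable rather than hard-coding it.
headers = auth_headers("YOUR_API_TOKEN")
print(sorted(headers))
```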
Common Workflows
Basic Document Upload Workflow
- Upload: Use Upload Source to add your document
- Monitor: Check status with List Sources
- Optimize: Reprocess with Process Source if needed
- Use: Document is ready for your RAG workflows
Quality Optimization Workflow
- Start with Basic method for speed
- Review extraction quality
- Upgrade to YOLOX or Advanced if needed
- Use best results in your application
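The quality optimization steps above amount to an escalation loop: try the fastest method, score the output, and move to a heavier method only if the score is too low. The following sketch shows that control flow only. `process` and `score` are hypothetical callables you would supply (for example, a wrapper around Process Source and your own quality check), and the threshold of 0.8 is an arbitrary placeholder.

```python
from typing import Callable, Sequence

# Escalation order taken from the workflow above.
ESCALATION_ORDER = ("Basic", "YOLOX", "Advanced")


def escalate_until_good(
    process: Callable[[str], str],
    score: Callable[[str], float],
    threshold: float = 0.8,
    methods: Sequence[str] = ESCALATION_ORDER,
) -> tuple[str, str]:
    """Run methods in order and stop at the first result scoring >= threshold.

    Returns (method, result) for the best-scoring attempt, so a usable
    result is still returned when no method reaches the threshold.
    """
    best = None  # (score, method, result) of the best attempt so far
    for method in methods:
        result = process(method)
        quality = score(result)
        if best is None or quality > best[0]:
            best = (quality, method, result)
        if quality >= threshold:
            break
    if best is None:
        raise ValueError("methods must not be empty")
    return best[1], best[2]


# Demo with stand-in callables: each "result" is just the method name.
fake_scores = {"Basic": 0.4, "YOLOX": 0.9, "Advanced": 1.0}
print(escalate_until_good(lambda m: m, lambda r: fake_scores[r]))
```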
Document Lifecycle Management
Supported File Types
The Sources API supports a wide range of document formats:
Documents & Text
- PDF: Portable Document Format files
- Microsoft Office: DOC, DOCX, PPT, PPTX, XLS, XLSX
- OpenDocument: ODT (Text documents)
- Text Files: TXT, TEXT, MD (Markdown), HTML, HTM
- Data Files: CSV, TSV (Comma/Tab-separated values)
Images & Media
- Images: PNG, JPG, JPEG, TIFF, BMP, HEIC
- Audio: MP3, WAV, M4A, OGG, FLAC
- Video: MP4, MOV, AVI, MKV, WEBM
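Checking a file before upload avoids a round trip that would end in a 400 or 413 response. The sketch below validates the extension against the formats listed above and the size against the 100MB limit. The function name is hypothetical, and the limit is interpreted conservatively as 100,000,000 bytes because this page does not say whether "100MB" is decimal or binary.

```python
from pathlib import Path

# Extensions taken from the supported formats listed above.
SUPPORTED_EXTENSIONS = {
    "pdf", "doc", "docx", "ppt", "pptx", "xls", "xlsx", "odt",
    "txt", "text", "md", "html", "htm", "csv", "tsv",
    "png", "jpg", "jpeg", "tiff", "bmp", "heic",
    "mp3", "wav", "m4a", "ogg", "flac",
    "mp4", "mov", "avi", "mkv", "webm",
}

# Assumption: "100MB" read as decimal megabytes, the stricter interpretation.
MAX_UPLOAD_BYTES = 100 * 1000 * 1000


def check_upload(filename: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks uploadable."""
    problems = []
    ext = Path(filename).suffix.lower().lstrip(".")
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported file type: {ext or '(none)'}")
    if size_bytes > MAX_UPLOAD_BYTES:
        problems.append(f"file is {size_bytes} bytes; limit is {MAX_UPLOAD_BYTES}")
    return problems


print(check_upload("report.PDF", 2_000_000))
print(check_upload("archive.zip", 2_000_000))
```

This checks only the filename and size; it does not inspect file contents, so a mislabeled or corrupted file would still pass.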
Processing Recommendations
| File Type | Recommended Method | Notes |
|---|---|---|
| Clean PDFs | Basic or OCR | Fast processing for digital PDFs |
| Scanned PDFs | OCR or YOLOX | OCR needed for text extraction |
| Complex Documents | YOLOX or Advanced | Better structure recognition |
| Images with Text | OCR or YOLOX | Requires OCR for text extraction |
| Spreadsheets | Basic or YOLOX | YOLOX better for complex tables |
| Presentations | YOLOX or Advanced | Better slide layout recognition |
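The recommendations above can be turned into a small lookup so that method selection is consistent across a codebase. In this sketch the dictionary keys (`clean_pdf`, `scanned_pdf`, and so on) and the function name are hypothetical labels; the method pairs are copied from the table, where the second option in each row is the one with the higher accuracy rating.

```python
# (first option, second option) per file type, copied from the table above.
RECOMMENDED_METHODS = {
    "clean_pdf": ("Basic", "OCR"),
    "scanned_pdf": ("OCR", "YOLOX"),
    "complex_document": ("YOLOX", "Advanced"),
    "image_with_text": ("OCR", "YOLOX"),
    "spreadsheet": ("Basic", "YOLOX"),
    "presentation": ("YOLOX", "Advanced"),
}


def recommended_method(kind: str, prefer_accuracy: bool = False) -> str:
    """Pick the first recommendation, or the second when accuracy matters more."""
    if kind not in RECOMMENDED_METHODS:
        raise KeyError(f"no recommendation for file type {kind!r}")
    first, second = RECOMMENDED_METHODS[kind]
    return second if prefer_accuracy else first


print(recommended_method("spreadsheet"))
print(recommended_method("spreadsheet", prefer_accuracy=True))
```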
Rate Limits and Best Practices
Rate Limits
- Upload: No strict limits, but large files may take longer
- Processing: Allow adequate time for complex methods
- List/Delete: Standard API rate limits apply
Best Practices
Upload Optimization
File Preparation:
- Compress large files when possible to stay under the 100MB upload limit
- Use descriptive filenames for easy identification
- Ensure files are not corrupted before upload
Error Handling:
- Implement retry logic for network issues
- Validate file types client-side
- Monitor upload progress for large files
Processing Strategy
Method Selection:
- Start with Basic for testing and simple documents
- Use OCR for scanned documents and images
- Choose YOLOX for complex layouts and tables
- Reserve Advanced for premium accuracy needs
Quality Monitoring:
- Review processing results in the GraphorLM dashboard
- Test different methods for optimal results
- Monitor processing times and costs
Management & Monitoring
Regular Monitoring:
- Check document status regularly
- Monitor failed processing attempts
- Review processing quality periodically
Maintenance:
- Remove outdated documents to save storage
- Reprocess with better methods as they become available
- Keep track of which methods work best for your document types
Error Handling
All Sources API endpoints use consistent error responses:
Common Error Codes
| Status Code | Meaning | Common Causes |
|---|---|---|
| 400 | Bad Request | Invalid file type, missing parameters, malformed request |
| 401 | Unauthorized | Invalid or missing API token |
| 403 | Forbidden | Insufficient permissions for the project |
| 404 | Not Found | File or project not found |
| 413 | Payload Too Large | File exceeds 100MB limit |
| 500 | Internal Server Error | Processing failure or server issues |
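For logging or user-facing messages, the table above can be embedded directly in client code. The sketch below maps a status code to the meaning and common causes listed here; the function name is hypothetical, and it does not parse the error response body, whose format is not shown in this overview.

```python
# Status code -> (meaning, common causes), copied from the table above.
COMMON_ERRORS = {
    400: ("Bad Request", "invalid file type, missing parameters, or malformed request"),
    401: ("Unauthorized", "invalid or missing API token"),
    403: ("Forbidden", "insufficient permissions for the project"),
    404: ("Not Found", "file or project not found"),
    413: ("Payload Too Large", "file exceeds the 100MB limit"),
    500: ("Internal Server Error", "processing failure or server issues"),
}


def describe_error(status_code: int) -> str:
    """Return a one-line description for a status code from the table."""
    meaning, causes = COMMON_ERRORS.get(
        status_code, ("Unlisted status", "not covered by the table above")
    )
    return f"{status_code} {meaning}: {causes}"


print(describe_error(413))
```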
Error Response Format
Retry Strategy
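A generic way to retry transient failures is exponential backoff with jitter. The sketch below is one such approach and not necessarily the strategy GraphorLM recommends. `RetryableError` is a hypothetical exception that your own request function would raise for failures worth retrying (for example, network errors or a 500 response), and all delay values are placeholder defaults.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RetryableError(Exception):
    """Raised by the caller's request function for failures worth retrying."""


def with_retries(
    call: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `call` until it succeeds or `max_attempts` is exhausted."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except RetryableError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Double the delay on each attempt, capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Add up to 50% random jitter so clients do not retry in lockstep.
            sleep(delay + random.uniform(0, delay / 2))
    raise ValueError("max_attempts must be at least 1")
```

Client errors such as 400, 401, 403, 404, and 413 usually indicate a problem with the request itself, so retrying them unchanged is unlikely to help; raise `RetryableError` only for failures that may be transient.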
Integration Examples
Complete Document Management System
Document Quality Assessment Pipeline
Next Steps
Now that you understand the Sources API, explore these related topics:
Data Ingestion Guide
Learn best practices for document processing and optimization
API Tokens
Set up authentication for accessing the Sources API
Chunking Guide
Optimize document segmentation after processing for better RAG performance
Flows API
Build RAG pipelines using your processed documents
Support and Resources
Getting Help
- Documentation: Complete API reference and guides
- Email Support: lucas@graphorlm.com
- Dashboard: Monitor and manage documents at app.graphorlm.com
Code Examples
- GitHub: Sample integrations and SDKs (coming soon)
- Documentation: Code examples in multiple languages
- Community: Share integration patterns and best practices
Monitoring & Analytics
- Processing Status: Track document processing in real time
- Quality Metrics: Evaluate extraction and processing quality
- Usage Analytics: Monitor API usage and performance