Learn how to efficiently import and process unstructured documents in GraphorLM.
| Document Type | Extensions | Features |
|---|---|---|
| Text Documents | PDF, TXT, TEXT, MD, DOC, DOCX, ODT, HTML, HTM | Full text extraction, structure preservation |
| Images | PNG, JPG, JPEG, TIFF, BMP, HEIC | OCR for text extraction, image analysis |
| Presentations | PPT, PPTX | Slide extraction, image processing |
| Spreadsheets | XLS, XLSX, CSV, TSV | Table parsing, data extraction |
| Audio Files | MP3, WAV, M4A, OGG, FLAC | Speech-to-text transcription, audio analysis |
| Video Files | MP4, MOV, AVI, MKV, WEBM | Video transcription, visual content extraction |
| Web Content | URL | Web scraping, content extraction |
| Code Repositories | GitHub URL | Repository content extraction, code analysis |
| Video Content | YouTube URL | Video transcription, content extraction |
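The table above can be turned into a small pre-upload check. The sketch below is not part of GraphorLM's API: `SUPPORTED_EXTENSIONS` and `classify_source` are hypothetical names introduced for illustration, and the host matching for GitHub and YouTube links is an assumption about how such URLs might be recognized. It only maps a file name or URL to the document type listed in the table.

```python
from pathlib import PurePath
from typing import Optional
from urllib.parse import urlparse

# Extension groups copied from the supported-formats table.
# The dictionary itself is a hypothetical helper, not a GraphorLM object.
SUPPORTED_EXTENSIONS = {
    "Text Documents": {"pdf", "txt", "text", "md", "doc", "docx", "odt", "html", "htm"},
    "Images": {"png", "jpg", "jpeg", "tiff", "bmp", "heic"},
    "Presentations": {"ppt", "pptx"},
    "Spreadsheets": {"xls", "xlsx", "csv", "tsv"},
    "Audio Files": {"mp3", "wav", "m4a", "ogg", "flac"},
    "Video Files": {"mp4", "mov", "avi", "mkv", "webm"},
}


def classify_source(source: str) -> Optional[str]:
    """Return the table's document type for a file name or URL, or None if unsupported."""
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https"):
        host = (parsed.hostname or "").lower()
        if host.startswith("www."):
            host = host[4:]
        # Assumption: repositories and videos are recognized by host name only.
        if host == "github.com":
            return "Code Repositories"
        if host in ("youtube.com", "youtu.be"):
            return "Video Content"
        return "Web Content"

    # Anything that is not an http(s) URL is treated as a file name.
    extension = PurePath(source).suffix.lstrip(".").lower()
    for doc_type, extensions in SUPPORTED_EXTENSIONS.items():
        if extension in extensions:
            return doc_type
    return None
```

Calling `classify_source("report.PDF")` would yield `"Text Documents"`, while an unlisted extension such as `archive.zip` yields `None`, which a caller could use to reject the file before attempting an upload.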
https://github.com/username/repository
Poor OCR quality
Table extraction problems
Multi-language document issues
GitHub repository access issues
YouTube video processing issues
Audio/Video processing issues