Endpoint Overview
HTTP Method
POST
Endpoint URL
Authentication
This endpoint requires authentication using an API token. You must include your API token as a Bearer token in the Authorization header.
Learn how to create and manage API tokens in the API Tokens guide.
Request Format
Headers
Header | Value | Required |
---|---|---|
Authorization | Bearer YOUR_API_TOKEN | ✅ Yes |
Content-Type | application/json | ✅ Yes |
Request Body
The endpoint requires a JSON payload with the following structure:
Request Parameters
Parameter | Type | Required | Description |
---|---|---|---|
file_name | string | ✅ Yes | Name of the source file to retrieve elements from |
page | integer | ❌ No | Page number for pagination (starts from 1) |
page_size | integer | ❌ No | Number of elements to return per page |
filter | object | ❌ No | Filter criteria to refine element selection |
Filter Parameters
Parameter | Type | Description |
---|---|---|
type | string | Filter by specific element type (e.g., “Title”, “NarrativeText”, “Table”) |
page_numbers | array[integer] | Filter elements from specific page numbers |
elementsToRemove | array[string] | Exclude specific element types from results |
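A request body combining the parameters above might be assembled as follows. This is a minimal sketch: the helper `build_payload` and the sample file name are illustrative, and placing the filter keys under `filter` is inferred from the two tables rather than confirmed by a sample payload.

```python
import json
from typing import Any, Optional


def build_payload(
    file_name: str,
    page: Optional[int] = None,
    page_size: Optional[int] = None,
    element_type: Optional[str] = None,
    page_numbers: Optional[list[int]] = None,
    elements_to_remove: Optional[list[str]] = None,
) -> dict[str, Any]:
    """Build the JSON body, omitting optional parameters that are not set."""
    payload: dict[str, Any] = {"file_name": file_name}
    if page is not None:
        payload["page"] = page
    if page_size is not None:
        payload["page_size"] = page_size

    # Filter keys use the names listed in the Filter Parameters table.
    filter_obj: dict[str, Any] = {}
    if element_type is not None:
        filter_obj["type"] = element_type
    if page_numbers:
        filter_obj["page_numbers"] = page_numbers
    if elements_to_remove:
        filter_obj["elementsToRemove"] = elements_to_remove
    if filter_obj:
        payload["filter"] = filter_obj
    return payload


# "report.pdf" is a made-up file name used only for illustration.
payload = build_payload("report.pdf", page=1, page_size=25, element_type="Table")
print(json.dumps(payload, indent=2))
```

Leaving unset optional parameters out of the body, rather than sending null values, lets the server apply its own defaults.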
Response Format
Success Response (200 OK)
The endpoint returns a paginated response containing document elements:
Response Fields
Field | Type | Description |
---|---|---|
items | array | Array of document elements in the current page |
total | integer | Total number of elements matching the filter |
page | integer | Current page number |
page_size | integer | Number of elements per page |
total_pages | integer | Total number of pages available |
Element Object Fields
Field | Type | Description |
---|---|---|
id | string or null | Element identifier (may be null) |
page_content | string | Text content of the element |
type | string | Always “Document” for this endpoint |
metadata | object | Rich metadata about the element |
Metadata Fields
Field | Type | Description |
---|---|---|
coordinates | object | Pixel coordinates and layout information |
filename | string | Original filename of the source document |
languages | array[string] | Detected languages in the element |
last_modified | string | ISO timestamp of last modification |
page_number | integer | Page number where element appears |
filetype | string | MIME type of the source file |
text_as_html | string | HTML representation of the element |
element_type | string | Type classification of the element |
element_id | string | Unique identifier for the element |
position | integer | Sequential position within the document |
bounding_box | object | Rectangular bounds of the element |
page_layout | object | Overall page dimensions |
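The response, element, and metadata tables above can be mirrored in a small typed model. The sketch below is illustrative only: the class names `Element` and `ElementPage`, the helper `parse_page`, and the sample body are hypothetical, and the sample is hand-written from the field tables rather than captured from the API.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Element:
    id: Optional[str]            # may be null in the response
    page_content: str
    type: str                    # documented as always "Document"
    metadata: dict[str, Any] = field(default_factory=dict)


@dataclass
class ElementPage:
    items: list[Element]
    total: int
    page: int
    page_size: int
    total_pages: int


def parse_page(body: dict[str, Any]) -> ElementPage:
    """Convert a decoded JSON response body into typed objects."""
    items = [
        Element(
            id=raw.get("id"),
            page_content=raw.get("page_content", ""),
            type=raw.get("type", ""),
            metadata=raw.get("metadata") or {},
        )
        for raw in body.get("items", [])
    ]
    return ElementPage(
        items=items,
        total=body["total"],
        page=body["page"],
        page_size=body["page_size"],
        total_pages=body["total_pages"],
    )


# Hand-written sample for illustration; not real API output.
sample_body = {
    "items": [
        {
            "id": None,
            "page_content": "Quarterly Report",
            "type": "Document",
            "metadata": {"element_type": "Title", "page_number": 1, "position": 0},
        }
    ],
    "total": 1,
    "page": 1,
    "page_size": 25,
    "total_pages": 1,
}
parsed = parse_page(sample_body)
print(parsed.items[0].metadata["element_type"], parsed.total_pages)
```

Because the top-level `type` is always "Document", the element classification has to be read from `metadata.element_type`.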
Element Types
Title
Description: Headers, titles, and section headings
Common Uses: Document structure analysis, content navigation
NarrativeText
Description: Regular paragraphs and body text
Common Uses: Main content extraction, text analysis
ListItem
Description: Bulleted or numbered list items
Common Uses: Structured information extraction
Table
Description: Tabular data and structured information
Common Uses: Data extraction, structured analysis
Image
Description: Images, figures, and visual elements
Common Uses: Visual content analysis, figure extraction
CodeSnippet
Description: Code blocks and technical snippets
Common Uses: Technical documentation analysis
Footer
UncategorizedText
Description: Text that doesn’t fit other categories
Common Uses: Catch-all for miscellaneous content
Code Examples
JavaScript/Node.js
Python
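A minimal sketch of the request using only the Python standard library. `ENDPOINT_URL` is a placeholder, since the actual URL is not given here, and the helper names `build_request` and `fetch_elements` are illustrative. The 45-second default timeout follows the recommendation in the Troubleshooting section.

```python
import json
import urllib.request

# Placeholder values: substitute the real endpoint URL and your own token.
ENDPOINT_URL = "https://api.example.com/REPLACE_WITH_ENDPOINT_PATH"
API_TOKEN = "YOUR_API_TOKEN"


def build_request(
    payload: dict, token: str = API_TOKEN, url: str = ENDPOINT_URL
) -> urllib.request.Request:
    """Create a POST request carrying the JSON payload and both required headers."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def fetch_elements(payload: dict, timeout: float = 45.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(build_request(payload), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Building the request does not touch the network; fetch_elements() does.
request = build_request({"file_name": "report.pdf", "page": 1, "page_size": 25})
print(request.get_method(), request.full_url)
```

Calling `fetch_elements(...)` raises `urllib.error.HTTPError` for non-2xx responses, so the error codes listed under Error Responses surface as exceptions.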
cURL
PHP
Error Responses
Common Error Codes
Status Code | Error Type | Description |
---|---|---|
400 | Bad Request | Invalid request payload or parameters |
401 | Unauthorized | Invalid or missing API token |
404 | Not Found | Specified file not found in project |
500 | Internal Server Error | Server-side error processing request |
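One way to act on these status codes in client code is to map each to a short hint and raise a single exception type. This is a sketch under assumptions: `ElementsApiError` and `check_status` are hypothetical names, and treating only 5xx responses as retryable is a common convention, not documented behavior of this API.

```python
STATUS_HINTS = {
    400: "Bad Request: check the JSON payload and parameter types",
    401: "Unauthorized: check that the API token is present and valid",
    404: "Not Found: check that file_name matches a file in the project",
    500: "Internal Server Error: the server failed while processing the request",
}


class ElementsApiError(Exception):
    """Raised for any non-200 response (hypothetical client-side type)."""

    def __init__(self, status: int, detail: str = "") -> None:
        hint = STATUS_HINTS.get(status, "Unexpected status")
        message = f"HTTP {status} ({hint})"
        if detail:
            message += f": {detail}"
        super().__init__(message)
        self.status = status
        # Assumption: only server-side failures are worth retrying.
        self.retryable = status >= 500


def check_status(status: int, detail: str = "") -> None:
    """Raise ElementsApiError unless the status is 200."""
    if status != 200:
        raise ElementsApiError(status, detail)


try:
    check_status(404, "report.pdf")
except ElementsApiError as err:
    print(err, "| retryable:", err.retryable)
```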
Error Response Format
Error Examples
Invalid File Name (404)
Invalid API Token (401)
Invalid Request (400)
Server Error (500)
Response Analysis
Element Processing and Filtering
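Elements already retrieved can also be narrowed on the client. The sketch below mirrors the three server-side filter parameters against `metadata.element_type` and `metadata.page_number`; the function name `filter_elements` and the sample items are illustrative, not part of the API.

```python
from typing import Any, Iterable, Optional


def filter_elements(
    items: Iterable[dict[str, Any]],
    include_type: Optional[str] = None,
    page_numbers: Optional[set[int]] = None,
    exclude_types: frozenset[str] = frozenset(),
) -> list[dict[str, Any]]:
    """Keep elements that pass every filter that was supplied."""
    kept = []
    for item in items:
        meta = item.get("metadata") or {}
        element_type = meta.get("element_type")
        if include_type is not None and element_type != include_type:
            continue
        if page_numbers is not None and meta.get("page_number") not in page_numbers:
            continue
        if element_type in exclude_types:
            continue
        kept.append(item)
    return kept


# Hand-written sample items for illustration.
sample_items = [
    {"page_content": "Overview", "metadata": {"element_type": "Title", "page_number": 1}},
    {"page_content": "Body text", "metadata": {"element_type": "NarrativeText", "page_number": 1}},
    {"page_content": "Page 2", "metadata": {"element_type": "Footer", "page_number": 2}},
]
body_only = filter_elements(sample_items, exclude_types=frozenset({"Footer", "Title"}))
print([item["page_content"] for item in body_only])
```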
Content Extraction
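A simple extraction pass can rebuild readable text from the elements. This sketch assumes that `metadata.position` gives reading order and that titles and list items should be marked up lightly; `extract_text`, the skipped types, and the sample items are illustrative choices.

```python
from typing import Any


def extract_text(
    items: list[dict[str, Any]],
    skip_types: tuple[str, ...] = ("Footer", "Image"),
) -> str:
    """Join element text in document order, marking titles and list items."""
    ordered = sorted(items, key=lambda it: (it.get("metadata") or {}).get("position", 0))
    lines = []
    for item in ordered:
        element_type = (item.get("metadata") or {}).get("element_type")
        text = item.get("page_content", "").strip()
        if element_type in skip_types or not text:
            continue
        if element_type == "Title":
            lines.append(f"# {text}")
        elif element_type == "ListItem":
            lines.append(f"- {text}")
        else:
            lines.append(text)
    return "\n".join(lines)


# Hand-written sample items, deliberately out of order.
sample_items = [
    {"page_content": "First point", "metadata": {"element_type": "ListItem", "position": 2}},
    {"page_content": "Summary", "metadata": {"element_type": "Title", "position": 0}},
    {"page_content": "Confidential", "metadata": {"element_type": "Footer", "position": 3}},
    {"page_content": "Intro paragraph.", "metadata": {"element_type": "NarrativeText", "position": 1}},
]
text = extract_text(sample_items)
print(text)
```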
Integration Examples
Document Analyzer
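As a rough illustration of a document analyzer, the sketch below summarizes a list of elements by type, page, and language. The function name `analyze` and the shape of the returned summary are hypothetical; only the metadata field names come from the tables above.

```python
from collections import Counter
from typing import Any


def analyze(items: list[dict[str, Any]]) -> dict[str, Any]:
    """Summarize element counts, pages, languages, and text volume."""
    by_type: Counter = Counter()
    pages: set = set()
    languages: set = set()
    total_characters = 0
    for item in items:
        meta = item.get("metadata") or {}
        by_type[meta.get("element_type", "Unknown")] += 1
        if "page_number" in meta:
            pages.add(meta["page_number"])
        languages.update(meta.get("languages") or [])
        total_characters += len(item.get("page_content", ""))
    return {
        "element_count": len(items),
        "by_type": dict(by_type),
        "page_count": len(pages),
        "languages": sorted(languages),
        "total_characters": total_characters,
    }


# Hand-written sample items for illustration.
sample_items = [
    {"page_content": "Intro", "metadata": {"element_type": "Title", "page_number": 1, "languages": ["eng"]}},
    {"page_content": "Hello", "metadata": {"element_type": "NarrativeText", "page_number": 1, "languages": ["eng"]}},
    {"page_content": "Bonjour", "metadata": {"element_type": "NarrativeText", "page_number": 2, "languages": ["fra"]}},
]
summary = analyze(sample_items)
print(summary)
```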
Content Search System
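A content search layer over retrieved elements could start from a small inverted index. This is a sketch, not a production search engine: `tokenize`, `build_index`, and `search` are illustrative names, matching is exact on lowercase alphanumeric tokens, and every query token must appear in an element for it to match.

```python
import re
from collections import defaultdict
from typing import Any


def tokenize(text: str) -> list[str]:
    """Split text into lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(items: list[dict[str, Any]]) -> dict[str, set[int]]:
    """Map each token to the indices of the elements that contain it."""
    index: dict[str, set[int]] = defaultdict(set)
    for i, item in enumerate(items):
        for token in tokenize(item.get("page_content", "")):
            index[token].add(i)
    return index


def search(index: dict[str, set[int]], items: list[dict[str, Any]], query: str) -> list[dict[str, Any]]:
    """Return elements containing every token of the query, in original order."""
    tokens = tokenize(query)
    if not tokens:
        return []
    matches = set.intersection(*(index.get(token, set()) for token in tokens))
    return [items[i] for i in sorted(matches)]


# Hand-written sample items for illustration.
sample_items = [
    {"page_content": "Revenue grew in the third quarter."},
    {"page_content": "Quarter results: revenue table below."},
    {"page_content": "Contact the support team."},
]
index = build_index(sample_items)
hits = search(index, sample_items, "revenue quarter")
print([hit["page_content"] for hit in hits])
```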
Best Practices
Performance Optimization
- Use appropriate page sizes: Start with 20-50 elements per page for optimal performance
- Implement client-side caching: Cache element data for repeated access patterns
- Filter server-side: Use filter parameters to reduce data transfer and processing
- Batch processing: Process multiple pages efficiently for large documents
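The paging and caching advice above can be sketched as a generator that walks pages until `total_pages` is reached, wrapped by a small in-memory cache. The names `cached` and `iter_elements` are illustrative, and `fake_fetch` stands in for a real call to the endpoint so the example runs offline.

```python
from typing import Any, Callable, Iterator

FetchPage = Callable[[int], dict[str, Any]]


def cached(fetch_page: FetchPage) -> FetchPage:
    """Wrap a page fetcher so each page number is requested at most once."""
    cache: dict[int, dict[str, Any]] = {}

    def wrapper(page: int) -> dict[str, Any]:
        if page not in cache:
            cache[page] = fetch_page(page)
        return cache[page]

    return wrapper


def iter_elements(fetch_page: FetchPage) -> Iterator[dict[str, Any]]:
    """Yield elements page by page, stopping after the last reported page."""
    page = 1
    while True:
        body = fetch_page(page)
        yield from body.get("items", [])
        if page >= body.get("total_pages", 0):
            break
        page += 1


# Stand-in for the real endpoint: two pages holding three elements in total.
calls: list[int] = []


def fake_fetch(page: int) -> dict[str, Any]:
    calls.append(page)
    data = {1: ["a", "b"], 2: ["c"]}
    return {
        "items": [{"page_content": text} for text in data[page]],
        "total": 3,
        "page": page,
        "page_size": 2,
        "total_pages": 2,
    }


fetch = cached(fake_fetch)
first_pass = [e["page_content"] for e in iter_elements(fetch)]
second_pass = [e["page_content"] for e in iter_elements(fetch)]
print(first_pass, "requests made:", calls)
```

The second pass is served entirely from the cache, so `calls` records each page only once.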
Data Processing
- Element type awareness: Different element types require different processing approaches
- Coordinate utilization: Leverage bounding box data for spatial analysis and layout reconstruction
- HTML parsing: Use the text_as_html field for rich formatting and structure preservation
- Language handling: Consider detected languages for multilingual document processing
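As an example of HTML parsing, the sketch below turns the `text_as_html` of a Table element into rows of cell text using the standard library parser. It assumes the field holds ordinary `<table>`, `<tr>`, `<td>`, and `<th>` markup, which is not confirmed here; `TableParser`, `table_rows`, and the sample HTML are illustrative.

```python
from html.parser import HTMLParser
from typing import Optional


class TableParser(HTMLParser):
    """Collect the text of each cell, grouped by table row."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: list[list[str]] = []
        self._row: Optional[list[str]] = None
        self._cell: Optional[list[str]] = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only text inside an open cell is kept.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def table_rows(html: str) -> list[list[str]]:
    """Parse table markup into a list of rows of cell strings."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hand-written sample markup, assumed to resemble a Table element's text_as_html.
sample_html = "<table><tr><th>Name</th><th>Qty</th></tr><tr><td>Bolt</td><td>4</td></tr></table>"
rows = table_rows(sample_html)
print(rows)
```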
Memory Management
- Stream large documents: Process large files in chunks rather than loading all elements at once
- Clean unused data: Remove unnecessary metadata fields when not needed
- Monitor response sizes: Be aware of response size when requesting many elements
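The points above can be combined by trimming each element to the fields a task needs and handling elements in fixed-size batches. This is a sketch: `slim`, `batches`, and the default set of kept metadata keys are illustrative choices, not requirements of the API.

```python
from itertools import islice
from typing import Any, Iterable, Iterator

# Illustrative default: keep only the fields needed to order and classify text.
KEEP_KEYS = ("element_type", "page_number", "position")


def slim(element: dict[str, Any], keep: tuple[str, ...] = KEEP_KEYS) -> dict[str, Any]:
    """Return a copy holding only the text and the selected metadata keys."""
    meta = element.get("metadata") or {}
    return {
        "page_content": element.get("page_content", ""),
        "metadata": {key: meta[key] for key in keep if key in meta},
    }


def batches(elements: Iterable[dict[str, Any]], size: int) -> Iterator[list[dict[str, Any]]]:
    """Yield lists of at most `size` elements without materializing the input."""
    iterator = iter(elements)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# Hand-written sample: five elements carrying a bulky metadata field.
sample_items = [
    {
        "page_content": f"text {i}",
        "metadata": {"element_type": "NarrativeText", "position": i, "coordinates": {"points": []}},
    }
    for i in range(5)
]
batch_sizes = [len(batch) for batch in batches((slim(e) for e in sample_items), 2)]
print(batch_sizes)
```

Passing a generator into `batches` means only one batch of slimmed elements is held in memory at a time.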
Troubleshooting
Slow response times
Causes: Large page sizes, complex filters, or server load
Solutions:
- Reduce page_size to 25-50 elements
- Use specific filters to reduce result set
- Implement request timeouts (45+ seconds recommended)
- Consider processing in smaller batches
Empty results
Causes: File not processed, incorrect file name, or overly restrictive filters
Solutions:
- Verify file has been processed successfully
- Check file name matches exactly (case-sensitive)
- Remove or relax filter criteria
- Ensure file contains the expected element types
Missing expected elements
Causes: Processing method limitations, file format issues, or filter conflicts
Solutions:
- Try different partition methods during upload
- Check if elements are categorized under different types
- Remove elementsToRemove filter temporarily
- Verify page_numbers filter includes correct pages
Incorrect coordinates
Causes: PDF processing variations, DPI differences, or coordinate system misunderstanding
Solutions:
- Understand the PixelSpace coordinate system
- Check layout_width and layout_height for scaling
- Consider coordinate transformation for display purposes
- Use relative positioning when possible
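The scaling and relative-positioning steps above can be sketched as follows. The layout of the `coordinates` object is an assumption: the code expects a `points` list of `[x, y]` pixel pairs alongside `layout_width` and `layout_height`, and the helper names and sample values are illustrative.

```python
from typing import Any


def to_relative(coordinates: dict[str, Any]) -> list[tuple[float, float]]:
    """Convert pixel points to fractions of the page width and height."""
    width = coordinates["layout_width"]
    height = coordinates["layout_height"]
    return [(x / width, y / height) for x, y in coordinates["points"]]


def scale_to(
    coordinates: dict[str, Any], target_width: float, target_height: float
) -> list[tuple[float, float]]:
    """Rescale points to a display surface of a different size."""
    return [
        (rx * target_width, ry * target_height)
        for rx, ry in to_relative(coordinates)
    ]


# Hand-written sample; the "points" key is an assumed field name.
sample_coordinates = {
    "layout_width": 1700,
    "layout_height": 2200,
    "points": [[170, 220], [170, 440], [850, 440], [850, 220]],
}
relative_points = to_relative(sample_coordinates)
display_points = scale_to(sample_coordinates, 850, 1100)
print(relative_points)
print(display_points)
```

Storing the relative form keeps overlays correct when the page is rendered at a different resolution than the one used during processing.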
Memory issues with large documents
Causes: Processing too many elements at once or inefficient data handling
Solutions:
- Reduce page_size and process incrementally
- Filter out unnecessary element types
- Clear processed data from memory
- Use streaming processing patterns