Overview
The Data Extraction API allows you to extract structured information from your documents using custom schemas and natural language instructions. The extraction uses the active parsing version of the specified document.
Endpoint
Authentication
Include your API token in the Authorization header as a Bearer token.
Request
Headers
| Header | Value | Required |
|---|---|---|
| Authorization | Bearer YOUR_API_TOKEN | Yes |
| Content-Type | application/json | Yes |
Body Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| file_name | string | Yes | The name of the file to extract from |
| user_instruction | string | Yes | Natural language instructions to guide the extraction |
| output_schema_fields | array | Yes | List of field definitions for the extraction schema |
Schema Field Object
Each field in output_schema_fields has the following properties:
| Property | Type | Required | Description |
|---|---|---|---|
| key | string | Yes | Field name in the output (use snake_case) |
| type | string | Yes | Data type: string, number, date, boolean, object, or array |
| description | string | Yes | Description of what to extract |
| nested_fields | array | No | Required for object type or array with items_type: "object". Defines the nested field structure |
| items_type | string | No | Required for array type. Type of array items: string, number, date, boolean, or object |
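The conditional requirements in this table (items_type for arrays, nested_fields for objects and arrays of objects) are easy to miss, so a client-side check before sending a request may help. The following is a minimal sketch of such a check; `validate_field` is a hypothetical helper, not part of the API, and the server may apply additional rules.

```python
ALLOWED_TYPES = {"string", "number", "date", "boolean", "object", "array"}
ALLOWED_ITEM_TYPES = {"string", "number", "date", "boolean", "object"}


def validate_field(field, path=""):
    """Return a list of problems found in one schema field definition."""
    name = path + str(field.get("key", "<missing key>"))
    problems = []

    # key, type, and description are marked as required in the table.
    for prop in ("key", "type", "description"):
        if not field.get(prop):
            problems.append(f"{name}: missing required property '{prop}'")

    ftype = field.get("type")
    if ftype is not None and ftype not in ALLOWED_TYPES:
        problems.append(f"{name}: unsupported type '{ftype}'")

    # nested_fields is needed for objects and for arrays of objects.
    needs_nested = ftype == "object"
    if ftype == "array":
        items_type = field.get("items_type")
        if items_type not in ALLOWED_ITEM_TYPES:
            problems.append(f"{name}: array needs a valid items_type")
        needs_nested = items_type == "object"

    if needs_nested:
        nested = field.get("nested_fields") or []
        if not nested:
            problems.append(f"{name}: nested_fields is required")
        for child in nested:
            # Recurse so problems in nested definitions are reported too.
            problems.extend(validate_field(child, path=name + "."))

    return problems
```

Calling `validate_field` on each entry of output_schema_fields and refusing to send when any list is non-empty catches malformed schemas locally.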
Example Request
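As a rough illustration of a request body built from the three required body parameters, the sketch below assembles a payload and prints it as JSON. The file name, instruction text, and field definitions are hypothetical values chosen for the example, and `build_extraction_payload` is a helper introduced here, not part of the API.

```python
import json


def build_extraction_payload(file_name, user_instruction, output_schema_fields):
    """Assemble the JSON body from the three required body parameters."""
    return {
        "file_name": file_name,
        "user_instruction": user_instruction,
        "output_schema_fields": list(output_schema_fields),
    }


# All values below are illustrative placeholders.
payload = build_extraction_payload(
    file_name="invoice_2024.pdf",
    user_instruction="Extract the invoice number and the total amount due.",
    output_schema_fields=[
        {
            "key": "invoice_number",
            "type": "string",
            "description": "Invoice identifier printed on the document",
        },
        {
            "key": "total_amount",
            "type": "number",
            "description": "Total amount due, including tax",
        },
    ],
)

print(json.dumps(payload, indent=2))
```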
Example Request with Object and Array Types
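A request body that combines an object field, an array of objects, and an array of primitives might look like the following sketch. The field names and descriptions are hypothetical; only the property names (key, type, description, nested_fields, items_type) come from the Schema Field Object table.

```python
import json

# Illustrative schema only: the keys and descriptions are hypothetical.
nested_schema_fields = [
    {
        "key": "vendor_address",
        "type": "object",
        "description": "Postal address of the vendor",
        "nested_fields": [
            {"key": "street", "type": "string", "description": "Street and number"},
            {"key": "city", "type": "string", "description": "City name"},
            {"key": "zip", "type": "string", "description": "Postal code"},
        ],
    },
    {
        # An array of objects needs both items_type and nested_fields.
        "key": "line_items",
        "type": "array",
        "items_type": "object",
        "description": "Each purchased item listed on the document",
        "nested_fields": [
            {"key": "item_name", "type": "string", "description": "Item name"},
            {"key": "quantity", "type": "number", "description": "Units ordered"},
        ],
    },
    {
        # An array of primitives needs only items_type.
        "key": "tags",
        "type": "array",
        "items_type": "string",
        "description": "Short category labels for the document",
    },
]

request_body = {
    "file_name": "purchase_order.pdf",
    "user_instruction": "Extract the vendor address, all line items, and any category tags.",
    "output_schema_fields": nested_schema_fields,
}

print(json.dumps(request_body, indent=2))
```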
Response
Success Response (200 OK)
| Field | Type | Description |
|---|---|---|
| file_name | string | The file name of the source |
| build_id | string | The parsing version used for extraction |
| extracted_items | array | List of extracted items with page references |
Extracted Item Object
Each item in extracted_items contains:
| Field | Type | Description |
|---|---|---|
| output | object | Extracted data matching your schema fields |
| page_numbers | array | List of page numbers where the data was found |
Example Response
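The sketch below shows one way to walk a response with the documented shape (file_name, build_id, extracted_items, and per-item output and page_numbers). The sample values are invented for illustration and do not come from a real API call; `summarize_items` is a hypothetical helper.

```python
# Hypothetical response body with the documented field names.
sample_response = {
    "file_name": "invoice_2024.pdf",
    "build_id": "example-build-id",
    "extracted_items": [
        {
            "output": {"invoice_number": "INV-001", "total_amount": 1250.0},
            "page_numbers": [1],
        }
    ],
}


def summarize_items(response):
    """Return one (output, page_numbers) pair per extracted item."""
    pairs = []
    for item in response.get("extracted_items", []):
        pairs.append((item.get("output", {}), item.get("page_numbers", [])))
    return pairs


for output, pages in summarize_items(sample_response):
    print(f"pages {pages}: {output}")
```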
Multiple Items Response
When a document contains multiple extractable entities, the extracted_items array contains multiple items.
Response with Object and Array Types
When using object and array types in your schema:
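Assuming the output object mirrors the nesting declared in the schema, downstream code often needs a flat view of it, for example to write a CSV row. The following sketch flattens a nested output into dotted paths; `flatten_output` and the sample values are hypothetical.

```python
def flatten_output(value, prefix=""):
    """Flatten nested dicts and lists into a {path: scalar} mapping."""
    flat = {}
    if isinstance(value, dict):
        for key, child in value.items():
            child_prefix = f"{prefix}.{key}" if prefix else key
            flat.update(flatten_output(child, child_prefix))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            flat.update(flatten_output(child, f"{prefix}[{index}]"))
    else:
        # Scalars (string, number, boolean, null) end the recursion.
        flat[prefix] = value
    return flat


# Hypothetical output for a schema with an object, an array of objects,
# and an array of strings.
sample_output = {
    "vendor_address": {"city": "Springfield", "zip": "12345"},
    "line_items": [{"item_name": "Widget", "quantity": 2}],
    "tags": ["hardware", "urgent"],
}

for path, value in flatten_output(sample_output).items():
    print(path, "=", value)
```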
Error Responses
| Status Code | Description |
|---|---|
| 400 | Bad Request - Invalid parameters or schema |
| 401 | Unauthorized - Invalid or missing API token |
| 404 | Not Found - File not found or no parsing history |
| 500 | Internal Server Error |
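A client can translate these status codes into exceptions so that callers handle failures in one place. This is a minimal sketch: `ExtractionAPIError` and `raise_for_extraction_status` are hypothetical names, and treating only 5xx responses as retryable is a design assumption, not documented behavior.

```python
# Descriptions copied from the error table above.
STATUS_MEANINGS = {
    400: "Bad Request - Invalid parameters or schema",
    401: "Unauthorized - Invalid or missing API token",
    404: "Not Found - File not found or no parsing history",
    500: "Internal Server Error",
}


class ExtractionAPIError(Exception):
    """Raised when the API answers with a non-200 status code."""

    def __init__(self, status_code, message):
        super().__init__(f"{status_code}: {message}")
        self.status_code = status_code
        # Assumption: only server-side failures are worth retrying.
        self.retryable = status_code >= 500


def raise_for_extraction_status(status_code):
    """Do nothing on 200, otherwise raise ExtractionAPIError."""
    if status_code == 200:
        return
    message = STATUS_MEANINGS.get(status_code, "Unexpected status code")
    raise ExtractionAPIError(status_code, message)


try:
    raise_for_extraction_status(404)
except ExtractionAPIError as error:
    print(error, "| retryable:", error.retryable)
```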
Schema Examples
Invoice Extraction
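As an illustration of how an invoice schema could be laid out, the sketch below combines primitive fields with an array of line-item objects. Every key, description, and the instruction text are hypothetical choices for this example; adapt them to the documents you process.

```python
# Hypothetical invoice schema using the documented field properties.
INVOICE_SCHEMA_FIELDS = [
    {"key": "invoice_number", "type": "string",
     "description": "Unique invoice identifier as printed on the invoice"},
    {"key": "invoice_date", "type": "date",
     "description": "Date the invoice was issued"},
    {"key": "total_amount", "type": "number",
     "description": "Grand total including tax"},
    {"key": "is_paid", "type": "boolean",
     "description": "Whether the invoice is marked as paid"},
    {
        "key": "line_items",
        "type": "array",
        "items_type": "object",
        "description": "Each billed product or service",
        "nested_fields": [
            {"key": "description", "type": "string",
             "description": "What was billed"},
            {"key": "quantity", "type": "number",
             "description": "Number of units billed"},
            {"key": "unit_price", "type": "number",
             "description": "Price per unit"},
        ],
    },
]

INVOICE_INSTRUCTION = (
    "Extract the invoice header fields and every line item. "
    "If a value is not present in the document, leave it empty."
)

print([field["key"] for field in INVOICE_SCHEMA_FIELDS])
```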
Contract Analysis
Resume Parsing
Product Catalog
Usage Examples
Python
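The following sketch builds a request with the documented headers and body using only the Python standard library. The endpoint URL is not given in this section, so `EXTRACTION_URL` is a placeholder to be replaced, and the use of POST is an assumption based on the JSON body. The helper names are hypothetical.

```python
import json
import urllib.request

# Placeholder: substitute the real endpoint URL for your account.
EXTRACTION_URL = "https://api.example.com/extract"


def build_extraction_request(api_token, payload, url=EXTRACTION_URL):
    """Create a request carrying the documented headers and JSON body."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
        method="POST",  # Assumption: the endpoint accepts POST.
    )


def send_extraction_request(request, timeout=60):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Illustrative payload; the file name and field are hypothetical.
payload = {
    "file_name": "invoice_2024.pdf",
    "user_instruction": "Extract the invoice number.",
    "output_schema_fields": [
        {"key": "invoice_number", "type": "string",
         "description": "Invoice identifier printed on the document"},
    ],
}

request = build_extraction_request("YOUR_API_TOKEN", payload)
print(request.get_method(), request.full_url)
# result = send_extraction_request(request)  # run once the real URL and token are set
```

Building the request separately from sending it keeps the header and body logic easy to inspect without network access.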
Python with Object and Array Types
JavaScript
JavaScript with Object and Array Types
Best Practices
- Be specific in descriptions — Detailed field descriptions improve extraction accuracy
- Use appropriate types — Match field types to expected data (number for amounts, date for dates)
- Provide clear instructions — Guide the extraction with format preferences and edge cases
- Handle multiple items — Design your schema for documents that may contain multiple entities
- Check page references — Use page_numbers to verify extraction accuracy
- Use objects for structured data — Group related fields using object type (e.g., address with street, city, zip)
- Use arrays for lists — Extract repeating items using array type with appropriate items_type
- Keep nesting shallow — Avoid deeply nested structures for better extraction accuracy
- Use primitive arrays when possible — For simple lists (tags, categories), use items_type: "string" instead of objects
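The advice to keep nesting shallow can be checked mechanically before a schema is sent. The sketch below measures how deeply nested_fields are stacked; `schema_depth` is a hypothetical helper and the threshold of 2 is an arbitrary assumption, since no specific limit is stated here.

```python
def schema_depth(fields):
    """Return the deepest nesting level; a flat schema has depth 1."""
    deepest = 0
    for field in fields:
        nested = field.get("nested_fields") or []
        deepest = max(deepest, 1 + schema_depth(nested))
    return deepest


# Assumed threshold for this example only.
MAX_RECOMMENDED_DEPTH = 2

# Hypothetical schema: an object that contains another object.
example_fields = [
    {"key": "title", "type": "string", "description": "Document title"},
    {
        "key": "vendor",
        "type": "object",
        "description": "Vendor details",
        "nested_fields": [
            {
                "key": "address",
                "type": "object",
                "description": "Vendor address",
                "nested_fields": [
                    {"key": "city", "type": "string", "description": "City name"},
                ],
            },
        ],
    },
]

depth = schema_depth(example_fields)
if depth > MAX_RECOMMENDED_DEPTH:
    print(f"Schema depth is {depth}; consider flattening nested objects.")
```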

