## Overview
The Data Extraction API allows you to extract structured information from your documents using standard JSON Schema and natural language instructions. The extraction uses the active parsing version of the specified document.
## Endpoint
POST https://sources.graphorlm.com/run-extraction
## Authentication
Include your API token in the Authorization header:
Authorization: Bearer YOUR_API_TOKEN
## Request
| Header | Value | Required |
|--------|-------|----------|
| Authorization | Bearer YOUR_API_TOKEN | Yes |
| Content-Type | application/json | Yes |
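
As a minimal sketch, the two required headers can be assembled in one place in Python. Reading the token from an environment variable is an assumption made for illustration, and `GRAPHOR_API_TOKEN` is a hypothetical variable name, not one the API defines.

```python
import os


def build_headers(token=None):
    """Return the two headers this endpoint requires."""
    # GRAPHOR_API_TOKEN is a hypothetical environment variable name.
    token = token or os.environ.get("GRAPHOR_API_TOKEN")
    if not token:
        raise ValueError("An API token is required for the Authorization header")
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
```

The returned dictionary can be passed as the `headers` argument of whichever HTTP client sends the request.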
### Body Parameters
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| file_ids | string[] | No* | List of file IDs to extract from (preferred) |
| file_names | string[] | No* | List of file names to extract from (deprecated, use file_ids) |
| user_instruction | string | Yes | Natural language instructions to guide the extraction |
| output_schema | object | Yes | JSON Schema defining the structure of the extracted data |
| thinking_level | string | No | Controls model and thinking configuration. Values: "fast", "balanced", "accurate" (default). See Thinking Level for details. |

*At least one of file_ids or file_names must be provided. file_ids is preferred.
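
The parameter rules above can be checked on the client before a request is sent. The sketch below uses a hypothetical helper, `build_payload`, which is not part of the API. Sending only `file_ids` when both lists are supplied is a choice made here; the source does not say how the API treats a request that includes both.

```python
def build_payload(user_instruction, output_schema, file_ids=None,
                  file_names=None, thinking_level=None):
    """Assemble a /run-extraction request body from the documented parameters."""
    if not file_ids and not file_names:
        raise ValueError("Provide at least one of file_ids or file_names")
    if not user_instruction:
        raise ValueError("user_instruction is required")
    if not isinstance(output_schema, dict):
        raise ValueError("output_schema must be a JSON Schema object (dict)")

    payload = {
        "user_instruction": user_instruction,
        "output_schema": output_schema,
    }
    # Prefer file_ids; fall back to the deprecated file_names only if needed.
    if file_ids:
        payload["file_ids"] = list(file_ids)
    else:
        payload["file_names"] = list(file_names)
    # Leave thinking_level out unless the caller chose one explicitly.
    if thinking_level is not None:
        payload["thinking_level"] = thinking_level
    return payload
```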
### Output Schema
The output_schema parameter accepts a standard JSON Schema object. This is the same format used by the Chat API for structured outputs.
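
Because the schema is plain JSON Schema, it can be built from ordinary dictionaries. The sketch below uses two hypothetical helpers, `field` and `object_schema`, which are not part of the API. The check that every `required` name appears in `properties` is a client-side convenience, not something JSON Schema itself demands.

```python
def field(json_type, description):
    """Describe one scalar property of the output schema."""
    return {"type": json_type, "description": description}


def object_schema(properties, required=()):
    """Wrap properties in a JSON Schema object, checking the required list."""
    unknown = [name for name in required if name not in properties]
    if unknown:
        raise ValueError(f"required names missing from properties: {unknown}")
    schema = {"type": "object", "properties": dict(properties)}
    if required:
        schema["required"] = list(required)
    return schema


# Rebuilds part of the invoice schema used in the examples on this page.
invoice_schema = object_schema(
    {
        "invoice_number": field("string", "The unique invoice identifier"),
        "total_amount": field("number", "Total amount due"),
    },
    required=["invoice_number", "total_amount"],
)
```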
### Thinking Level
The thinking_level parameter controls the model and thinking configuration used for extraction:

| Value | Description |
|-------|-------------|
| "fast" | Uses a faster model without extended thinking. Best for simple extractions where speed is prioritized. |
| "balanced" | Uses a more capable model with low thinking. Good balance between quality and speed. |
| "accurate" | Default. Uses a more capable model with high thinking. Best for complex extractions requiring deep reasoning. |
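
A small guard can catch a mistyped level before the request leaves the client. This is a sketch only: `with_thinking_level` is a hypothetical helper, and omitting the key when no level is given relies on the documented default of "accurate".

```python
THINKING_LEVELS = ("fast", "balanced", "accurate")


def with_thinking_level(payload, level=None):
    """Return a copy of payload with a validated thinking_level.

    When level is None the key is left out, so the documented default
    ("accurate") applies.
    """
    if level is None:
        return dict(payload)
    if level not in THINKING_LEVELS:
        raise ValueError(
            f"thinking_level must be one of {THINKING_LEVELS}, got {level!r}"
        )
    # Build a new dict so the caller's payload is not modified.
    return {**payload, "thinking_level": level}
```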
### Example Request (using file_ids)
curl -X POST "https://sources.graphorlm.com/run-extraction" \
-H "Authorization: Bearer YOUR_API_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"file_ids": ["file_abc123"],
"user_instruction": "Extract all invoice information. Use YYYY-MM-DD format for dates.",
"output_schema": {
"type": "object",
"properties": {
"invoice_number": {
"type": "string",
"description": "The unique invoice identifier"
},
"invoice_date": {
"type": "string",
"description": "Invoice date in YYYY-MM-DD format"
},
"total_amount": {
"type": "number",
"description": "Total amount due"
},
"vendor_name": {
"type": "string",
"description": "Name of the company issuing the invoice"
}
},
"required": ["invoice_number", "total_amount"]
}
}'
### Example Request (using file_names - deprecated)
curl -X POST "https://sources.graphorlm.com/run-extraction" \
-H "Authorization: Bearer YOUR_API_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"file_names": ["invoice-2024.pdf"],
"user_instruction": "Extract all invoice information. Use YYYY-MM-DD format for dates.",
"output_schema": {
"type": "object",
"properties": {
"invoice_number": {
"type": "string",
"description": "The unique invoice identifier"
},
"invoice_date": {
"type": "string",
"description": "Invoice date in YYYY-MM-DD format"
},
"total_amount": {
"type": "number",
"description": "Total amount due"
},
"vendor_name": {
"type": "string",
"description": "Name of the company issuing the invoice"
}
},
"required": ["invoice_number", "total_amount"]
}
}'
### Example Request with Thinking Level
curl -X POST "https://sources.graphorlm.com/run-extraction" \
-H "Authorization: Bearer YOUR_API_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"file_names": ["complex-contract.pdf"],
"user_instruction": "Extract all legal clauses with their implications.",
"thinking_level": "accurate",
"output_schema": {
"type": "object",
"properties": {
"clauses": {
"type": "array",
"items": {
"type": "object",
"properties": {
"title": { "type": "string" },
"content": { "type": "string" },
"implications": { "type": "string" }
}
}
}
}
}
}'
### Example Request with Nested Objects and Arrays
curl -X POST "https://sources.graphorlm.com/run-extraction" \
-H "Authorization: Bearer YOUR_API_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"file_names": ["invoice-2024.pdf"],
"user_instruction": "Extract invoice with line items and address details.",
"output_schema": {
"type": "object",
"properties": {
"invoice_number": {
"type": "string",
"description": "The unique invoice identifier"
},
"billing_address": {
"type": "object",
"description": "Billing address details",
"properties": {
"street": { "type": "string", "description": "Street address" },
"city": { "type": "string", "description": "City name" },
"zip_code": { "type": "string", "description": "Postal code" },
"country": { "type": "string", "description": "Country name" }
}
},
"tags": {
"type": "array",
"description": "Invoice tags or categories",
"items": { "type": "string" }
},
"line_items": {
"type": "array",
"description": "Invoice line items",
"items": {
"type": "object",
"properties": {
"description": { "type": "string", "description": "Item description" },
"quantity": { "type": "number", "description": "Item quantity" },
"unit_price": { "type": "number", "description": "Price per unit" },
"total": { "type": "number", "description": "Line item total" }
}
}
}
},
"required": ["invoice_number"]
}
}'
## Response
### Success Response (200 OK)
| Field | Type | Description |
|-------|------|-------------|
| file_ids | array | List of file IDs used for extraction |
| file_names | array | List of file names used for extraction |
| structured_output | object | Extracted data matching your schema |
| raw_json | string | Raw JSON text produced by the model before validation/correction |
#### Example Response
{
  "file_ids": ["file_abc123"],
  "file_names": ["invoice-2024.pdf"],
  "structured_output": {
    "invoice_number": "INV-2024-001",
    "invoice_date": "2024-01-15",
    "total_amount": 1250.00,
    "vendor_name": "Acme Corporation"
  },
  "raw_json": "{\"invoice_number\": \"INV-2024-001\", \"invoice_date\": \"2024-01-15\", \"total_amount\": 1250.00, \"vendor_name\": \"Acme Corporation\"}"
}
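
Since `raw_json` holds the model output before validation/correction, comparing it with `structured_output` shows which fields were changed. The sketch below assumes `raw_json` usually parses as a JSON object and returns `None` when it does not. `corrected_fields` is a hypothetical helper, and the sample body is made up for illustration.

```python
import json


def corrected_fields(body):
    """Map field name -> (raw value, validated value) where the two differ.

    Returns None when raw_json cannot be parsed as a JSON object.
    """
    try:
        raw = json.loads(body["raw_json"])
    except (json.JSONDecodeError, TypeError):
        return None
    if not isinstance(raw, dict):
        return None
    structured = body["structured_output"]
    return {
        key: (raw.get(key), structured.get(key))
        for key in sorted(set(raw) | set(structured))
        if raw.get(key) != structured.get(key)
    }


# Made-up response body: the raw text carries the amount as a string.
body = {
    "structured_output": {"invoice_number": "INV-2024-001", "total_amount": 1250.0},
    "raw_json": '{"invoice_number": "INV-2024-001", "total_amount": "1250.00"}',
}
print(corrected_fields(body))  # {'total_amount': ('1250.00', 1250.0)}
```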
#### Response with Object and Array Types
When using object and array types in your schema:
{
  "file_names": ["invoice-2024.pdf"],
  "structured_output": {
    "invoice_number": "INV-2024-001",
    "billing_address": {
      "street": "123 Main Street",
      "city": "New York",
      "zip_code": "10001",
      "country": "USA"
    },
    "tags": ["urgent", "corporate", "Q1-2024"],
    "line_items": [
      {
        "description": "Professional Services",
        "quantity": 10,
        "unit_price": 100.00,
        "total": 1000.00
      },
      {
        "description": "Software License",
        "quantity": 1,
        "unit_price": 250.00,
        "total": 250.00
      }
    ]
  },
  "raw_json": "{...}"
}
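
Extracted numbers in nested output can be cross-checked on the client. The sketch below is not part of the API: `line_item_mismatches` is a hypothetical helper, the tolerance is an arbitrary choice, and it assumes every line item carries all four fields from the schema.

```python
import math


def line_item_mismatches(line_items, tolerance=0.01):
    """Return descriptions of items where quantity * unit_price != total."""
    bad = []
    for item in line_items:
        expected = item["quantity"] * item["unit_price"]
        # Compare with an absolute tolerance to allow for rounding.
        if not math.isclose(expected, item["total"], abs_tol=tolerance):
            bad.append(item["description"])
    return bad


# Line items taken from the example response above.
line_items = [
    {"description": "Professional Services", "quantity": 10,
     "unit_price": 100.00, "total": 1000.00},
    {"description": "Software License", "quantity": 1,
     "unit_price": 250.00, "total": 250.00},
]
print(line_item_mismatches(line_items))  # []
```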
### Error Responses
| Status Code | Description |
|-------------|-------------|
| 400 | Bad Request - Invalid parameters or schema |
| 401 | Unauthorized - Invalid or missing API token |
| 404 | Not Found - File not found or no parsing history |
| 500 | Internal Server Error |
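
One way to surface these statuses in client code is a small exception type. This is a sketch under assumptions: `ExtractionError` and `raise_for_extraction_status` are hypothetical names, and the reasons are copied from the table above rather than read from the response body, whose error format the source does not describe.

```python
class ExtractionError(Exception):
    """Raised for a non-200 response from /run-extraction."""

    def __init__(self, status_code, reason):
        super().__init__(f"{status_code}: {reason}")
        self.status_code = status_code
        self.reason = reason


# Reasons copied from the error table above.
ERROR_REASONS = {
    400: "Bad Request - Invalid parameters or schema",
    401: "Unauthorized - Invalid or missing API token",
    404: "Not Found - File not found or no parsing history",
    500: "Internal Server Error",
}


def raise_for_extraction_status(status_code):
    """Raise ExtractionError unless the status code is 200."""
    if status_code == 200:
        return
    reason = ERROR_REASONS.get(status_code, "Unexpected status code")
    raise ExtractionError(status_code, reason)
```

With `requests`, this could be called as `raise_for_extraction_status(response.status_code)` before reading `response.json()`.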
## Schema Examples
### Invoice Extraction
```json
{
  "file_names": ["invoice.pdf"],
  "user_instruction": "Extract all invoice details. Convert amounts to numbers without currency symbols.",
  "output_schema": {
    "type": "object",
    "properties": {
      "invoice_number": { "type": "string", "description": "Unique invoice identifier" },
      "invoice_date": { "type": "string", "description": "Invoice date (YYYY-MM-DD)" },
      "due_date": { "type": "string", "description": "Payment due date (YYYY-MM-DD)" },
      "vendor_name": { "type": "string", "description": "Company issuing the invoice" },
      "customer_name": { "type": "string", "description": "Customer being billed" },
      "subtotal": { "type": "number", "description": "Subtotal before tax" },
      "tax_amount": { "type": "number", "description": "Tax amount" },
      "total_amount": { "type": "number", "description": "Total amount due" }
    },
    "required": ["invoice_number", "total_amount"]
  }
}
```
### Contract Analysis
{
  "file_names": ["contract.pdf"],
  "user_instruction": "Extract key contract terms with all parties and obligations.",
  "output_schema": {
    "type": "object",
    "properties": {
      "contract_title": { "type": "string", "description": "Title or name of the contract" },
      "effective_date": { "type": "string", "description": "When the contract becomes effective" },
      "termination_date": { "type": "string", "description": "When the contract ends" },
      "auto_renewal": { "type": "boolean", "description": "Whether contract auto-renews" },
      "parties": {
        "type": "array",
        "description": "All parties involved in the contract",
        "items": {
          "type": "object",
          "properties": {
            "name": { "type": "string", "description": "Party name" },
            "role": { "type": "string", "description": "Role (e.g., Licensor, Licensee)" },
            "address": { "type": "string", "description": "Party address" }
          }
        }
      },
      "key_terms": {
        "type": "object",
        "description": "Key contract terms and conditions",
        "properties": {
          "payment_terms": { "type": "string", "description": "Payment conditions" },
          "liability_cap": { "type": "number", "description": "Maximum liability amount" },
          "notice_period_days": { "type": "number", "description": "Notice period in days" }
        }
      }
    },
    "required": ["contract_title", "parties"]
  }
}
### Resume Parsing
{
  "file_names": ["resume.pdf"],
  "user_instruction": "Extract complete candidate information including work history and education.",
  "output_schema": {
    "type": "object",
    "properties": {
      "full_name": { "type": "string", "description": "Candidate's full name" },
      "email": { "type": "string", "description": "Email address" },
      "phone": { "type": "string", "description": "Phone number" },
      "years_experience": { "type": "number", "description": "Total years of experience" },
      "skills": {
        "type": "array",
        "description": "List of technical and soft skills",
        "items": { "type": "string" }
      },
      "work_experience": {
        "type": "array",
        "description": "Work history",
        "items": {
          "type": "object",
          "properties": {
            "company": { "type": "string", "description": "Company name" },
            "title": { "type": "string", "description": "Job title" },
            "start_date": { "type": "string", "description": "Start date" },
            "end_date": { "type": "string", "description": "End date (or current)" }
          }
        }
      },
      "education": {
        "type": "array",
        "description": "Educational background",
        "items": {
          "type": "object",
          "properties": {
            "institution": { "type": "string", "description": "School or university name" },
            "degree": { "type": "string", "description": "Degree obtained" },
            "graduation_year": { "type": "number", "description": "Year of graduation" }
          }
        }
      }
    },
    "required": ["full_name"]
  }
}
### Product Catalog
{
  "file_names": ["catalog.pdf"],
  "user_instruction": "Extract all products with their specifications and variants.",
  "output_schema": {
    "type": "object",
    "properties": {
      "product_name": { "type": "string", "description": "Product name" },
      "sku": { "type": "string", "description": "Product SKU" },
      "base_price": { "type": "number", "description": "Base price" },
      "in_stock": { "type": "boolean", "description": "Whether product is in stock" },
      "specifications": {
        "type": "object",
        "description": "Product specifications",
        "properties": {
          "weight": { "type": "number", "description": "Weight in kg" },
          "dimensions": { "type": "string", "description": "Dimensions (LxWxH)" },
          "material": { "type": "string", "description": "Main material" }
        }
      },
      "categories": {
        "type": "array",
        "description": "Product categories",
        "items": { "type": "string" }
      },
      "variants": {
        "type": "array",
        "description": "Product variants",
        "items": {
          "type": "object",
          "properties": {
            "color": { "type": "string", "description": "Variant color" },
            "size": { "type": "string", "description": "Variant size" },
            "price_modifier": { "type": "number", "description": "Price adjustment" }
          }
        }
      }
    },
    "required": ["product_name", "sku"]
  }
}
## Usage Examples
### Python
import requests

url = "https://sources.graphorlm.com/run-extraction"
headers = {
    "Authorization": "Bearer YOUR_API_TOKEN",
    "Content-Type": "application/json",
}

# Basic extraction: the schema lists the fields to pull out of the invoice
payload = {
    "file_names": ["invoice.pdf"],
    "user_instruction": "Extract invoice information. Use YYYY-MM-DD for dates.",
    "output_schema": {
        "type": "object",
        "properties": {
            "invoice_number": {"type": "string", "description": "Invoice ID"},
            "total_amount": {"type": "number", "description": "Total due"},
            "invoice_date": {"type": "string", "description": "Invoice date"},
        },
        "required": ["invoice_number", "total_amount"],
    },
}

response = requests.post(url, headers=headers, json=payload)
response.raise_for_status()  # stop here on 4xx/5xx responses
data = response.json()

# structured_output holds the extracted fields
output = data["structured_output"]
print(f"Invoice: {output['invoice_number']}")
print(f"Amount: ${output['total_amount']}")
print(f"Date: {output['invoice_date']}")
### Python with Nested Objects and Arrays
import requests

url = "https://sources.graphorlm.com/run-extraction"
headers = {
    "Authorization": "Bearer YOUR_API_TOKEN",
    "Content-Type": "application/json",
}

# Extraction with nested objects and arrays
payload = {
    "file_names": ["invoice.pdf"],
    "user_instruction": "Extract invoice with line items and address.",
    "output_schema": {
        "type": "object",
        "properties": {
            "invoice_number": {"type": "string", "description": "Invoice ID"},
            "billing_address": {
                "type": "object",
                "description": "Billing address",
                "properties": {
                    "street": {"type": "string", "description": "Street"},
                    "city": {"type": "string", "description": "City"},
                    "country": {"type": "string", "description": "Country"},
                },
            },
            "line_items": {
                "type": "array",
                "description": "Invoice line items",
                "items": {
                    "type": "object",
                    "properties": {
                        "description": {"type": "string", "description": "Item description"},
                        "quantity": {"type": "number", "description": "Quantity"},
                        "price": {"type": "number", "description": "Unit price"},
                    },
                },
            },
        },
        "required": ["invoice_number"],
    },
}

response = requests.post(url, headers=headers, json=payload)
response.raise_for_status()  # stop here on 4xx/5xx responses
data = response.json()

# Nested objects come back as dicts, arrays as lists
output = data["structured_output"]
print(f"Invoice: {output['invoice_number']}")
print(f"City: {output['billing_address']['city']}")
print("Line Items:")
for line in output["line_items"]:
    print(f"  - {line['description']}: {line['quantity']} x ${line['price']}")
### JavaScript
const API_URL = "https://sources.graphorlm.com/run-extraction";
const API_TOKEN = "YOUR_API_TOKEN";

async function extractData(fileNames, instruction, schema) {
  const response = await fetch(API_URL, {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${API_TOKEN}`,
      "Content-Type": "application/json"
    },
    body: JSON.stringify({
      file_names: fileNames,
      user_instruction: instruction,
      output_schema: schema
    })
  });
  return response.json();
}

// Basic usage
const result = await extractData(
  ["invoice.pdf"],
  "Extract invoice details",
  {
    type: "object",
    properties: {
      invoice_number: { type: "string", description: "Invoice ID" },
      total_amount: { type: "number", description: "Total due" }
    },
    required: ["invoice_number", "total_amount"]
  }
);

const { structured_output } = result;
console.log(`Invoice: ${structured_output.invoice_number}`);
console.log(`Amount: $${structured_output.total_amount}`);
### JavaScript with Nested Objects and Arrays
const API_URL = "https://sources.graphorlm.com/run-extraction";
const API_TOKEN = "YOUR_API_TOKEN";

// Extraction with nested objects and arrays
const schema = {
  type: "object",
  properties: {
    invoice_number: { type: "string", description: "Invoice ID" },
    billing_address: {
      type: "object",
      description: "Billing address",
      properties: {
        street: { type: "string", description: "Street" },
        city: { type: "string", description: "City" },
        country: { type: "string", description: "Country" }
      }
    },
    tags: {
      type: "array",
      description: "Invoice tags",
      items: { type: "string" }
    },
    line_items: {
      type: "array",
      description: "Invoice line items",
      items: {
        type: "object",
        properties: {
          description: { type: "string", description: "Item description" },
          quantity: { type: "number", description: "Quantity" },
          price: { type: "number", description: "Unit price" }
        }
      }
    }
  },
  required: ["invoice_number"]
};

const response = await fetch(API_URL, {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${API_TOKEN}`,
    "Content-Type": "application/json"
  },
  body: JSON.stringify({
    file_names: ["invoice.pdf"],
    user_instruction: "Extract invoice with all details",
    output_schema: schema
  })
});

const data = await response.json();
const { invoice_number, billing_address, tags, line_items } = data.structured_output;

console.log(`Invoice: ${invoice_number}`);
console.log(`Address: ${billing_address.street}, ${billing_address.city}`);
console.log(`Tags: ${tags.join(", ")}`);
console.log("Line Items:");
line_items.forEach(line => {
  console.log(`  - ${line.description}: ${line.quantity} x $${line.price}`);
});
## Best Practices
- Use standard JSON Schema: The API accepts any valid JSON Schema, giving you full flexibility.
- Be specific in descriptions: Detailed property descriptions improve extraction accuracy.
- Use appropriate types: Match property types to expected data (number for amounts, string for dates).
- Provide clear instructions: Guide the extraction with format preferences and edge cases.
- Use objects for structured data: Group related fields using nested objects (e.g., address with street, city, zip).
- Use arrays for lists: Extract repeating items using arrays with appropriate item schemas.
- Keep nesting shallow: Avoid deeply nested structures for better extraction accuracy.
- Define required fields: Use the required array to specify mandatory properties.
- Use raw_json for debugging: The raw_json field contains the model’s raw output before validation.
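
The advice to keep nesting shallow can be checked mechanically. The sketch below counts object and array levels in a schema. `schema_depth` is a hypothetical helper, it only follows `properties` and `items` (not `$ref`, `anyOf`, or list-valued `type`), and the source gives no numeric limit, so any threshold is left to the caller.

```python
def schema_depth(schema):
    """Count nested object/array levels in a JSON Schema fragment.

    Scalars count as 0; each object or array adds one level.
    """
    schema_type = schema.get("type")
    if schema_type == "object":
        children = schema.get("properties", {}).values()
        return 1 + max((schema_depth(child) for child in children), default=0)
    if schema_type == "array":
        return 1 + schema_depth(schema.get("items", {}))
    return 0


flat = {"type": "object", "properties": {"invoice_number": {"type": "string"}}}
nested = {
    "type": "object",
    "properties": {
        "line_items": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {"quantity": {"type": "number"}},
            },
        }
    },
}
print(schema_depth(flat))    # 1
print(schema_depth(nested))  # 3
```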
Data Extraction Guide: Learn schema design and extraction best practices.
Data Ingestion: Improve parsing quality for better extraction results.