Overview
The Document Chat API allows you to ask questions about your ingested documents and receive answers grounded in your content. The API supports conversational memory, enabling follow-up questions that maintain context.
Endpoint
Authentication
Include your API token in the Authorization header:
Request
Headers
| Header | Value | Required |
|---|---|---|
| Authorization | Bearer YOUR_API_TOKEN | Yes |
| Content-Type | application/json | Yes |
Body Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| question | string | Yes | The question to ask about your documents |
| conversation_id | string | No | Conversation identifier to maintain memory context across questions |
| reset | boolean | No | When true, starts a new conversation and ignores previous history. Default: false |
| file_ids | string[] | No | Restrict search to specific documents by file ID (preferred) |
| file_names | string[] | No | Restrict search to specific documents by file name (deprecated, use file_ids) |
| output_schema | object (JSON Schema) | No | Optional JSON Schema to request a structured output. When provided, the API returns validated structured data in structured_output and the raw JSON-text candidate in raw_json. |
| thinking_level | string | No | Controls model and thinking configuration. Values: "fast", "balanced", "accurate" (default). See Thinking Level for details. |
| include_citation_images | boolean | No | When true, populates image_base64 (base64-encoded PNG of the cited page) inside each entry of citations. Default: false. See Citations for guidance. |
| include_citation_markup | boolean | No | When true, the answer field keeps the raw structured citation markup [N](file_id\|pX\|sY\|eZ\|fNAME) emitted by the agent instead of stripping it down to plain [N] markers. Default: false. The structured markup format is an implementation detail and may change; prefer parsing the citations array. Has no effect when output_schema is set. |
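
The parameters above can be assembled into a request body with a small helper. This is a minimal sketch, not part of any official client: the function name `build_chat_body` is hypothetical, and leaving out fields that are still at their documented defaults is a design choice made here, not a requirement of the API. It assumes Python 3.9 or later.

```python
from typing import Any, Optional

# Values listed for thinking_level in the parameter table.
THINKING_LEVELS = {"fast", "balanced", "accurate"}


def build_chat_body(
    question: str,
    *,
    conversation_id: Optional[str] = None,
    reset: bool = False,
    file_ids: Optional[list[str]] = None,
    output_schema: Optional[dict[str, Any]] = None,
    thinking_level: Optional[str] = None,
    include_citation_images: bool = False,
    include_citation_markup: bool = False,
) -> dict[str, Any]:
    """Assemble the JSON body, omitting anything left at its default."""
    if not question.strip():
        raise ValueError("question is required")
    if thinking_level is not None and thinking_level not in THINKING_LEVELS:
        raise ValueError(f"unknown thinking_level: {thinking_level!r}")

    body: dict[str, Any] = {"question": question}
    optional = {
        "conversation_id": conversation_id,
        "file_ids": file_ids,
        "output_schema": output_schema,
        "thinking_level": thinking_level,
    }
    body.update({k: v for k, v in optional.items() if v is not None})

    # The boolean flags default to false, so they are only sent when set.
    flags = {
        "reset": reset,
        "include_citation_images": include_citation_images,
        "include_citation_markup": include_citation_markup,
    }
    body.update({k: True for k, v in flags.items() if v})
    return body


body = build_chat_body(
    "What is the notice period?",
    file_ids=["file_123"],  # made-up file ID for illustration
    thinking_level="fast",
)
```

Validating `thinking_level` on the client catches typos before a round-trip; the server remains the source of truth for what it accepts.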
Thinking Level
The thinking_level parameter controls the model and thinking configuration used for answering questions:
| Value | Description |
|---|---|
| "fast" | Uses a faster model without extended thinking. Best for simple questions where speed is prioritized. |
| "balanced" | Uses a more capable model with low thinking. Good balance between quality and speed. |
| "accurate" | Default. Uses a more capable model with high thinking. Best for complex questions requiring deep reasoning. |
Example Request
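A basic request might look like the sketch below, using only the Python standard library. The endpoint URL is not shown on this page, so `CHAT_URL` is a placeholder, and the use of POST is an assumption based on the JSON body. The helper name `make_chat_request` and the sample question are illustrative too.

```python
import json
import urllib.request

# Placeholders: substitute the real endpoint URL and your own token.
CHAT_URL = "https://api.example.com/chat"
API_TOKEN = "YOUR_API_TOKEN"


def make_chat_request(
    question: str, url: str = CHAT_URL, token: str = API_TOKEN
) -> urllib.request.Request:
    """Build (but do not send) a request carrying the documented headers."""
    payload = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",  # assumed; the HTTP method is not stated on this page
    )


request = make_chat_request("What are the termination clauses in the contract?")

# Sending is left commented out so the sketch runs without network access:
# with urllib.request.urlopen(request) as resp:
#     data = json.load(resp)
```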
Example with Conversation Memory
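Conversation memory works by sending back the conversation_id returned with an earlier answer. The sketch below shows one way to build the follow-up body. The helper `next_turn`, the sample response, and the ID `conv_abc123` are made up for illustration; only the question, conversation_id, and reset fields come from this page.

```python
from typing import Any, Optional


def next_turn(
    question: str,
    previous: Optional[dict[str, Any]] = None,
    new_topic: bool = False,
) -> dict[str, Any]:
    """Body for the next question, carrying over the conversation_id if any."""
    body: dict[str, Any] = {"question": question}
    if previous and previous.get("conversation_id"):
        body["conversation_id"] = previous["conversation_id"]
    if new_topic:
        # reset=true starts a new conversation and ignores previous history.
        body["reset"] = True
    return body


# A trimmed, made-up response standing in for the first turn's reply.
first_response = {
    "answer": "The notice period is 30 days [1].",
    "conversation_id": "conv_abc123",
}

follow_up = next_turn("Does that apply to both parties?", first_response)
fresh_start = next_turn("Summarize the pricing appendix.", first_response, new_topic=True)
```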
Example with Specific Documents (using file_ids)
Example with Specific Documents (using file_names - deprecated)
Example with Thinking Level
Example with Citations and Inline Images
Set include_citation_images: true to get base64 screenshots of every cited page in the same response. Use sparingly; see "Should I use include_citation_images?" below.
Example with Structured Output (JSON Schema)
When you pass output_schema, the API will attempt to return a schema-conformant JSON object/array in structured_output.
Notes/constraints:
- Supported schemas must be simplified JSON Schema
- Unions must be only with null (e.g. ["string", "null"])
- Complex constructs like oneOf/anyOf/allOf/$ref are not supported
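
A rough client-side check for the constraints above can catch unsupported schemas before a request is sent. This sketch is one interpretation of those notes, not the server's validator: `find_schema_problems` is a hypothetical helper, and it treats any type list with more than one non-null member as a disallowed union. It also walks every nested value, so dictionaries inside keywords such as enum or default are inspected as if they were schemas.

```python
from typing import Any

UNSUPPORTED_KEYWORDS = ("oneOf", "anyOf", "allOf", "$ref")


def find_schema_problems(schema: Any, path: str = "$") -> list[str]:
    """Walk a schema and report constructs ruled out by the notes above."""
    problems: list[str] = []
    if isinstance(schema, dict):
        for keyword in UNSUPPORTED_KEYWORDS:
            if keyword in schema:
                problems.append(f"{path}: {keyword} is not supported")

        type_value = schema.get("type")
        if isinstance(type_value, list):
            non_null = [t for t in type_value if t != "null"]
            if len(non_null) > 1:
                problems.append(f"{path}: union {type_value} mixes non-null types")

        for key, value in schema.items():
            if key == "properties" and isinstance(value, dict):
                # Property names are user data, not schema keywords.
                for name, sub in value.items():
                    problems.extend(
                        find_schema_problems(sub, f"{path}.properties.{name}")
                    )
            else:
                problems.extend(find_schema_problems(value, f"{path}.{key}"))
    elif isinstance(schema, list):
        for i, item in enumerate(schema):
            problems.extend(find_schema_problems(item, f"{path}[{i}]"))
    return problems


ok_schema = {
    "type": "object",
    "properties": {
        "total": {"type": ["number", "null"]},
        "items": {"type": "array", "items": {"type": "string"}},
    },
}
bad_schema = {
    "type": "object",
    "properties": {
        "id": {"type": ["string", "integer"]},
        "owner": {"$ref": "#/definitions/person"},
    },
}

ok_problems = find_schema_problems(ok_schema)
bad_problems = find_schema_problems(bad_schema)
```

An empty list means nothing was flagged by this check; the API may still reject a schema with a 422 for reasons the sketch does not cover.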
Response
Success Response (200 OK)
| Field | Type | Description |
|---|---|---|
| answer | string | The answer to your question, with inline [N] markers pointing to entries in citations. When output_schema is provided, this is a short status message and the structured data is in structured_output (raw JSON-text in raw_json). |
| structured_output | any | Optional structured output validated against the requested output_schema. Present only when output_schema is provided. |
| raw_json | string | Optional raw JSON-text produced by the model before validation/correction. Present only when output_schema is provided. |
| conversation_id | string | Conversation identifier for follow-up questions |
| citations | object[] | Structured citations resolving each [N] marker in answer. See Citations. May be null/empty when the agent did not ground its answer (e.g. small-talk follow-ups). |
| usage | object | Token usage breakdown for the request |
| elapsed_s | number | Wall-clock time in seconds |
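
Because answer holds only a short status message when output_schema is used, client code usually needs to pick the right field. The sketch below shows one way to do that. `extract_result` is a hypothetical helper and both sample responses are invented, trimmed to the fields that matter here.

```python
from typing import Any


def extract_result(response: dict[str, Any]) -> Any:
    """Return structured data when present, otherwise the plain-text answer."""
    structured = response.get("structured_output")
    if structured is not None:
        return structured
    return response["answer"]


# Made-up responses for illustration only.
plain_response = {
    "answer": "The notice period is 30 days [1].",
    "conversation_id": "conv_abc123",
    "citations": [{"index": 1, "file_id": "file_123", "page_number": 4}],
}
structured_response = {
    "answer": "Structured output generated.",
    "structured_output": {"notice_period_days": 30},
    "raw_json": '{"notice_period_days": 30}',
    "conversation_id": "conv_abc123",
}

text = extract_result(plain_response)
data = extract_result(structured_response)
```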
Citations
Each entry in citations corresponds to one [N] marker that appears in the answer text. Use index to map markers to citation entries.
| Field | Type | Description |
|---|---|---|
| index | integer | The 1-based citation number that appears as [N] in answer |
| file_id | string | Unique identifier of the source file |
| file_name | string | Display name of the source file |
| page_number | integer | 1-based page number where the cited content appears |
| section_number | integer | Optional section number within the page |
| element_id | string | Optional element identifier (e.g. specific paragraph or table) |
| text_preview | string | Short text excerpt around the cited content |
| image_base64 | string | Base64-encoded PNG screenshot of the cited page. Populated only when the request used include_citation_images=true. May be null if the source is not visualizable (e.g. plain text). |
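
Mapping markers to entries through index can be sketched as follows. The helper `resolve_markers` and the sample answer and citations are invented for illustration; the field names follow the table above. A marker with no matching entry resolves to None, which also covers the case where citations is null or empty.

```python
import re
from typing import Any, Optional

# Matches the plain [N] markers that appear in the answer text.
MARKER = re.compile(r"\[(\d+)\]")


def resolve_markers(
    answer: str, citations: Optional[list[dict[str, Any]]]
) -> list[tuple[int, Optional[dict[str, Any]]]]:
    """Pair each [N] marker with its citation entry, in order of appearance."""
    by_index = {c["index"]: c for c in (citations or [])}
    return [(int(n), by_index.get(int(n))) for n in MARKER.findall(answer)]


# Made-up sample data.
answer = "Notice is 30 days [1], extended to 60 days for enterprise plans [2]."
citations = [
    {"index": 1, "file_id": "file_123", "file_name": "contract.pdf", "page_number": 4},
    {"index": 2, "file_id": "file_123", "file_name": "contract.pdf", "page_number": 9},
]

resolved = resolve_markers(answer, citations)
for number, entry in resolved:
    if entry is not None:
        print(f"[{number}] {entry['file_name']}, page {entry['page_number']}")
```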
Should I use include_citation_images?
The flag is convenient for quick prototyping or one-off requests where the client wants the answer and the visual previews in a single round-trip.
For real applications — especially when answers commonly cite many pages — prefer include_citation_images=false (the default) and lazy-load the screenshots on demand via the dedicated endpoint described in Get Page Screenshot. Reasons:
- Payload size: each base64 PNG is typically 100-400 KB. Five citations can push the JSON response above a megabyte and slow down clients.
- Latency: rendering screenshots is parallel but still adds seconds to the response — every page render is I/O + image processing. With the default flag off, the answer comes back as soon as the model finishes.
- Cache locality: the screenshot endpoint sets Cache-Control: public, max-age=3600 and is keyed by (file_id, page_number). Lazy-loading lets browsers and CDNs cache the bytes; inlining base64 prevents that.
Use include_citation_images=true only when you control both ends and know the answer will cite at most 1-2 pages (e.g. a confirmation step in a workflow). Otherwise, ship the answer with structured citations and fetch images on hover/click.
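
To gauge the payload cost on real responses, it can help to measure how many characters the inlined screenshots occupy and to decode only the entries that are populated. This is a minimal sketch: both helper names are hypothetical, and the sample bytes are a stand-in, not a real page image.

```python
import base64
from typing import Any, Optional

Citations = Optional[list[dict[str, Any]]]


def inlined_image_chars(citations: Citations) -> int:
    """Total characters of base64 image data carried in the citations array."""
    return sum(len(c.get("image_base64") or "") for c in citations or [])


def decode_images(citations: Citations) -> list[tuple[int, bytes]]:
    """Decode populated image_base64 fields into (index, png_bytes) pairs."""
    images = []
    for c in citations or []:
        encoded = c.get("image_base64")
        if encoded:  # may be null when the source cannot be rendered
            images.append((c["index"], base64.b64decode(encoded)))
    return images


# 24 stand-in bytes that begin with the PNG signature.
fake_png = b"\x89PNG\r\n\x1a\n" + b"\x00" * 16
citations = [
    {"index": 1, "image_base64": base64.b64encode(fake_png).decode("ascii")},
    {"index": 2, "image_base64": None},
]

size = inlined_image_chars(citations)
images = decode_images(citations)
```

With the 100-400 KB per image quoted above, five populated citations would add roughly 0.5-2 MB to a single response, which is the case for lazy-loading.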
Example Response
Example Response (Structured Output)
Error Responses
| Status Code | Description |
|---|---|
| 400 | Bad Request - Invalid parameters |
| 401 | Unauthorized - Invalid or missing API token |
| 404 | Not Found - Specified file not found |
| 422 | Unprocessable Entity - Invalid output_schema or structured output validation failed |
| 500 | Internal Server Error |
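
Client code can turn these status codes into a single exception type. The sketch below is one possible shape, with hypothetical names (`ChatApiError`, `check_status`, `is_retryable`). Treating only 5xx responses as worth retrying is a common convention assumed here, not something this page specifies.

```python
# Descriptions copied from the error table above.
ERROR_MEANINGS = {
    400: "Bad Request - Invalid parameters",
    401: "Unauthorized - Invalid or missing API token",
    404: "Not Found - Specified file not found",
    422: "Unprocessable Entity - Invalid output_schema or structured output validation failed",
    500: "Internal Server Error",
}


class ChatApiError(Exception):
    """Raised for any non-200 response; keeps the status code for callers."""

    def __init__(self, status: int, message: str) -> None:
        super().__init__(f"{status}: {message}")
        self.status = status


def check_status(status: int) -> None:
    """Raise ChatApiError unless the response status is 200."""
    if status != 200:
        raise ChatApiError(status, ERROR_MEANINGS.get(status, "Unexpected status"))


def is_retryable(status: int) -> bool:
    """Assumed policy: retry server-side failures, not client errors."""
    return status >= 500


try:
    check_status(422)
except ChatApiError as err:
    print(f"Request failed: {err} (retry: {is_retryable(err.status)})")
```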
Usage Examples
Python
JavaScript
Best Practices
- Use conversation memory: Pass conversation_id for follow-up questions to maintain context
- Be specific: Clear, specific questions get better answers
- Scope when needed: Use file_ids to focus on specific documents
- Use structured output when integrating: Provide output_schema to get JSON you can reliably parse in code
- Reset when changing topics: Set reset: true when switching to unrelated questions
- Lazy-load citation images: Keep include_citation_images=false (the default) and fetch page screenshots on demand via Get Page Screenshot. Inlining base64 only makes sense for low-citation, low-frequency requests; for typical chat UIs it bloats the payload by hundreds of KB per cited page.
- Parse citations, not the markup: Use the structured citations array to render references. The inline [N](file_id|pX|...) markup is hidden by default and is an implementation detail that may change.
Related
Get Page Screenshot
Fetch base64 screenshots for citations on demand
Document Chat Guide
Learn best practices for chatting with your documents
Data Ingestion
Improve parsing quality for better chat responses

