The List Elements endpoint retrieves detailed information about document elements (partitions) from processed sources in your GraphorLM project. It exposes individual text blocks, images, tables, and other document components, along with their metadata, positioning, and content, so you can analyze document structure and extract specific information programmatically.

Endpoint Overview

POST https://sources.graphorlm.com/elements

Authentication

This endpoint requires authentication using an API token. You must include your API token as a Bearer token in the Authorization header.
Learn how to create and manage API tokens in the API Tokens guide.

Request Format

Headers

| Header | Value | Required |
| --- | --- | --- |
| Authorization | Bearer YOUR_API_TOKEN | ✅ Yes |
| Content-Type | application/json | ✅ Yes |

Request Body

The endpoint requires a JSON payload with the following structure:
{
  "file_name": "document.pdf",
  "page": 1,
  "page_size": 10,
  "filter": {
    "type": "Title",
    "page_numbers": [1, 2, 3],
    "elementsToRemove": ["PageNumber", "Footer"]
  }
}

Request Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| file_name | string | ✅ Yes | Name of the source file to retrieve elements from |
| page | integer | ❌ No | Page number for pagination (starts from 1) |
| page_size | integer | ❌ No | Number of elements to return per page |
| filter | object | ❌ No | Filter criteria to refine element selection |

Filter Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| type | string | Filter by a specific element type (e.g., "Title", "NarrativeText", "Table") |
| page_numbers | array[integer] | Return only elements from the given page numbers |
| elementsToRemove | array[string] | Exclude specific element types from the results |

Response Format

Success Response (200 OK)

The endpoint returns a paginated response containing document elements:
{
  "items": [
    {
      "id": null,
      "metadata": {
        "coordinates": {
          "points": [
            [211.488, 148.4256186],
            [211.488, 165.64101860000005],
            [399.89333760000005, 165.64101860000005],
            [399.89333760000005, 148.4256186]
          ],
          "system": "PixelSpace",
          "layout_width": 612.0,
          "layout_height": 792.0
        },
        "file_directory": "/tmp",
        "filename": "attention.pdf",
        "languages": ["eng", "por"],
        "last_modified": "2025-07-28T13:25:26",
        "page_number": 1,
        "filetype": "application/pdf",
        "text_as_html": "<h2>Attention Is All You Need</h2>",
        "id": "ba479967-85bd-43f9-abf9-c3fbfc2775ce",
        "position": 5,
        "element_type": "Title",
        "element_id": "0ee55f099828817da5485796b339aeab",
        "bounding_box": {
          "height": 17.215400000000045,
          "left": 211.488,
          "top": 148.4256186,
          "width": 188.40533760000005
        },
        "page_layout": {
          "width": 612.0,
          "height": 792.0
        }
      },
      "page_content": "Attention Is All You Need",
      "type": "Document"
    }
  ],
  "total": 393,
  "page": 1,
  "page_size": 10,
  "total_pages": 40
}

Response Fields

| Field | Type | Description |
| --- | --- | --- |
| items | array | Document elements in the current page |
| total | integer | Total number of elements matching the filter |
| page | integer | Current page number |
| page_size | integer | Number of elements per page |
| total_pages | integer | Total number of pages available |

Note that total_pages equals ceil(total / page_size): in the sample above, 393 elements at 10 per page yield 40 pages.

Element Object Fields

| Field | Type | Description |
| --- | --- | --- |
| id | string or null | Element identifier (may be null) |
| page_content | string | Text content of the element |
| type | string | Always "Document" for this endpoint |
| metadata | object | Rich metadata about the element |

Metadata Fields

| Field | Type | Description |
| --- | --- | --- |
| coordinates | object | Pixel coordinates and layout information |
| filename | string | Original filename of the source document |
| languages | array[string] | Detected languages in the element |
| last_modified | string | ISO timestamp of last modification |
| page_number | integer | Page number where the element appears |
| filetype | string | MIME type of the source file |
| text_as_html | string | HTML representation of the element |
| element_type | string | Type classification of the element |
| element_id | string | Unique identifier for the element |
| position | integer | Sequential position within the document |
| bounding_box | object | Rectangular bounds of the element |
| page_layout | object | Overall page dimensions |
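
In the sample response above, bounding_box is the axis-aligned envelope of coordinates.points, expressed in the same PixelSpace units as layout_width and layout_height. A minimal sketch of deriving one from the other, assuming that relationship holds in general (it matches the sample but is not documented as a guarantee):

def bounding_box_from_points(points):
    """Compute the axis-aligned envelope of a coordinates.points polygon."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return {
        "left": min(xs),
        "top": min(ys),
        "width": max(xs) - min(xs),
        "height": max(ys) - min(ys),
    }

# The sample element's points reproduce its bounding_box:
# left 211.488, top 148.4256186, width 188.4053376, height 17.2154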

Element Types

Element types returned by this endpoint include Title, NarrativeText, Table, Image, Footer, and PageNumber; these are the types used in the filters and examples throughout this guide.

Code Examples

JavaScript/Node.js

const listElements = async (apiToken, fileName, options = {}) => {
  const payload = {
    file_name: fileName,
    page: options.page || 1,
    page_size: options.pageSize || 20,
    filter: options.filter || {}
  };

  const response = await fetch('https://sources.graphorlm.com/elements', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${apiToken}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify(payload)
  });

  if (response.ok) {
    const data = await response.json();
    console.log(`Found ${data.total} elements (page ${data.page}/${data.total_pages})`);
    return data;
  } else {
    const error = await response.text();
    throw new Error(`Failed to list elements: ${response.status} ${error}`);
  }
};

// Usage - Get all titles from first page
listElements('grlm_your_api_token_here', 'document.pdf', {
  page: 1,
  pageSize: 10,
  filter: { type: 'Title' }
})
.then(response => {
  response.items.forEach(element => {
    console.log(`${element.metadata.element_type}: ${element.page_content}`);
  });
})
.catch(error => console.error('Error:', error));

Python

import requests

def list_elements(api_token, file_name, page=1, page_size=20, filter_options=None):
    url = "https://sources.graphorlm.com/elements"
    
    payload = {
        "file_name": file_name,
        "page": page,
        "page_size": page_size,
        "filter": filter_options or {}
    }
    
    headers = {
        "Authorization": f"Bearer {api_token}",
        "Content-Type": "application/json"
    }
    
    response = requests.post(url, json=payload, headers=headers, timeout=30)
    
    if response.status_code == 200:
        data = response.json()
        print(f"Found {data['total']} elements (page {data['page']}/{data['total_pages']})")
        return data
    else:
        response.raise_for_status()

# Usage - Get tables from specific pages
try:
    elements = list_elements(
        "grlm_your_api_token_here",
        "document.pdf",
        page=1,
        page_size=50,
        filter_options={
            "type": "Table",
            "page_numbers": [2, 3, 4]
        }
    )
    
    for element in elements['items']:
        print(f"Page {element['metadata']['page_number']}: {element['page_content'][:100]}...")
        
except requests.exceptions.RequestException as e:
    print(f"Error listing elements: {e}")

cURL

curl -X POST https://sources.graphorlm.com/elements \
  -H "Authorization: Bearer grlm_your_api_token_here" \
  -H "Content-Type: application/json" \
  -d '{
    "file_name": "document.pdf",
    "page": 1,
    "page_size": 10,
    "filter": {
      "type": "NarrativeText",
      "page_numbers": [1, 2]
    }
  }'

PHP

<?php
function listElements($apiToken, $fileName, $options = []) {
    $url = "https://sources.graphorlm.com/elements";
    
    $payload = [
        "file_name" => $fileName,
        "page" => $options['page'] ?? 1,
        "page_size" => $options['page_size'] ?? 20,
        "filter" => $options['filter'] ?? (object)[]
    ];
    
    $headers = [
        "Authorization: Bearer " . $apiToken,
        "Content-Type: application/json"
    ];
    
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
    curl_setopt($ch, CURLOPT_POSTFIELDS, json_encode($payload));
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 30);
    
    $response = curl_exec($ch);
    $httpCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    
    if ($httpCode === 200) {
        return json_decode($response, true);
    } else {
        throw new Exception("Failed to list elements. HTTP code: " . $httpCode);
    }
}

// Usage - Get all elements excluding footers
try {
    $elements = listElements("grlm_your_api_token_here", "document.pdf", [
        'page' => 1,
        'page_size' => 25,
        'filter' => [
            'elementsToRemove' => ['Footer', 'PageNumber']
        ]
    ]);
    
    echo "Found " . $elements['total'] . " elements\n";
    
    foreach ($elements['items'] as $element) {
        echo $element['metadata']['element_type'] . ": " . 
             substr($element['page_content'], 0, 50) . "...\n";
    }
} catch (Exception $e) {
    echo "Error: " . $e->getMessage() . "\n";
}
?>

Error Responses

Common Error Codes

| Status Code | Error Type | Description |
| --- | --- | --- |
| 400 | Bad Request | Invalid request payload or parameters |
| 401 | Unauthorized | Invalid or missing API token |
| 404 | Not Found | Specified file not found in the project |
| 500 | Internal Server Error | Server-side error while processing the request |

Error Response Format

{
  "detail": "File not found"
}
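
Because every error body carries a single detail field, a client can surface failures uniformly. A minimal sketch (raise_with_detail is an illustrative helper, not part of any SDK):

import requests

def raise_with_detail(response: requests.Response) -> None:
    """Raise a descriptive error built from the API's {"detail": ...} body."""
    if response.ok:
        return
    try:
        detail = response.json().get("detail", response.text)
    except ValueError:  # error body was not JSON
        detail = response.text
    raise RuntimeError(f"HTTP {response.status_code}: {detail}")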

Error Examples

For example, requesting a file name that does not exist in the project returns 404 Not Found with the body shown above: {"detail": "File not found"}.

Response Analysis

Element Processing and Filtering

def analyze_elements(elements_response):
    """Analyze element distribution and content."""
    items = elements_response['items']
    
    # Group by element type
    type_counts = {}
    total_text_length = 0
    page_distribution = {}
    
    for element in items:
        # Count by type
        element_type = element['metadata']['element_type']
        type_counts[element_type] = type_counts.get(element_type, 0) + 1
        
        # Calculate text metrics
        total_text_length += len(element['page_content'])
        
        # Track page distribution
        page_num = element['metadata']['page_number']
        page_distribution[page_num] = page_distribution.get(page_num, 0) + 1
    
    return {
        'element_types': type_counts,
        'average_text_length': total_text_length / len(items) if items else 0,
        'page_distribution': page_distribution,
        'total_elements': len(items)
    }

# Usage
elements = list_elements("grlm_your_token", "document.pdf", page_size=100)
analysis = analyze_elements(elements)
print(f"Analysis: {analysis}")

Content Extraction

const extractStructuredContent = (elementsResponse) => {
  const items = elementsResponse.items;
  
  const structured = {
    titles: [],
    paragraphs: [],
    tables: [],
    images: [],
    metadata: {
      totalElements: items.length,
      pages: new Set(),
      languages: new Set()
    }
  };
  
  items.forEach(element => {
    const type = element.metadata.element_type;
    const content = element.page_content;
    
    // Collect metadata
    structured.metadata.pages.add(element.metadata.page_number);
    element.metadata.languages?.forEach(lang => 
      structured.metadata.languages.add(lang)
    );
    
    // Categorize content
    switch (type) {
      case 'Title':
        structured.titles.push({
          text: content,
          page: element.metadata.page_number,
          position: element.metadata.position
        });
        break;
      case 'NarrativeText':
        structured.paragraphs.push({
          text: content,
          page: element.metadata.page_number,
          html: element.metadata.text_as_html
        });
        break;
      case 'Table':
        structured.tables.push({
          content: content,
          page: element.metadata.page_number,
          bbox: element.metadata.bounding_box
        });
        break;
      case 'Image':
        structured.images.push({
          page: element.metadata.page_number,
          bbox: element.metadata.bounding_box,
          description: content
        });
        break;
    }
  });
  
  // Convert Sets to Arrays for JSON serialization
  structured.metadata.pages = Array.from(structured.metadata.pages).sort();
  structured.metadata.languages = Array.from(structured.metadata.languages);
  
  return structured;
};

Integration Examples

Document Analyzer

import requests
from typing import Any, Dict, List, Optional

class DocumentAnalyzer:
    def __init__(self, api_token: str):
        self.api_token = api_token
        self.base_url = "https://sources.graphorlm.com"
    
    def get_document_structure(self, file_name: str) -> Dict[str, Any]:
        """Get hierarchical document structure."""
        # Get all titles first
        titles_response = self._list_elements(
            file_name, 
            filter_options={"type": "Title"},
            page_size=100
        )
        
        # Get all content elements
        content_response = self._list_elements(
            file_name,
            filter_options={"elementsToRemove": ["Footer", "PageNumber"]},
            page_size=500
        )
        
        return {
            "document_outline": self._build_outline(titles_response['items']),
            "content_summary": self._summarize_content(content_response['items']),
            "total_elements": content_response['total'],
            "pages": self._get_page_count(content_response['items'])
        }
    
    def extract_tables(self, file_name: str) -> List[Dict]:
        """Extract all tables from document."""
        tables = []
        page = 1
        
        while True:
            response = self._list_elements(
                file_name,
                page=page,
                page_size=50,
                filter_options={"type": "Table"}
            )
            
            tables.extend([
                {
                    "content": item['page_content'],
                    "page": item['metadata']['page_number'],
                    "position": item['metadata']['position'],
                    "html": item['metadata']['text_as_html']
                }
                for item in response['items']
            ])
            
            if page >= response['total_pages']:
                break
            page += 1
        
        return tables
    
    def _list_elements(self, file_name: str, page: int = 1,
                       page_size: int = 20, filter_options: Optional[Dict] = None):
        payload = {
            "file_name": file_name,
            "page": page,
            "page_size": page_size,
            "filter": filter_options or {}
        }
        
        headers = {
            "Authorization": f"Bearer {self.api_token}",
            "Content-Type": "application/json"
        }
        
        response = requests.post(
            f"{self.base_url}/elements",
            json=payload,
            headers=headers,
            timeout=30
        )
        response.raise_for_status()
        return response.json()
    
    def _build_outline(self, titles: List[Dict]) -> List[Dict]:
        return [
            {
                "title": item['page_content'],
                "page": item['metadata']['page_number'],
                "level": self._detect_heading_level(item['metadata']['text_as_html'])
            }
            for item in titles
        ]
    
    def _detect_heading_level(self, html: str) -> int:
        if '<h1>' in html: return 1
        elif '<h2>' in html: return 2
        elif '<h3>' in html: return 3
        elif '<h4>' in html: return 4
        else: return 5
    
    def _summarize_content(self, elements: List[Dict]) -> Dict:
        type_counts = {}
        total_chars = 0
        
        for element in elements:
            element_type = element['metadata']['element_type']
            type_counts[element_type] = type_counts.get(element_type, 0) + 1
            total_chars += len(element['page_content'])
        
        return {
            "element_distribution": type_counts,
            "total_characters": total_chars,
            "average_element_length": total_chars / len(elements) if elements else 0
        }
    
    def _get_page_count(self, elements: List[Dict]) -> int:
        return max(element['metadata']['page_number'] for element in elements) if elements else 0

# Usage
analyzer = DocumentAnalyzer("grlm_your_token")
structure = analyzer.get_document_structure("research_paper.pdf")
tables = analyzer.extract_tables("research_paper.pdf")

print(f"Document has {structure['pages']} pages")
print(f"Found {len(tables)} tables")

Content Search System

class ContentSearcher {
  constructor(apiToken) {
    this.apiToken = apiToken;
    this.baseUrl = 'https://sources.graphorlm.com';
  }
  
  async searchInDocument(fileName, query, options = {}) {
    // Get all text elements
    const allElements = await this.getAllElements(fileName, {
      elementsToRemove: ['Image', 'PageNumber', 'Footer']
    });
    
    // Search for query in content
    const matches = allElements.filter(element => 
      element.page_content.toLowerCase().includes(query.toLowerCase())
    );
    
    return {
      query,
      total_matches: matches.length,
      matches: matches.map(match => ({
        content: this.highlightMatch(match.page_content, query),
        page: match.metadata.page_number,
        type: match.metadata.element_type,
        context: this.getContext(match, allElements, options.contextSize || 1)
      }))
    };
  }
  
  async getAllElements(fileName, filter = {}) {
    const allElements = [];
    let page = 1;
    let totalPages = 1;
    
    do {
      const response = await this.listElements(fileName, {
        page,
        pageSize: 100,
        filter
      });
      
      allElements.push(...response.items);
      totalPages = response.total_pages;
      page++;
    } while (page <= totalPages);
    
    return allElements;
  }
  
  async listElements(fileName, options = {}) {
    const payload = {
      file_name: fileName,
      page: options.page || 1,
      page_size: options.pageSize || 20,
      filter: options.filter || {}
    };

    const response = await fetch(`${this.baseUrl}/elements`, {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${this.apiToken}`,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify(payload)
    });

    if (!response.ok) {
      throw new Error(`HTTP ${response.status}: ${await response.text()}`);
    }

    return response.json();
  }
  
  highlightMatch(text, query) {
    // Escape regex metacharacters so user queries can't break the pattern
    const escaped = query.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
    const regex = new RegExp(`(${escaped})`, 'gi');
    return text.replace(regex, '<mark>$1</mark>');
  }
  
  getContext(element, allElements, contextSize) {
    const currentIndex = allElements.findIndex(
      el => el.metadata.element_id === element.metadata.element_id
    );
    
    const start = Math.max(0, currentIndex - contextSize);
    const end = Math.min(allElements.length, currentIndex + contextSize + 1);
    
    return allElements.slice(start, end).map(el => ({
      content: el.page_content,
      type: el.metadata.element_type,
      isCurrent: el.metadata.element_id === element.metadata.element_id
    }));
  }
}

// Usage
const searcher = new ContentSearcher('grlm_your_token');

searcher.searchInDocument('research_paper.pdf', 'machine learning')
  .then(results => {
    console.log(`Found ${results.total_matches} matches for "${results.query}"`);
    results.matches.forEach((match, index) => {
      console.log(`Match ${index + 1} (Page ${match.page}):`);
      console.log(match.content);
      console.log('---');
    });
  })
  .catch(error => console.error('Search error:', error));

Best Practices

Performance Optimization

  • Use appropriate page sizes: Start with 20-50 elements per page for optimal performance
  • Implement client-side caching: Cache element data for repeated access patterns (see the sketch after this list)
  • Filter server-side: Use filter parameters to reduce data transfer and processing
  • Batch processing: Process multiple pages efficiently for large documents
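
A minimal in-memory cache on top of the list_elements helper from the Python example above (the payload-based key is illustrative; adapt the key and eviction policy to your access patterns):

import json

_element_cache = {}

def list_elements_cached(api_token, file_name, page=1, page_size=20, filter_options=None):
    """Memoize list_elements responses keyed by the full request payload."""
    key = json.dumps(
        {"file_name": file_name, "page": page, "page_size": page_size,
         "filter": filter_options or {}},
        sort_keys=True,
    )
    if key not in _element_cache:
        _element_cache[key] = list_elements(api_token, file_name, page, page_size, filter_options)
    return _element_cache[key]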

Data Processing

  • Element type awareness: Different element types require different processing approaches
  • Coordinate utilization: Leverage bounding box data for spatial analysis and layout reconstruction
  • HTML parsing: Use the text_as_html field for rich formatting and structure preservation (see the sketch after this list)
  • Language handling: Consider detected languages for multilingual document processing
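
For example, a Table element's text_as_html can be parsed into rows of cell text with an HTML parser. A sketch using BeautifulSoup (an external dependency, pip install beautifulsoup4, not something this API requires):

from bs4 import BeautifulSoup

def table_html_to_rows(text_as_html):
    """Parse a Table element's text_as_html into a list of cell-text rows."""
    soup = BeautifulSoup(text_as_html, "html.parser")
    return [
        [cell.get_text(strip=True) for cell in row.find_all(["td", "th"])]
        for row in soup.find_all("tr")
    ]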

Memory Management

  • Stream large documents: Process large files in chunks rather than loading all elements at once (see the sketch after this list)
  • Clean unused data: Remove unnecessary metadata fields when not needed
  • Monitor response sizes: Be aware of response size when requesting many elements
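
A generator keeps memory bounded by fetching pages lazily and yielding elements one at a time; a sketch built on the list_elements helper from the Python example above:

def iter_elements(api_token, file_name, page_size=100, filter_options=None):
    """Yield document elements one by one, requesting pages only as needed."""
    page = 1
    while True:
        response = list_elements(api_token, file_name, page, page_size, filter_options)
        yield from response["items"]
        if page >= response["total_pages"]:
            break
        page += 1

# Usage: count characters without holding the whole document in memory
total_chars = sum(len(el["page_content"]) for el in iter_elements("grlm_your_token", "document.pdf"))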

Troubleshooting

Next Steps

After successfully retrieving document elements, you can analyze document structure, extract targeted content such as titles and tables, and build workflows like the analyzers and search tools shown above.