The load_elements method allows you to retrieve detailed information about document elements (partitions) from processed sources in your Graphor project. This method provides access to individual text blocks, images, tables, and other document components with their metadata, positioning, and content, enabling you to analyze document structure and extract specific information programmatically.
Method Overview
- **Sync:** `client.sources.load_elements()`
- **Async:** `await client.sources.load_elements()`
Method Signature
```python
client.sources.load_elements(
    file_id: str | None = None,     # Preferred
    file_name: str | None = None,   # Deprecated
    page: int | None = None,
    page_size: int | None = None,
    filter: Filter | None = None,
    timeout: float | None = None,
) -> SourceLoadElementsResponse
```
Parameters
| Parameter | Type | Description | Required |
|-----------|------|-------------|----------|
| `file_id` | `str` | Unique identifier for the source (preferred) | No* |
| `file_name` | `str` | Name of the source file to retrieve elements from (deprecated, use `file_id`) | No* |
| `page` | `int` | Page number for pagination (starts from 1) | No |
| `page_size` | `int` | Number of elements to return per page | No |
| `filter` | `Filter` | Filter criteria to refine element selection | No |
| `timeout` | `float` | Request timeout in seconds | No |
*At least one of file_id or file_name must be provided. file_id is preferred.
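If you have already listed your sources, you can pass the identifier directly. A minimal sketch, assuming the response from `client.sources.list()` exposes an `id` on each entry (that shape is an assumption, not confirmed by this page):

```python
from graphor import Graphor

client = Graphor()

# Look up the source, then load elements by its ID (preferred over file_name).
# NOTE: the `.items[0].id` shape of the list() response is an assumption.
sources = client.sources.list()
file_id = sources.items[0].id

response = client.sources.load_elements(file_id=file_id, page=1, page_size=20)
print(f"Found {response.total} elements")
```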
Filter Parameters
The filter parameter accepts a TypedDict with the following optional fields:
| Parameter | Type | Description |
|-----------|------|-------------|
| `type` | `str` | Filter by a specific element type (e.g., "Title", "NarrativeText", "Table") |
| `page_numbers` | `list[int]` | Filter elements from specific page numbers |
| `elements_to_remove` | `list[str]` | Exclude specific element types from results |
Response Object
The method returns a SourceLoadElementsResponse object:
| Property | Type | Description |
|----------|------|-------------|
| `items` | `list[Item]` | Document elements in the current page |
| `total` | `int` | Total number of elements matching the filter |
| `page` | `int \| None` | Current page number |
| `page_size` | `int \| None` | Number of elements per page |
| `total_pages` | `int \| None` | Total number of pages available |
Item Object
Each item in the items list has the following properties:
| Property | Type | Description |
|----------|------|-------------|
| `id` | `str \| None` | Element identifier (may be None) |
| `page_content` | `str` | Text content of the element |
| `type` | `Literal["Document"] \| None` | Always "Document" for this method |
| `metadata` | `dict \| None` | Rich metadata about the element |
The metadata dictionary contains detailed information:
| Field | Type | Description |
|-------|------|-------------|
| `coordinates` | `dict` | Pixel coordinates and layout information |
| `filename` | `str` | Original filename of the source document |
| `languages` | `list[str]` | Detected languages in the element |
| `last_modified` | `str` | ISO timestamp of last modification |
| `page_number` | `int` | Page number where the element appears |
| `filetype` | `str` | MIME type of the source file |
| `text_as_html` | `str` | HTML representation of the element |
| `element_type` | `str` | Type classification of the element |
| `element_id` | `str` | Unique identifier for the element |
| `position` | `int` | Sequential position within the document |
| `bounding_box` | `dict` | Rectangular bounds of the element |
| `page_layout` | `dict` | Overall page dimensions |
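Because `position` records each element's sequential place and `page_number` its page, you can restore reading order for items fetched out of order (for example, across several filtered requests). A small sketch using only the metadata fields documented above:

```python
def in_reading_order(items):
    """Sort elements by (page_number, position); items without metadata sort last."""
    def sort_key(item):
        metadata = item.metadata or {}
        return (
            metadata.get("page_number", float("inf")),
            metadata.get("position", float("inf")),
        )
    return sorted(items, key=sort_key)
```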
Element Types
| Type | Description |
|------|-------------|
| `Title` | Document and section titles |
| `NarrativeText` | Main body paragraphs and content |
| `ListItem` | Items in bullet points or numbered lists |
| `Table` | Complete data tables |
| `TableRow` | Individual rows within tables |
| `Image` | Picture or graphic elements |
| `Header` | Header content at the top of pages |
| `Footer` | Footer content at the bottom of pages |
| `Formula` | Mathematical formulas and equations |
| `CompositeElement` | Elements containing multiple types |
| `FigureCaption` | Text describing images or figures |
| `PageBreak` | Indicators of page separation |
| `Address` | Physical address information |
| `EmailAddress` | Email contact information |
| `PageNumber` | Page numbering elements |
| `CodeSnippet` | Programming code segments |
| `FormKeysValues` | Key-value pairs in forms |
| `Link` | Hyperlinks and references |
| `UncategorizedText` | Text that doesn't fit other categories |
Code Examples
Basic Usage
```python
from graphor import Graphor

client = Graphor()

# Get elements from a document
response = client.sources.load_elements(
    file_name="document.pdf",
    page=1,
    page_size=20,
)

print(f"Found {response.total} elements (page {response.page}/{response.total_pages})")
for item in response.items:
    element_type = item.metadata.get("element_type") if item.metadata else "Unknown"
    print(f"{element_type}: {item.page_content[:50]}...")
```
Filter by Element Type
```python
from graphor import Graphor

client = Graphor()

# Get only titles
response = client.sources.load_elements(
    file_name="document.pdf",
    page_size=50,
    filter={"type": "Title"},
)

print(f"Found {response.total} titles")
for item in response.items:
    page_num = item.metadata.get("page_number") if item.metadata else "?"
    print(f"Page {page_num}: {item.page_content}")
```
Filter by Page Numbers
```python
from graphor import Graphor

client = Graphor()

# Get elements from specific pages
response = client.sources.load_elements(
    file_name="document.pdf",
    page_size=100,
    filter={"page_numbers": [1, 2, 3]},
)

print(f"Found {response.total} elements on pages 1-3")
for item in response.items:
    # metadata may be None, so guard the lookup
    page_num = (item.metadata or {}).get("page_number", "?")
    print(f"Page {page_num}: {item.page_content[:80]}...")
```
Exclude Element Types
```python
from graphor import Graphor

client = Graphor()

# Get all elements except footers and page numbers
response = client.sources.load_elements(
    file_name="document.pdf",
    page_size=50,
    filter={"elements_to_remove": ["Footer", "PageNumber"]},
)

print(f"Found {response.total} content elements (excluding footers/page numbers)")
```
Combine Filters
```python
from graphor import Graphor

client = Graphor()

# Get tables from pages 2-5 (combining type and page filters)
response = client.sources.load_elements(
    file_name="document.pdf",
    page_size=50,
    filter={
        "type": "Table",
        "page_numbers": [2, 3, 4, 5],
    },
)

print(f"Found {response.total} tables on pages 2-5")
for item in response.items:
    page_num = (item.metadata or {}).get("page_number", "?")
    print(f"Table on page {page_num}:")
    print(f"  {item.page_content[:100]}...")
```
Async Usage
```python
import asyncio

from graphor import AsyncGraphor

async def get_document_elements(file_name: str):
    client = AsyncGraphor()
    response = await client.sources.load_elements(
        file_name=file_name,
        page=1,
        page_size=50,
    )
    print(f"Found {response.total} elements")
    for item in response.items:
        element_type = (item.metadata or {}).get("element_type", "Unknown")
        print(f"{element_type}: {item.page_content[:50]}...")
    return response

asyncio.run(get_document_elements("document.pdf"))
```
Paginate Through All Elements
```python
from graphor import Graphor

client = Graphor()

def get_all_elements(file_name: str, page_size: int = 50):
    """Retrieve all elements from a document."""
    all_elements = []
    page = 1
    while True:
        response = client.sources.load_elements(
            file_name=file_name,
            page=page,
            page_size=page_size,
        )
        all_elements.extend(response.items)
        print(f"Retrieved page {page}/{response.total_pages} ({len(all_elements)}/{response.total} elements)")
        # total_pages is typed int | None, so guard against None
        if response.total_pages is None or page >= response.total_pages:
            break
        page += 1
    return all_elements

# Usage
elements = get_all_elements("document.pdf")
print(f"Total elements retrieved: {len(elements)}")
```
Error Handling
```python
import graphor
from graphor import Graphor

client = Graphor()

try:
    response = client.sources.load_elements(
        file_name="document.pdf",
        page=1,
        page_size=20,
    )
    print(f"Found {response.total} elements")
except graphor.NotFoundError as e:
    print(f"File not found: {e}")
except graphor.BadRequestError as e:
    print(f"Invalid request parameters: {e}")
except graphor.AuthenticationError as e:
    print(f"Invalid API key: {e}")
except graphor.APIConnectionError as e:
    print(f"Connection error: {e}")
except graphor.APIStatusError as e:
    print(f"API error (status {e.status_code}): {e}")
```
Advanced Examples
Document Structure Analyzer
Analyze the structure of a document:
```python
from collections import defaultdict

from graphor import Graphor

client = Graphor()

def analyze_document_structure(file_name: str):
    """Analyze document structure and element distribution."""
    all_elements = []
    page = 1
    # Fetch all elements
    while True:
        response = client.sources.load_elements(
            file_name=file_name,
            page=page,
            page_size=100,
        )
        all_elements.extend(response.items)
        if page >= response.total_pages:
            break
        page += 1
    # Analyze structure
    type_counts = defaultdict(int)
    page_distribution = defaultdict(int)
    total_chars = 0
    languages = set()
    for item in all_elements:
        metadata = item.metadata or {}
        element_type = metadata.get("element_type", "Unknown")
        type_counts[element_type] += 1
        page_num = metadata.get("page_number", 0)
        page_distribution[page_num] += 1
        total_chars += len(item.page_content)
        for lang in metadata.get("languages", []):
            languages.add(lang)
    return {
        "total_elements": len(all_elements),
        "element_types": dict(type_counts),
        "pages": len(page_distribution),
        "elements_per_page": dict(page_distribution),
        "total_characters": total_chars,
        "average_element_length": total_chars / len(all_elements) if all_elements else 0,
        "detected_languages": list(languages),
    }

# Usage
analysis = analyze_document_structure("research_paper.pdf")
print("Document Analysis:")
print(f"  Total elements: {analysis['total_elements']}")
print(f"  Pages: {analysis['pages']}")
print(f"  Element types: {analysis['element_types']}")
print(f"  Languages: {analysis['detected_languages']}")
```
Extract Tables
Extract all tables from a document:
```python
from graphor import Graphor

client = Graphor()

def extract_tables(file_name: str):
    """Extract all tables from a document."""
    tables = []
    page = 1
    while True:
        response = client.sources.load_elements(
            file_name=file_name,
            page=page,
            page_size=50,
            filter={"type": "Table"},
        )
        for item in response.items:
            metadata = item.metadata or {}
            tables.append({
                "content": item.page_content,
                "page": metadata.get("page_number"),
                "position": metadata.get("position"),
                "html": metadata.get("text_as_html"),
                "bounding_box": metadata.get("bounding_box"),
            })
        if page >= response.total_pages:
            break
        page += 1
    return tables

# Usage
tables = extract_tables("financial_report.pdf")
print(f"Found {len(tables)} tables")
for i, table in enumerate(tables, 1):
    print(f"\nTable {i} (Page {table['page']}):")
    print(f"  {table['content'][:200]}...")
```
Build Document Outline
Create a document outline from titles:
```python
from graphor import Graphor

client = Graphor()

def build_document_outline(file_name: str):
    """Build a document outline from titles."""
    response = client.sources.load_elements(
        file_name=file_name,
        page_size=500,
        filter={"type": "Title"},
    )
    outline = []
    for item in response.items:
        metadata = item.metadata or {}
        html = metadata.get("text_as_html", "")
        # Detect heading level from the HTML tag
        level = 5  # default
        if "<h1>" in html:
            level = 1
        elif "<h2>" in html:
            level = 2
        elif "<h3>" in html:
            level = 3
        elif "<h4>" in html:
            level = 4
        outline.append({
            "title": item.page_content,
            "page": metadata.get("page_number"),
            "level": level,
            "position": metadata.get("position"),
        })
    # Sort by page, then position
    outline.sort(key=lambda x: (x["page"] or 0, x["position"] or 0))
    return outline

# Usage
outline = build_document_outline("book.pdf")
print("Document Outline:")
for item in outline:
    indent = "  " * (item["level"] - 1)
    print(f"{indent}• {item['title']} (Page {item['page']})")
```
Search Content in Elements
Search for specific content within document elements:
```python
import re

from graphor import Graphor

client = Graphor()

def search_in_document(file_name: str, query: str):
    """Search for content within document elements."""
    matches = []
    page = 1
    while True:
        response = client.sources.load_elements(
            file_name=file_name,
            page=page,
            page_size=100,
            filter={"elements_to_remove": ["Footer", "PageNumber"]},
        )
        for item in response.items:
            if query.lower() in item.page_content.lower():
                metadata = item.metadata or {}
                matches.append({
                    "content": item.page_content,
                    "page": metadata.get("page_number"),
                    "type": metadata.get("element_type"),
                    "position": metadata.get("position"),
                })
        if page >= response.total_pages:
            break
        page += 1
    return matches

def highlight_match(text: str, query: str) -> str:
    """Highlight the search query in text."""
    pattern = re.compile(f"({re.escape(query)})", re.IGNORECASE)
    return pattern.sub(r"**\1**", text)

# Usage
query = "machine learning"
matches = search_in_document("research_paper.pdf", query)
print(f"Found {len(matches)} matches for '{query}':")
for i, match in enumerate(matches[:10], 1):
    print(f"\n{i}. Page {match['page']} ({match['type']}):")
    highlighted = highlight_match(match["content"][:200], query)
    print(f"  {highlighted}...")
```
Async Batch Processing
Process multiple documents concurrently:
```python
import asyncio

import graphor
from graphor import AsyncGraphor

async def get_elements_async(client: AsyncGraphor, file_name: str):
    """Get all elements from a single document."""
    all_elements = []
    page = 1
    while True:
        try:
            response = await client.sources.load_elements(
                file_name=file_name,
                page=page,
                page_size=100,
            )
            all_elements.extend(response.items)
            if page >= response.total_pages:
                break
            page += 1
        except graphor.APIStatusError as e:
            print(f"Error processing {file_name}: {e}")
            break
    return {"file_name": file_name, "elements": all_elements}

async def batch_get_elements(file_names: list[str], max_concurrent: int = 3):
    """Get elements from multiple documents concurrently."""
    client = AsyncGraphor()
    semaphore = asyncio.Semaphore(max_concurrent)

    async def process_with_semaphore(file_name: str):
        async with semaphore:
            print(f"Processing: {file_name}")
            result = await get_elements_async(client, file_name)
            print(f"  Completed: {file_name} ({len(result['elements'])} elements)")
            return result

    tasks = [process_with_semaphore(f) for f in file_names]
    results = await asyncio.gather(*tasks, return_exceptions=True)
    return [r for r in results if not isinstance(r, Exception)]

# Usage
files = ["doc1.pdf", "doc2.pdf", "doc3.pdf"]
results = asyncio.run(batch_get_elements(files))
for result in results:
    print(f"{result['file_name']}: {len(result['elements'])} elements")
```
Document Comparator
Compare element structure between documents:
```python
from collections import defaultdict

from graphor import Graphor

client = Graphor()

def get_document_stats(file_name: str) -> dict:
    """Get statistics for a document."""
    type_counts = defaultdict(int)
    total_chars = 0
    page = 1
    while True:
        response = client.sources.load_elements(
            file_name=file_name,
            page=page,
            page_size=100,
        )
        for item in response.items:
            metadata = item.metadata or {}
            type_counts[metadata.get("element_type", "Unknown")] += 1
            total_chars += len(item.page_content)
        if page >= response.total_pages:
            total_elements = response.total
            break
        page += 1
    return {
        "file_name": file_name,
        "total_elements": total_elements,
        "total_characters": total_chars,
        "element_types": dict(type_counts),
    }

def compare_documents(file_name_1: str, file_name_2: str):
    """Compare two documents."""
    stats1 = get_document_stats(file_name_1)
    stats2 = get_document_stats(file_name_2)
    all_types = set(stats1["element_types"]) | set(stats2["element_types"])
    comparison = {
        "documents": [stats1["file_name"], stats2["file_name"]],
        "total_elements": [stats1["total_elements"], stats2["total_elements"]],
        "total_characters": [stats1["total_characters"], stats2["total_characters"]],
        "element_comparison": {},
    }
    for element_type in sorted(all_types):
        count1 = stats1["element_types"].get(element_type, 0)
        count2 = stats2["element_types"].get(element_type, 0)
        comparison["element_comparison"][element_type] = [count1, count2]
    return comparison

# Usage
comparison = compare_documents("version1.pdf", "version2.pdf")
print(f"Comparing: {comparison['documents'][0]} vs {comparison['documents'][1]}")
print(f"Elements: {comparison['total_elements'][0]} vs {comparison['total_elements'][1]}")
print(f"Characters: {comparison['total_characters'][0]} vs {comparison['total_characters'][1]}")
print("\nElement breakdown:")
for elem_type, counts in comparison["element_comparison"].items():
    print(f"  {elem_type}: {counts[0]} vs {counts[1]}")
```
Error Reference
| Error Type | Status Code | Description |
|------------|-------------|-------------|
| `BadRequestError` | 400 | Invalid request payload or parameters |
| `AuthenticationError` | 401 | Invalid or missing API key |
| `NotFoundError` | 404 | Specified file not found in the project |
| `RateLimitError` | 429 | Too many requests; retry after waiting |
| `InternalServerError` | ≥500 | Server-side error while processing the request |
| `APIConnectionError` | N/A | Network connectivity issues |
| `APITimeoutError` | N/A | Request timed out |
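`RateLimitError` is transient, so wrapping calls in a simple exponential backoff is often enough. A sketch (the retry count and delays are arbitrary choices, not SDK defaults):

```python
import time

import graphor
from graphor import Graphor

client = Graphor()

def load_elements_with_retry(max_retries: int = 3, **kwargs):
    """Retry load_elements on rate limits, backing off 1s, 2s, 4s, ..."""
    for attempt in range(max_retries):
        try:
            return client.sources.load_elements(**kwargs)
        except graphor.RateLimitError:
            if attempt == max_retries - 1:
                raise
            time.sleep(2 ** attempt)

response = load_elements_with_retry(file_name="document.pdf", page=1, page_size=20)
```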
Best Practices
- **Use appropriate page sizes**: Start with 20-50 elements per page for optimal performance
- **Filter server-side**: Use filter parameters to reduce data transfer
- **Cache results**: Store element data locally for repeated access (see the caching sketch below)
```python
# Good: filter on the server
response = client.sources.load_elements(
    file_name="doc.pdf",
    filter={"type": "Title"},
)

# Less efficient: filter on the client
response = client.sources.load_elements(
    file_name="doc.pdf",
    page_size=500,
)
titles = [
    item for item in response.items
    if (item.metadata or {}).get("element_type") == "Title"
]
```
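For the caching recommendation, one lightweight approach is to memoize responses in memory, keyed by the request parameters. A sketch, assuming responses fit in memory (swap in a persistent store for long-running jobs):

```python
import json

_element_cache: dict = {}

def load_elements_cached(**kwargs):
    """Return a cached load_elements response for identical parameters."""
    cache_key = json.dumps(kwargs, sort_keys=True, default=str)
    if cache_key not in _element_cache:
        _element_cache[cache_key] = client.sources.load_elements(**kwargs)
    return _element_cache[cache_key]
```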
Data Processing
- **Element type awareness**: Different element types need different processing
- **Use the HTML field**: The text_as_html field preserves formatting (a parsing sketch follows below)
- **Handle None metadata**: Always check that metadata exists before accessing it
```python
for item in response.items:
    # Safe metadata access
    metadata = item.metadata or {}
    element_type = metadata.get("element_type", "Unknown")
    page_num = metadata.get("page_number", 0)
```
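To turn text_as_html back into structured data, Python's built-in html.parser is sufficient for simple tables. A sketch, assuming the HTML uses standard `<tr>`/`<td>`/`<th>` markup (the exact markup Graphor emits is not specified here):

```python
from html.parser import HTMLParser

class TableExtractor(HTMLParser):
    """Collect <td>/<th> cell text from an HTML table into rows."""

    def __init__(self):
        super().__init__()
        self.rows: list[list[str]] = []
        self._row: list[str] = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "tr" and self._row:
            self.rows.append(self._row)
        elif tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell and data.strip():
            self._row.append(data.strip())

def html_table_to_rows(html: str | None) -> list[list[str]]:
    parser = TableExtractor()
    parser.feed(html or "")
    return parser.rows
```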
Memory Management
- **Stream large documents**: Process in chunks rather than loading everything at once (see the generator sketch below)
- **Clear processed data**: Drop fields you no longer need
```python
# Process large documents in chunks
page = 1
while True:
    response = client.sources.load_elements(
        file_name="large_doc.pdf",
        page=page,
        page_size=50,
    )
    # Process this batch
    for item in response.items:
        process_element(item)  # Your processing logic
    if page >= response.total_pages:
        break
    page += 1
```
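The same loop can be wrapped in a generator so callers iterate element by element while pages are fetched lazily; memory use stays bounded by one page. A sketch built on the documented pagination fields:

```python
from typing import Iterator

def iter_elements(file_name: str, page_size: int = 50) -> Iterator:
    """Yield elements one at a time, fetching pages on demand."""
    page = 1
    while True:
        response = client.sources.load_elements(
            file_name=file_name,
            page=page,
            page_size=page_size,
        )
        yield from response.items
        if response.total_pages is None or page >= response.total_pages:
            break
        page += 1

# Usage: memory stays constant regardless of document size
for item in iter_elements("large_doc.pdf"):
    process_element(item)  # Your processing logic
```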
Troubleshooting
Slow responses or timeouts
Causes: Large page sizes, complex filters, or server load.
Solutions:
- Reduce page_size to 25-50 elements
- Use specific filters to reduce the result set
- Set a request timeout

```python
client = Graphor(timeout=60.0)
```
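The timeout parameter documented above also works per request, which is handy when only element loading needs a longer budget:

```python
# Per-request timeout (seconds), instead of a client-wide default
response = client.sources.load_elements(
    file_name="document.pdf",
    page=1,
    page_size=25,
    timeout=60.0,
)
```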
No elements returned
Causes: File not processed, incorrect file name, or overly restrictive filters.
Solutions:
- Verify the file was processed successfully with client.sources.list()
- Check that the file name matches exactly (case-sensitive)
- Remove or relax filter criteria
Missing expected elements
Causes: Processing method limitations, file format issues, or filter conflicts.
Solutions:
- Try different partition methods via client.sources.parse()
- Check whether elements are categorized under different types
- Temporarily remove the elements_to_remove filter
Memory issues with large documents
Causes: Processing too many elements at once.
Solutions:
- Reduce page_size and process incrementally
- Filter out unnecessary element types
- Use streaming processing patterns (see the generator sketch above)
Next Steps
After successfully retrieving document elements: