The get_elements method (named after the API endpoint it calls) returns the parsed elements of a source. Each item is a BuildStatusElement with element_id, element_type, text, markdown, html, optional img_base64, position, page_number, bounding_box, page_layout, and more. Identify the source by its file_id, which you can get from List sources or Get build status.
Method overview
client.sources.get_elements()
await client.sources.getElements()
Method signature
client.sources.get_elements(
    file_id: str,                                  # Required
    page: int | None = None,
    page_size: int | None = None,
    suppress_img_base64: bool = False,
    type: str | None = None,                       # Filter by element type
    page_numbers: list[int] | None = None,
    elements_to_remove: list[str] | None = None,
    timeout: float | None = None,
) -> SourceGetElementsResponse
await client.sources.getElements({
  file_id: string,                  // Required
  page?: number | null,
  page_size?: number | null,
  suppress_img_base64?: boolean,
  type?: string | null,
  page_numbers?: number[] | null,
  elementsToRemove?: string[] | null,
}): Promise<SourceGetElementsResponse>
Parameters
Python:

| Parameter | Type | Description | Required |
|---|---|---|---|
| file_id | str | Unique identifier of the source | Yes |
| page | int \| None | 1-based page number (use with page_size) | No |
| page_size | int \| None | Elements per page (1–100) | No |
| suppress_img_base64 | bool | When true, omit img_base64 from each element | No |
| type | str \| None | Filter by element type (e.g. Title, NarrativeText, Table) | No |
| page_numbers | list[int] \| None | Restrict to specific page numbers | No |
| elements_to_remove | list[str] \| None | Element types to exclude | No |
| timeout | float \| None | Request timeout in seconds | No |

TypeScript:

| Parameter | Type | Description | Required |
|---|---|---|---|
| file_id | string | Unique identifier of the source | Yes |
| page | number \| null | 1-based page number | No |
| page_size | number \| null | Elements per page (1–100) | No |
| suppress_img_base64 | boolean | When true, omit img_base64 | No |
| type | string \| null | Filter by element type (e.g. Title, NarrativeText) | No |
| page_numbers | number[] \| null | Restrict to specific page numbers | No |
| elementsToRemove | string[] \| null | Element types to exclude | No |
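Because page_size is limited to 1–100 and page is 1-based, it can help to validate arguments before sending a request. The sketch below is a hypothetical helper (build_get_elements_kwargs is not part of the SDK) that checks those documented ranges and drops unset optional arguments so the SDK keeps its own defaults.

```python
from __future__ import annotations

from typing import Any


def build_get_elements_kwargs(
    file_id: str,
    page: int | None = None,
    page_size: int | None = None,
    **filters: Any,
) -> dict[str, Any]:
    """Hypothetical helper: validate documented ranges and omit unset arguments."""
    if not file_id:
        raise ValueError("file_id is required")
    if page is not None and page < 1:
        raise ValueError("page is 1-based and must be >= 1")
    if page_size is not None and not 1 <= page_size <= 100:
        raise ValueError("page_size must be between 1 and 100")
    kwargs: dict[str, Any] = {"file_id": file_id, "page": page, "page_size": page_size, **filters}
    # Drop None values so the SDK applies its own defaults
    return {key: value for key, value in kwargs.items() if value is not None}
```

With the SDK you would unpack the result, for example `client.sources.get_elements(**build_get_elements_kwargs(file_id, page=1, page_size=50, type="Title"))`.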
Filter parameters
All filter parameters are passed at the top level (not as a nested object).
| Parameter | Python | TypeScript | Description |
|---|---|---|---|
| Element type filter | type | type | Filter by element type (e.g. Title, NarrativeText, Table) |
| Page number filter | page_numbers | page_numbers | Restrict to specific page numbers |
| Exclude types | elements_to_remove | elementsToRemove | Element types to exclude |
Response
Paginated response with BuildStatusElement items (same shape as the elements in Get build status):

| Field | Type | Description |
|---|---|---|
| items | list | Elements in the current page (or all elements if no pagination) |
| total | int | Total elements matching the filters |
| page | int \| null | Current page (1-based) or null |
| page_size | int \| null | Elements per page or null |
| total_pages | int \| null | Total pages or null |
BuildStatusElement (each item)
| Field | Type | Description |
|---|---|---|
| element_id | str \| null | Unique identifier for the element |
| element_type | str \| null | e.g. Title, NarrativeText, Table, Image |
| text | str | Plain text content |
| markdown | str \| null | Markdown when available |
| html | str \| null | HTML when available |
| img_base64 | str \| null | Base64-encoded image (omitted if suppress_img_base64=true) |
| position | int \| null | Order within the document |
| page_number | int \| null | Page number (1-based) |
| bounding_box | object \| null | Bounding box (left, top, width, height) |
| page_layout | object \| null | Page dimensions |
| page_annotation | str \| null | Page-level annotation |
| page_keywords | array \| null | Keywords for the page |
| page_topics | array \| null | Topics for the page |
| metadata | object | Additional metadata |
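When unit-testing code that consumes these elements, a lightweight stand-in avoids network calls. FakeElement and FakePage below are hypothetical test doubles that mirror a subset of the documented fields; they are not SDK types, and real items may carry more attributes.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class FakeElement:
    """Hypothetical test double for a BuildStatusElement (subset of fields)."""

    text: str
    element_type: str | None = None
    element_id: str | None = None
    page_number: int | None = None
    position: int | None = None
    html: str | None = None
    markdown: str | None = None
    metadata: dict[str, Any] = field(default_factory=dict)


@dataclass
class FakePage:
    """Hypothetical stand-in for one paginated get_elements response."""

    items: list[FakeElement]
    total: int
    page: int | None = None
    page_size: int | None = None
    total_pages: int | None = None
```

Because the examples on this page only read attributes such as item.text and response.total_pages, functions written against the SDK can usually be exercised with these objects.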
Element Types
| Type | Description |
|---|---|
| Title | Document and section titles |
| NarrativeText | Main body paragraphs and content |
| ListItem | Items in bullet points or numbered lists |
| Table | Complete data tables |
| TableRow | Individual rows within tables |
| Image | Picture or graphic elements |
| Header | Header content at top of pages |
| Footer | Footer content at bottom of pages |
| Formula | Mathematical formulas and equations |
| CompositeElement | Elements containing multiple types |
| FigureCaption | Text describing images or figures |
| PageBreak | Indicators of page separation |
| Address | Physical address information |
| EmailAddress | Email contact information |
| PageNumber | Page numbering elements |
| CodeSnippet | Programming code segments |
| FormKeysValues | Key-value pairs in forms |
| Link | Hyperlinks and references |
| UncategorizedText | Text that doesn't fit other categories |
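Because every element carries one of these type names, grouping results by element_type is often the first processing step. This sketch works on any objects that expose element_type (SDK items or test doubles). LAYOUT_TYPES is an illustrative choice of page furniture you might pass to elements_to_remove, not a list defined by the API.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Any, Iterable

# Illustrative choice: types that usually describe page layout rather than content
LAYOUT_TYPES = ["Header", "Footer", "PageNumber", "PageBreak"]


def group_by_type(elements: Iterable[Any]) -> dict[str, list[Any]]:
    """Group elements by element_type; a missing type is grouped under "Unknown"."""
    groups: dict[str, list[Any]] = defaultdict(list)
    for element in elements:
        groups[getattr(element, "element_type", None) or "Unknown"].append(element)
    return dict(groups)
```

For example, `group_by_type(client.sources.get_elements(file_id=file_id, elements_to_remove=LAYOUT_TYPES).items)` returns only content-bearing types, keyed by name.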
Code Examples
Basic usage
from graphor import Graphor

client = Graphor()
file_id = "file_abc123"  # from list() or get_build_status

response = client.sources.get_elements(file_id=file_id, page=1, page_size=20)
print(f"Found {response.total} elements (page {response.page}/{response.total_pages})")
for item in response.items:
    print(f"{item.element_type}: {item.text[:50]}...")
import Graphor from 'graphor';

const client = new Graphor();
const fileId = 'file_abc123';

const response = await client.sources.getElements({
  file_id: fileId,
  page: 1,
  page_size: 20,
});
console.log(`Found ${response.total} elements (page ${response.page}/${response.total_pages})`);
for (const item of response.items) {
  console.log(`${item.element_type}: ${item.text.slice(0, 50)}...`);
}
Filter by element type
response = client.sources.get_elements(
    file_id=file_id,
    page_size=50,
    type="Title",
)
for item in response.items:
    print(f"Page {item.page_number}: {item.text}")
const response = await client.sources.getElements({
  file_id: fileId,
  page_size: 50,
  type: 'Title',
});
for (const item of response.items) {
  console.log(`Page ${item.page_number}: ${item.text}`);
}
Filter by page numbers
response = client.sources.get_elements(
    file_id=file_id,
    page_size=100,
    page_numbers=[1, 2, 3],
)
for item in response.items:
    print(f"Page {item.page_number}: {item.text[:80]}...")
const response = await client.sources.getElements({
  file_id: fileId,
  page_size: 100,
  page_numbers: [1, 2, 3],
});
for (const item of response.items) {
  console.log(`Page ${item.page_number}: ${item.text.slice(0, 80)}...`);
}
Exclude element types
response = client.sources.get_elements(
    file_id=file_id,
    page_size=50,
    elements_to_remove=["Footer", "PageNumber"],
)
const response = await client.sources.getElements({
  file_id: fileId,
  page_size: 50,
  elementsToRemove: ['Footer', 'PageNumber'],
});
Combine filters
response = client.sources.get_elements(
    file_id=file_id,
    page_size=50,
    type="Table",
    page_numbers=[2, 3, 4, 5],
)
for item in response.items:
    print(f"Table on page {item.page_number}: {item.text[:100]}...")
const response = await client.sources.getElements({
  file_id: fileId,
  page_size: 50,
  type: 'Table',
  page_numbers: [2, 3, 4, 5],
});
for (const item of response.items) {
  console.log(`Table on page ${item.page_number}: ${item.text.slice(0, 100)}...`);
}
Async usage
import asyncio

from graphor import AsyncGraphor

async def get_document_elements(file_id: str):
    client = AsyncGraphor()
    response = await client.sources.get_elements(file_id=file_id, page=1, page_size=50)
    print(f"Found {response.total} elements")
    for item in response.items:
        print(f"{item.element_type}: {item.text[:50]}...")
    return response

asyncio.run(get_document_elements("file_abc123"))
async function getDocumentElements(fileId: string) {
  const response = await client.sources.getElements({
    file_id: fileId,
    page: 1,
    page_size: 50,
  });
  for (const item of response.items) {
    console.log(`${item.element_type}: ${item.text.slice(0, 50)}...`);
  }
  return response;
}

await getDocumentElements('file_abc123');
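Once the first response reports total_pages, the remaining pages can be requested concurrently with the async client. In this sketch, fetch_all_pages and its fetch_page parameter are hypothetical: fetch_page stands in for an async call such as `lambda p: client.sources.get_elements(file_id=file_id, page=p, page_size=100)`, and the semaphore caps in-flight requests to stay clear of rate limits (429 responses).

```python
from __future__ import annotations

import asyncio
from typing import Any, Awaitable, Callable


async def fetch_all_pages(
    fetch_page: Callable[[int], Awaitable[Any]],
    max_concurrent: int = 3,
) -> list[Any]:
    """Fetch page 1, then the remaining pages concurrently, preserving page order."""
    first = await fetch_page(1)
    total_pages = first.total_pages or 1
    semaphore = asyncio.Semaphore(max_concurrent)

    async def fetch_limited(page: int) -> Any:
        # Allow at most max_concurrent requests at the same time
        async with semaphore:
            return await fetch_page(page)

    rest = await asyncio.gather(*(fetch_limited(p) for p in range(2, total_pages + 1)))
    elements = list(first.items)
    for response in rest:  # gather returns results in the order the awaitables were passed
        elements.extend(response.items)
    return elements
```

The first request is made alone because total_pages is only known after it returns.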
Paginate through all elements
def get_all_elements(file_id: str, page_size: int = 50):
    all_elements = []
    page = 1
    while True:
        response = client.sources.get_elements(file_id=file_id, page=page, page_size=page_size)
        all_elements.extend(response.items)
        if response.total_pages is None or page >= response.total_pages:
            break
        page += 1
    return all_elements

elements = get_all_elements("file_abc123")
type ElementItem = Awaited<ReturnType<typeof client.sources.getElements>>['items'][number];

async function getAllElements(fileId: string, page_size = 50) {
  const all: ElementItem[] = [];
  let page = 1;
  while (true) {
    const response = await client.sources.getElements({ file_id: fileId, page, page_size });
    all.push(...response.items);
    if (response.total_pages == null || page >= response.total_pages) break;
    page++;
  }
  return all;
}

const elements = await getAllElements('file_abc123');
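Both loops above collect every element in memory. If you only need to stream through them, a generator can yield elements page by page instead. This sketch takes a fetch_page callable (a hypothetical parameter, so the pagination logic can be exercised without network access) that returns an object with items and total_pages, such as a get_elements response.

```python
from __future__ import annotations

from typing import Any, Callable, Iterator


def iter_elements(fetch_page: Callable[[int, int], Any], page_size: int = 50) -> Iterator[Any]:
    """Yield elements one at a time, requesting the next page only when needed."""
    page = 1
    while True:
        response = fetch_page(page, page_size)
        yield from response.items
        # Stop when pagination info is missing or the last page was reached
        if response.total_pages is None or page >= response.total_pages:
            return
        page += 1
```

With the SDK, pass `lambda p, s: client.sources.get_elements(file_id=file_id, page=p, page_size=s)` as fetch_page; a page is only requested once the consumer has finished the previous one.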
Error handling
import graphor

try:
    response = client.sources.get_elements(file_id=file_id, page=1, page_size=20)
    print(f"Found {response.total} elements")
except graphor.NotFoundError as e:
    print("Source not found:", e)
except graphor.BadRequestError as e:
    print("Invalid request (e.g. missing file_id):", e)
except graphor.APIStatusError as e:
    # Any other error status returned by the API
    print("API error:", e)
try {
  const response = await client.sources.getElements({ file_id: fileId, page: 1, page_size: 20 });
  console.log('Found', response.total, 'elements');
} catch (err) {
  if (err instanceof Graphor.NotFoundError) {
    console.log('Source not found:', err.message);
  } else if (err instanceof Graphor.APIError) {
    console.log('API error:', err.message);
  } else {
    throw err;
  }
}
Advanced Examples
Document Structure Analyzer
Analyze the structure of a document:
from graphor import Graphor
from collections import defaultdict

client = Graphor()

def analyze_document_structure(file_id: str):
    """Analyze document structure and element distribution."""
    all_elements = []
    page = 1
    # Fetch all elements (page_size is capped at 100)
    while True:
        response = client.sources.get_elements(
            file_id=file_id,
            page=page,
            page_size=100,
        )
        all_elements.extend(response.items)
        if response.total_pages is None or page >= response.total_pages:
            break
        page += 1
    # Analyze structure
    type_counts = defaultdict(int)
    page_distribution = defaultdict(int)
    total_chars = 0
    languages = set()
    for item in all_elements:
        element_type = item.element_type or "Unknown"
        type_counts[element_type] += 1
        page_num = item.page_number or 0
        page_distribution[page_num] += 1
        total_chars += len(item.text)
        # metadata may be missing; "languages" is read only when present
        for lang in (item.metadata or {}).get("languages", []):
            languages.add(lang)
    return {
        "total_elements": len(all_elements),
        "element_types": dict(type_counts),
        "pages": len(page_distribution),
        "elements_per_page": dict(page_distribution),
        "total_characters": total_chars,
        "average_element_length": total_chars / len(all_elements) if all_elements else 0,
        "detected_languages": list(languages),
    }

# Usage
analysis = analyze_document_structure("file_abc123")
print("Document Analysis:")
print(f"  Total elements: {analysis['total_elements']}")
print(f"  Pages: {analysis['pages']}")
print(f"  Element types: {analysis['element_types']}")
print(f"  Languages: {analysis['detected_languages']}")
import Graphor from 'graphor';

const client = new Graphor();

async function analyzeDocumentStructure(fileId: string) {
  type ElementItem = Awaited<
    ReturnType<typeof client.sources.getElements>
  >['items'][number];
  const allElements: ElementItem[] = [];
  let page = 1;
  // Fetch all elements (page_size is capped at 100)
  while (true) {
    const response = await client.sources.getElements({
      file_id: fileId,
      page,
      page_size: 100,
    });
    allElements.push(...response.items);
    if (page >= (response.total_pages ?? 1)) break;
    page++;
  }
  // Analyze structure
  const typeCounts: Record<string, number> = {};
  const pageDistribution: Record<number, number> = {};
  let totalChars = 0;
  const languages = new Set<string>();
  for (const item of allElements) {
    const elementType = item.element_type ?? 'Unknown';
    typeCounts[elementType] = (typeCounts[elementType] ?? 0) + 1;
    const pageNum = item.page_number ?? 0;
    pageDistribution[pageNum] = (pageDistribution[pageNum] ?? 0) + 1;
    totalChars += item.text.length;
    for (const lang of (item.metadata?.languages as string[] | undefined) ?? []) {
      languages.add(lang);
    }
  }
  return {
    totalElements: allElements.length,
    elementTypes: typeCounts,
    pages: Object.keys(pageDistribution).length,
    elementsPerPage: pageDistribution,
    totalCharacters: totalChars,
    averageElementLength: allElements.length > 0 ? totalChars / allElements.length : 0,
    detectedLanguages: [...languages],
  };
}

// Usage
const analysis = await analyzeDocumentStructure('file_abc123');
console.log('Document Analysis:');
console.log(`  Total elements: ${analysis.totalElements}`);
console.log(`  Pages: ${analysis.pages}`);
console.log('  Element types:', analysis.elementTypes);
console.log('  Languages:', analysis.detectedLanguages);
Extract Tables
Extract all tables from a document:
from graphor import Graphor

client = Graphor()

def extract_tables(file_id: str):
    """Extract all tables from a document."""
    tables = []
    page = 1
    while True:
        response = client.sources.get_elements(
            file_id=file_id,
            page=page,
            page_size=50,
            type="Table",
        )
        for item in response.items:
            tables.append({
                "content": item.text,
                "page": item.page_number,
                "position": item.position,
                "html": item.html,
                "bounding_box": item.bounding_box,
            })
        if response.total_pages is None or page >= response.total_pages:
            break
        page += 1
    return tables

# Usage
tables = extract_tables("file_abc123")
print(f"Found {len(tables)} tables")
for i, table in enumerate(tables, 1):
    print(f"\nTable {i} (Page {table['page']}):")
    print(f"{table['content'][:200]}...")
import Graphor from 'graphor';

const client = new Graphor();

async function extractTables(fileId: string) {
  const tables: {
    content: string;
    page: number | undefined;
    position: number | undefined;
    html: string | undefined;
    boundingBox: unknown;
  }[] = [];
  let page = 1;
  while (true) {
    const response = await client.sources.getElements({
      file_id: fileId,
      page,
      page_size: 50,
      type: 'Table',
    });
    for (const item of response.items) {
      tables.push({
        content: item.text,
        page: item.page_number ?? undefined,
        position: item.position ?? undefined,
        html: item.html ?? undefined,
        boundingBox: item.bounding_box,
      });
    }
    if (page >= (response.total_pages ?? 1)) break;
    page++;
  }
  return tables;
}

// Usage
const tables = await extractTables('file_abc123');
console.log(`Found ${tables.length} tables`);
tables.forEach((table, i) => {
  console.log(`\nTable ${i + 1} (Page ${table.page}):`);
  console.log(`${table.content.slice(0, 200)}...`);
});
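The html field of a Table element keeps the row and column structure that text flattens. If that HTML uses standard `<tr>`, `<th>`, and `<td>` tags with explicit closing tags (an assumption; inspect your own output first), Python's standard html.parser can turn it into a list of rows. This is a minimal sketch: table_html_to_rows is a hypothetical helper, and it ignores rowspan and colspan.

```python
from __future__ import annotations

from html.parser import HTMLParser


class _TableRowParser(HTMLParser):
    """Collect cell text from <tr>/<th>/<td> tags; rowspan and colspan are ignored."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: list[list[str]] = []
        self._row: list[str] | None = None
        self._cell: list[str] | None = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only text inside a cell is kept (includes text of nested tags like <b>)
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def table_html_to_rows(html: str) -> list[list[str]]:
    """Convert a Table element's html into a list of rows of cell strings."""
    parser = _TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows
```

For the tables collected above, `table_html_to_rows(table["html"] or "")` returns an empty list when no HTML is available.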
Build Document Outline
Create a document outline from titles:
from graphor import Graphor

client = Graphor()

def build_document_outline(file_id: str):
    """Build a document outline from titles."""
    titles = []
    page = 1
    # page_size is limited to 1–100, so page through all Title elements
    while True:
        response = client.sources.get_elements(
            file_id=file_id,
            page=page,
            page_size=100,
            type="Title",
        )
        titles.extend(response.items)
        if response.total_pages is None or page >= response.total_pages:
            break
        page += 1
    outline = []
    for item in titles:
        html = item.html or ""
        level = 5  # default when no <h1>-<h4> tag is found
        if "<h1>" in html:
            level = 1
        elif "<h2>" in html:
            level = 2
        elif "<h3>" in html:
            level = 3
        elif "<h4>" in html:
            level = 4
        outline.append({
            "title": item.text,
            "page": item.page_number,
            "level": level,
            "position": item.position,
        })
    # Sort by page, then by position
    outline.sort(key=lambda x: (x["page"] or 0, x["position"] or 0))
    return outline

# Usage
outline = build_document_outline("file_abc123")
print("Document Outline:")
for item in outline:
    indent = "  " * (item["level"] - 1)
    print(f"{indent}• {item['title']} (Page {item['page']})")
import Graphor from 'graphor';

const client = new Graphor();

async function buildDocumentOutline(fileId: string) {
  type ElementItem = Awaited<
    ReturnType<typeof client.sources.getElements>
  >['items'][number];
  const titles: ElementItem[] = [];
  let page = 1;
  // page_size is limited to 1–100, so page through all Title elements
  while (true) {
    const response = await client.sources.getElements({
      file_id: fileId,
      page,
      page_size: 100,
      type: 'Title',
    });
    titles.push(...response.items);
    if (page >= (response.total_pages ?? 1)) break;
    page++;
  }
  const outline = titles.map((item) => {
    const html = item.html ?? '';
    let level = 5; // default when no <h1>-<h4> tag is found
    if (html.includes('<h1>')) level = 1;
    else if (html.includes('<h2>')) level = 2;
    else if (html.includes('<h3>')) level = 3;
    else if (html.includes('<h4>')) level = 4;
    return {
      title: item.text,
      page: item.page_number ?? undefined,
      level,
      position: item.position ?? undefined,
    };
  });
  // Sort by page, then by position
  outline.sort((a, b) => (a.page ?? 0) - (b.page ?? 0) || (a.position ?? 0) - (b.position ?? 0));
  return outline;
}

// Usage
const outline = await buildDocumentOutline('file_abc123');
console.log('Document Outline:');
for (const item of outline) {
  const indent = '  '.repeat(item.level - 1);
  console.log(`${indent}• ${item.title} (Page ${item.page})`);
}
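The substring checks above only match bare tags such as `<h1>`; a heading written as `<h2 class="title">` would fall through to level 5. A regular expression that tolerates attributes is a small, hedged improvement. heading_level is a hypothetical helper, and it still assumes titles are rendered as `<h1>` to `<h6>` tags, which may not hold for every document.

```python
from __future__ import annotations

import re

# Matches an opening heading tag such as <h2> or <h2 class="x">, case-insensitively
_HEADING_TAG = re.compile(r"<h([1-6])(?=[\s>])", re.IGNORECASE)


def heading_level(html: str | None, default: int = 5) -> int:
    """Return the level of the first <h1>-<h6> opening tag in html, or default if none."""
    match = _HEADING_TAG.search(html or "")
    return int(match.group(1)) if match else default
```

In build_document_outline, the if/elif chain can then become `level = heading_level(item.html)`.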
Search Content in Elements
Search for specific content within document elements:
import re

from graphor import Graphor

client = Graphor()

def search_in_document(file_id: str, query: str):
    """Search for content within document elements (case-insensitive substring match)."""
    matches = []
    page = 1
    while True:
        response = client.sources.get_elements(
            file_id=file_id,
            page=page,
            page_size=100,
            elements_to_remove=["Footer", "PageNumber"],
        )
        for item in response.items:
            if query.lower() in item.text.lower():
                matches.append({
                    "content": item.text,
                    "page": item.page_number,
                    "type": item.element_type,
                    "position": item.position,
                })
        if response.total_pages is None or page >= response.total_pages:
            break
        page += 1
    return matches

def highlight_match(text: str, query: str) -> str:
    """Wrap each case-insensitive occurrence of query in **bold** markers."""
    pattern = re.compile(f"({re.escape(query)})", re.IGNORECASE)
    return pattern.sub(r"**\1**", text)

# Usage
query = "machine learning"
matches = search_in_document("file_abc123", query)
print(f"Found {len(matches)} matches for '{query}':")
for i, match in enumerate(matches[:10], 1):
    print(f"\n{i}. Page {match['page']} ({match['type']}):")
    highlighted = highlight_match(match["content"][:200], query)
    print(f"{highlighted}...")
import Graphor from 'graphor';

const client = new Graphor();

async function searchInDocument(fileId: string, query: string) {
  const matches: {
    content: string;
    page: number | undefined;
    type: string | undefined;
    position: number | undefined;
  }[] = [];
  let page = 1;
  while (true) {
    const response = await client.sources.getElements({
      file_id: fileId,
      page,
      page_size: 100,
      elementsToRemove: ['Footer', 'PageNumber'],
    });
    for (const item of response.items) {
      if (item.text.toLowerCase().includes(query.toLowerCase())) {
        matches.push({
          content: item.text,
          page: item.page_number ?? undefined,
          type: item.element_type ?? undefined,
          position: item.position ?? undefined,
        });
      }
    }
    if (page >= (response.total_pages ?? 1)) break;
    page++;
  }
  return matches;
}

function highlightMatch(text: string, query: string): string {
  // Escape regex metacharacters in the query, then wrap each match in **bold** markers
  const escaped = query.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  const regex = new RegExp(`(${escaped})`, 'gi');
  return text.replace(regex, '**$1**');
}

// Usage
const query = 'machine learning';
const matches = await searchInDocument('file_abc123', query);
console.log(`Found ${matches.length} matches for '${query}':`);
matches.slice(0, 10).forEach((match, i) => {
  console.log(`\n${i + 1}. Page ${match.page} (${match.type}):`);
  const highlighted = highlightMatch(match.content.slice(0, 200), query);
  console.log(`${highlighted}...`);
});
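Both versions print the first 200 characters of each matching element, so in a long paragraph the highlighted term may fall outside the printed text. A small helper (hypothetical, not part of the SDK) can center the excerpt on the first match instead.

```python
from __future__ import annotations


def snippet_around(text: str, query: str, width: int = 80) -> str:
    """Return up to `width` characters on each side of the first case-insensitive match.

    Assumes str.lower() keeps the string length unchanged (true for ASCII text).
    """
    index = text.lower().find(query.lower())
    if index == -1:
        return text[: 2 * width]
    start = max(0, index - width)
    end = min(len(text), index + len(query) + width)
    prefix = "..." if start > 0 else ""
    suffix = "..." if end < len(text) else ""
    return f"{prefix}{text[start:end]}{suffix}"
```

In the Python usage loop, `highlight_match(snippet_around(match["content"], query), query)` would replace the fixed `[:200]` slice.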
Async Batch Processing
Process multiple documents concurrently:
import asyncio

import graphor
from graphor import AsyncGraphor

async def get_elements_async(client: AsyncGraphor, file_id: str):
    """Get all elements from a single document."""
    all_elements = []
    page = 1
    while True:
        try:
            response = await client.sources.get_elements(
                file_id=file_id,
                page=page,
                page_size=100,
            )
            all_elements.extend(response.items)
            if response.total_pages is None or page >= response.total_pages:
                break
            page += 1
        except graphor.APIStatusError as e:
            print(f"Error processing {file_id}: {e}")
            break
    return {"file_id": file_id, "elements": all_elements}

async def batch_get_elements(file_ids: list[str], max_concurrent: int = 3):
    """Get elements from multiple documents concurrently."""
    client = AsyncGraphor()
    semaphore = asyncio.Semaphore(max_concurrent)

    async def process_with_semaphore(fid: str):
        async with semaphore:
            print(f"Processing: {fid}")
            result = await get_elements_async(client, fid)
            print(f"  Completed: {fid} ({len(result['elements'])} elements)")
            return result

    tasks = [process_with_semaphore(f) for f in file_ids]
    results = await asyncio.gather(*tasks, return_exceptions=True)
    # Drop documents that failed with an unexpected exception
    return [r for r in results if not isinstance(r, Exception)]

# Usage
file_ids = ["file_1", "file_2", "file_3"]
results = asyncio.run(batch_get_elements(file_ids))
for result in results:
    print(f"{result['file_id']}: {len(result['elements'])} elements")
import Graphor from 'graphor';

const client = new Graphor();

type ElementItem = Awaited<
  ReturnType<typeof client.sources.getElements>
>['items'][number];

async function getElementsForFile(fileId: string) {
  const allElements: ElementItem[] = [];
  let page = 1;
  while (true) {
    try {
      const response = await client.sources.getElements({
        file_id: fileId,
        page,
        page_size: 100,
      });
      allElements.push(...response.items);
      if (page >= (response.total_pages ?? 1)) break;
      page++;
    } catch (err) {
      const message = err instanceof Graphor.APIError ? err.message : String(err);
      console.log(`Error processing ${fileId}: ${message}`);
      break;
    }
  }
  return { fileId, elements: allElements };
}

async function batchGetElements(fileIds: string[], maxConcurrent = 3) {
  const results: Awaited<ReturnType<typeof getElementsForFile>>[] = [];
  // Process files in batches of up to maxConcurrent at a time
  for (let i = 0; i < fileIds.length; i += maxConcurrent) {
    const batch = fileIds.slice(i, i + maxConcurrent);
    console.log(`Processing batch ${Math.floor(i / maxConcurrent) + 1}...`);
    const batchResults = await Promise.all(
      batch.map(async (fid) => {
        console.log(`Processing: ${fid}`);
        const result = await getElementsForFile(fid);
        console.log(`  Completed: ${fid} (${result.elements.length} elements)`);
        return result;
      }),
    );
    results.push(...batchResults);
  }
  return results;
}

// Usage
const results = await batchGetElements(['file_1', 'file_2', 'file_3']);
for (const result of results) {
  console.log(`${result.fileId}: ${result.elements.length} elements`);
}
Document Comparator
Compare element structure between documents:
from graphor import Graphor
from collections import defaultdict

client = Graphor()

def get_document_stats(file_id: str) -> dict:
    """Get statistics for a document."""
    type_counts = defaultdict(int)
    total_chars = 0
    total_elements = 0
    page = 1
    while True:
        response = client.sources.get_elements(
            file_id=file_id,
            page=page,
            page_size=100,
        )
        for item in response.items:
            type_counts[item.element_type or "Unknown"] += 1
            total_chars += len(item.text)
        if response.total_pages is None or page >= response.total_pages:
            total_elements = response.total
            break
        page += 1
    return {
        "file_id": file_id,
        "total_elements": total_elements,
        "total_characters": total_chars,
        "element_types": dict(type_counts),
    }

def compare_documents(file_id_1: str, file_id_2: str):
    """Compare two documents."""
    stats1 = get_document_stats(file_id_1)
    stats2 = get_document_stats(file_id_2)
    all_types = set(stats1["element_types"].keys()) | set(stats2["element_types"].keys())
    comparison = {
        "documents": [stats1["file_id"], stats2["file_id"]],
        "total_elements": [stats1["total_elements"], stats2["total_elements"]],
        "total_characters": [stats1["total_characters"], stats2["total_characters"]],
        "element_comparison": {},
    }
    for element_type in sorted(all_types):
        count1 = stats1["element_types"].get(element_type, 0)
        count2 = stats2["element_types"].get(element_type, 0)
        comparison["element_comparison"][element_type] = [count1, count2]
    return comparison

# Usage
comparison = compare_documents("file_1", "file_2")
print(f"Comparing: {comparison['documents'][0]} vs {comparison['documents'][1]}")
print(f"Elements: {comparison['total_elements'][0]} vs {comparison['total_elements'][1]}")
print(f"Characters: {comparison['total_characters'][0]} vs {comparison['total_characters'][1]}")
print("\nElement breakdown:")
for elem_type, counts in comparison["element_comparison"].items():
    print(f"  {elem_type}: {counts[0]} vs {counts[1]}")
import Graphor from 'graphor';

const client = new Graphor();

async function getDocumentStats(fileId: string) {
  const typeCounts: Record<string, number> = {};
  let totalChars = 0;
  let totalElements = 0;
  let page = 1;
  while (true) {
    const response = await client.sources.getElements({
      file_id: fileId,
      page,
      page_size: 100,
    });
    for (const item of response.items) {
      const elementType = item.element_type ?? 'Unknown';
      typeCounts[elementType] = (typeCounts[elementType] ?? 0) + 1;
      totalChars += item.text.length;
    }
    if (page >= (response.total_pages ?? 1)) {
      totalElements = response.total;
      break;
    }
    page++;
  }
  return { fileId, totalElements, totalCharacters: totalChars, elementTypes: typeCounts };
}

async function compareDocuments(fileId1: string, fileId2: string) {
  const [stats1, stats2] = await Promise.all([
    getDocumentStats(fileId1),
    getDocumentStats(fileId2),
  ]);
  const allTypes = new Set([
    ...Object.keys(stats1.elementTypes),
    ...Object.keys(stats2.elementTypes),
  ]);
  const elementComparison: Record<string, [number, number]> = {};
  for (const type of [...allTypes].sort()) {
    elementComparison[type] = [
      stats1.elementTypes[type] ?? 0,
      stats2.elementTypes[type] ?? 0,
    ];
  }
  return {
    documents: [stats1.fileId, stats2.fileId],
    totalElements: [stats1.totalElements, stats2.totalElements],
    totalCharacters: [stats1.totalCharacters, stats2.totalCharacters],
    elementComparison,
  };
}

// Usage
const comparison = await compareDocuments('file_1', 'file_2');
console.log(`Comparing: ${comparison.documents[0]} vs ${comparison.documents[1]}`);
console.log(`Elements: ${comparison.totalElements[0]} vs ${comparison.totalElements[1]}`);
console.log(`Characters: ${comparison.totalCharacters[0]} vs ${comparison.totalCharacters[1]}`);
console.log('\nElement breakdown:');
for (const [elemType, counts] of Object.entries(comparison.elementComparison)) {
  console.log(`  ${elemType}: ${counts[0]} vs ${counts[1]}`);
}
Error Reference
| Error Type | Status Code | Description |
|---|---|---|
| BadRequestError | 400 | Invalid request payload or parameters |
| AuthenticationError | 401 | Invalid or missing API key |
| NotFoundError | 404 | Source not found for the given file_id |
| RateLimitError | 429 | Too many requests; retry after waiting |
| InternalServerError | ≥500 | Server-side error while processing the request |
| APIConnectionError | N/A | Network connectivity issues |
| APITimeoutError | N/A | Request timed out |
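RateLimitError (429) and server errors (≥500) are often transient, so retrying with exponential backoff is a common response. The sketch below is generic: retry_with_backoff and its parameters are hypothetical, and the is_retryable predicate decides which exceptions qualify (for example graphor.RateLimitError and graphor.InternalServerError). The SDK client may already retry some failures on its own, so check its configuration before layering extra retries on top.

```python
from __future__ import annotations

import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fn: Callable[[], T],
    is_retryable: Callable[[Exception], bool],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying retryable errors with exponential backoff plus jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception as exc:
            if attempt == max_attempts or not is_retryable(exc):
                raise
            # Delay doubles each attempt (base, 2*base, 4*base, ...) plus up to 10% jitter
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay * 0.1))
    raise RuntimeError("retry loop exited unexpectedly")
```

Usage: `retry_with_backoff(lambda: client.sources.get_elements(file_id=file_id, page=1, page_size=50), is_retryable=lambda e: isinstance(e, (graphor.RateLimitError, graphor.InternalServerError)))`. Non-retryable errors such as NotFoundError are re-raised immediately.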
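Best Practices
Use appropriate page sizes: page_size must be between 1 and 100; 20–50 elements per page is a reasonable starting point that keeps each response small
Filter server-side: use type, page_numbers, and elements_to_remove so the server returns only the elements you need
Cache results: store element data locally if you access the same source repeatedly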
# Good: filter on the server
response = client.sources.get_elements(file_id=file_id, type="Title")
# Less efficient: fetch everything and filter on the client
# (page_size is capped at 100, so larger documents also need pagination)
response = client.sources.get_elements(file_id=file_id, page_size=100)
titles = [item for item in response.items if item.element_type == "Title"]
// Good: filter on the server
const response = await client.sources.getElements({ file_id: fileId, type: 'Title' });
// Less efficient: fetch everything and filter on the client
// (page_size is capped at 100, so larger documents also need pagination)
const all = await client.sources.getElements({ file_id: fileId, page_size: 100 });
const titles = all.items.filter((item) => item.element_type === 'Title');
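For the "Cache results" practice, an in-memory cache keyed by the request arguments is often enough within a single process. ElementsCache is a hypothetical sketch: it assumes a source's elements do not change between calls, so clear it after you reprocess a source.

```python
from __future__ import annotations

from typing import Any, Callable


class ElementsCache:
    """Hypothetical in-memory cache for get_elements responses, keyed by arguments."""

    def __init__(self, fetch: Callable[..., Any]) -> None:
        self._fetch = fetch  # e.g. client.sources.get_elements
        self._store: dict[tuple, Any] = {}

    @staticmethod
    def _key(kwargs: dict[str, Any]) -> tuple:
        # Lists such as page_numbers are unhashable, so convert them to tuples;
        # sorting makes the key independent of argument order
        return tuple(sorted(
            (name, tuple(value) if isinstance(value, list) else value)
            for name, value in kwargs.items()
        ))

    def get(self, **kwargs: Any) -> Any:
        key = self._key(kwargs)
        if key not in self._store:
            self._store[key] = self._fetch(**kwargs)
        return self._store[key]

    def clear(self) -> None:
        self._store.clear()
```

Usage: `cache = ElementsCache(client.sources.get_elements)`, then `cache.get(file_id=file_id, type="Title")` only calls the API the first time for each distinct set of arguments.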
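Data Processing
Element type awareness: different element types need different processing (for example, tables versus narrative text)
Use the html and markdown fields: html (and markdown, when available) preserves formatting such as table structure that plain text loses
Handle nullable fields: element_type, page_number, metadata, and other fields can be None (null in TypeScript), so supply defaults before using them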
for item in response.items:
    element_type = item.element_type or "Unknown"
    page_num = item.page_number or 0
for (const item of response.items) {
  const elementType = item.element_type ?? 'Unknown';
  const pageNum = item.page_number ?? 0;
}
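Element type awareness and the html/markdown fields can be combined in one helper that picks the richest representation available for each element. best_content is a hypothetical helper and this is an illustrative policy, not SDK behavior: tables prefer html, other elements prefer markdown, and everything falls back to text.

```python
from __future__ import annotations

from typing import Any


def best_content(item: Any) -> str:
    """Pick a representation for an element based on its element_type."""
    element_type = item.element_type or "Unknown"
    if element_type == "Table" and item.html:
        return item.html  # HTML keeps rows and columns intact
    if item.markdown:
        return item.markdown
    return item.text or ""
```

A table without html falls back to its markdown, then to its plain text.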
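Memory Management
Stream large documents: process one page at a time rather than loading every element into memory
Skip unneeded payloads: set suppress_img_base64 to true when you don't need image data, and drop fields you don't use once an element is processed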
# Process large documents in chunks
page = 1
while True:
    response = client.sources.get_elements(
        file_id=file_id,
        page=page,
        page_size=50,
    )
    # Process this batch
    for item in response.items:
        process_element(item)  # Your processing logic
    if response.total_pages is None or page >= response.total_pages:
        break
    page += 1
// Process large documents in chunks
let page = 1;
while (true) {
  const response = await client.sources.getElements({
    file_id: fileId,
    page,
    page_size: 50,
  });
  // Process this batch
  for (const item of response.items) {
    processElement(item); // Your processing logic
  }
  if (page >= (response.total_pages ?? 1)) break;
  page++;
}
Troubleshooting
Slow responses or timeouts
Causes: large page sizes, complex filters, or server load
Solutions:
Reduce page_size to 25–50 elements
Use specific filters to reduce the result set
Set a request timeout on the client:
client = Graphor(timeout=60.0)  # seconds
const client = new Graphor({ timeout: 60 * 1000 }); // milliseconds
No elements returned or source not found
Causes: the file has not finished processing, an incorrect file_id, or overly restrictive filters
Solutions:
Verify that the source is processed (status Completed) with client.sources.list()
Use the file_id from List sources or Get build status
Remove or relax filter criteria
Missing expected elements
Causes: processing method limitations, file format issues, or filter conflicts
Solutions:
Try a different partition method using client.sources.reprocess()
Check whether the elements are categorized under different types
Temporarily remove the elements_to_remove filter
Memory issues with large documents
Causes: processing too many elements at once
Solutions:
Reduce page_size and process incrementally
Filter out unnecessary element types
Use streaming processing patterns
Next steps
After retrieving elements:
Get build status: poll build status and get elements for a build
List sources: list all sources and their file_ids
Upload: ingest files, URLs, GitHub, or YouTube
Reprocess source: re-process a source with a different partition method
Delete source: remove a source by file_id