Retrieve all chunking nodes from a specific flow in your GraphorLM project. Chunking nodes are responsible for splitting documents into smaller text chunks and generating embeddings, which are essential components in RAG (Retrieval-Augmented Generation) pipelines.

Overview

The List Chunking Nodes endpoint allows you to retrieve information about chunking nodes within a flow. Chunking nodes process document content by dividing it into smaller, manageable pieces and generating vector embeddings for similarity search and retrieval operations.
  • Method: GET
  • URL: https://{flow_name}.flows.graphorlm.com/chunking
  • Authentication: Required (API Token)

Authentication

All requests must include a valid API token in the Authorization header:
Authorization: Bearer YOUR_API_TOKEN
Learn how to generate API tokens in the API Tokens guide.

Request Format

Headers

Header | Value | Required
Authorization | Bearer YOUR_API_TOKEN | Yes

Parameters

No query parameters are required for this endpoint.

Example Request

GET https://my-rag-pipeline.flows.graphorlm.com/chunking
Authorization: Bearer YOUR_API_TOKEN

Response Format

Success Response (200 OK)

The response contains an array of chunking node objects:
[
  {
    "id": "chunking-1748287628685",
    "type": "chunking",
    "position": {
      "x": 300,
      "y": 200
    },
    "style": {
      "height": 180,
      "width": 280
    },
    "data": {
      "name": "Document Chunking",
      "config": {
        "embeddingModel": "text-embedding-3-small",
        "chunkingSplitter": "character",
        "chunkSize": 1000,
        "chunkOverlap": 200,
        "chunkSeparator": "\n\n",
        "splitLevel": 0,
        "elementsToRemove": ["Header", "Footer"]
      },
      "result": {
        "updated": true,
        "processing": false,
        "waiting": false,
        "has_error": false,
        "total_chunks": 420
      }
    }
  }
]

Response Structure

Each chunking node in the array contains:
Field | Type | Description
id | string | Unique identifier for the chunking node
type | string | Node type (always "chunking" for chunking nodes)
position | object | Position coordinates in the flow canvas
style | object | Visual styling properties (height, width)
data | object | Chunking node configuration and results

Position Object

Field | Type | Description
x | number | X coordinate position in the flow canvas
y | number | Y coordinate position in the flow canvas

Style Object

Field | Type | Description
height | integer | Height of the node in pixels
width | integer | Width of the node in pixels

Data Object

Field | Type | Description
name | string | Display name of the chunking node
config | object | Node configuration including chunking settings
result | object | Processing results and status information (optional)

Config Object

Field | Type | Description
embeddingModel | string | Embedding model used for generating vectors (e.g., "text-embedding-3-small", "colqwen")
chunkingSplitter | string | Type of splitter: "character", "token", "semantic", "title", or "element"
chunkSize | integer | Size of each chunk in characters or tokens
chunkOverlap | integer | Number of characters/tokens that overlap between consecutive chunks
chunkSeparator | string | Text separator used for splitting (default: "\n\n")
splitLevel | integer | Split level for hierarchical splitters (default: 0)
elementsToRemove | array | List of document elements to remove during processing

Result Object (Optional)

Field | Type | Description
updated | boolean | Whether the node has been processed with the current configuration
processing | boolean | Whether the node is currently being processed
waiting | boolean | Whether the node is waiting for dependencies
has_error | boolean | Whether the node encountered an error during processing
total_chunks | integer | Number of chunks generated (if available)
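The boolean flags are easiest to consume when collapsed into a single status label. The helper below is a small sketch (the `node_status` name is illustrative, not part of the API) that applies the same precedence used by the code examples on this page: processing, then waiting, then error, then updated.

```python
def node_status(result):
    """Collapse a chunking node's result flags into one status label.

    Precedence: processing > waiting > has_error > updated. A node with
    no result object (or all flags false) has not been processed yet.
    """
    if not result:
        return "Unknown"
    if result.get("processing"):
        return "Processing"
    if result.get("waiting"):
        return "Waiting"
    if result.get("has_error"):
        return "Error"
    if result.get("updated"):
        return "Updated"
    return "Needs Update"
```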

Code Examples

JavaScript/Node.js

async function listChunkingNodes(flowName, apiToken) {
  const response = await fetch(`https://${flowName}.flows.graphorlm.com/chunking`, {
    method: 'GET',
    headers: {
      'Authorization': `Bearer ${apiToken}`
    }
  });

  if (!response.ok) {
    throw new Error(`HTTP error! status: ${response.status}`);
  }

  return await response.json();
}

// Usage
listChunkingNodes('my-rag-pipeline', 'YOUR_API_TOKEN')
  .then(chunkingNodes => {
    console.log(`Found ${chunkingNodes.length} chunking node(s)`);
    
    chunkingNodes.forEach(node => {
      console.log(`\nNode: ${node.data.name} (${node.id})`);
      console.log(`Embedding Model: ${node.data.config.embeddingModel}`);
      console.log(`Splitter Type: ${node.data.config.chunkingSplitter}`);
      console.log(`Chunk Size: ${node.data.config.chunkSize}`);
      console.log(`Chunk Overlap: ${node.data.config.chunkOverlap}`);
      
      if (node.data.config.elementsToRemove && node.data.config.elementsToRemove.length > 0) {
        console.log(`Elements to Remove: ${node.data.config.elementsToRemove.join(', ')}`);
      }
      
      if (node.data.result) {
        const status = node.data.result.processing ? 'Processing' : 
                      node.data.result.waiting ? 'Waiting' :
                      node.data.result.has_error ? 'Error' :
                      node.data.result.updated ? 'Updated' : 'Needs Update';
        console.log(`Status: ${status}`);
        
        if (node.data.result.total_chunks) {
          console.log(`Total chunks generated: ${node.data.result.total_chunks}`);
        }
      }
    });
  })
  .catch(error => console.error('Error:', error));

Python

import requests
import json

def list_chunking_nodes(flow_name, api_token):
    url = f"https://{flow_name}.flows.graphorlm.com/chunking"
    
    headers = {
        "Authorization": f"Bearer {api_token}"
    }
    
    response = requests.get(url, headers=headers)
    response.raise_for_status()
    
    return response.json()

def analyze_chunking_nodes(chunking_nodes):
    """Analyze chunking nodes and provide detailed summary"""
    print(f"🧩 Chunking Nodes Analysis")
    print(f"Total chunking nodes: {len(chunking_nodes)}")
    print("-" * 50)
    
    embedding_models = {}
    splitter_types = {}
    status_counts = {"updated": 0, "processing": 0, "waiting": 0, "error": 0, "needs_update": 0}
    total_chunks = 0
    
    for node in chunking_nodes:
        node_data = node.get('data', {})
        config = node_data.get('config', {})
        result = node_data.get('result', {})
        
        # Track embedding models
        embedding_model = config.get('embeddingModel', 'Unknown')
        embedding_models[embedding_model] = embedding_models.get(embedding_model, 0) + 1
        
        # Track splitter types  
        splitter_type = config.get('chunkingSplitter', 'Unknown')
        splitter_types[splitter_type] = splitter_types.get(splitter_type, 0) + 1
        
        print(f"\n🔧 Node: {node_data.get('name', 'Unnamed')} ({node['id']})")
        print(f"   Embedding Model: {embedding_model}")
        print(f"   Splitter Type: {splitter_type}")
        print(f"   Chunk Size: {config.get('chunkSize', 'Not set')}")
        print(f"   Chunk Overlap: {config.get('chunkOverlap', 0)}")
        
        elements_to_remove = config.get('elementsToRemove', [])
        if elements_to_remove:
            print(f"   Elements to Remove: {', '.join(elements_to_remove)}")
        
        if result:
            if result.get('processing'):
                status_counts["processing"] += 1
                print("   🔄 Status: Processing")
            elif result.get('waiting'):
                status_counts["waiting"] += 1
                print("   ⏳ Status: Waiting")
            elif result.get('has_error'):
                status_counts["error"] += 1
                print("   ❌ Status: Error")
            elif result.get('updated'):
                status_counts["updated"] += 1
                print("   ✅ Status: Updated")
            else:
                status_counts["needs_update"] += 1
                print("   ⚠️  Status: Needs Update")
                
            if result.get('total_chunks'):
                chunks = result['total_chunks']
                total_chunks += chunks
                print(f"   📄 Total chunks: {chunks}")
    
    print(f"\n📊 Summary:")
    print(f"   Total chunks across all nodes: {total_chunks}")
    
    print(f"\n🤖 Embedding Models:")
    for model, count in embedding_models.items():
        print(f"   {model}: {count} node(s)")
    
    print(f"\n⚡ Splitter Types:")
    for splitter, count in splitter_types.items():
        print(f"   {splitter}: {count} node(s)")
    
    print(f"\n📈 Node Status:")
    for status, count in status_counts.items():
        if count > 0:
            print(f"   {status.replace('_', ' ').title()}: {count}")

# Usage
try:
    chunking_nodes = list_chunking_nodes("my-rag-pipeline", "YOUR_API_TOKEN")
    analyze_chunking_nodes(chunking_nodes)
    
except requests.exceptions.HTTPError as e:
    print(f"Error: {e}")
    if e.response.status_code == 404:
        print("Flow not found or no chunking nodes in this flow")
    elif e.response.status_code == 401:
        print("Invalid API token or insufficient permissions")

cURL

# Basic request
curl -X GET https://my-rag-pipeline.flows.graphorlm.com/chunking \
  -H "Authorization: Bearer YOUR_API_TOKEN"

# With jq for formatted output
curl -X GET https://my-rag-pipeline.flows.graphorlm.com/chunking \
  -H "Authorization: Bearer YOUR_API_TOKEN" | jq '.'

# Extract chunking configuration summary
curl -X GET https://my-rag-pipeline.flows.graphorlm.com/chunking \
  -H "Authorization: Bearer YOUR_API_TOKEN" | \
  jq -r '.[] | "\(.data.name): \(.data.config.embeddingModel) - \(.data.config.chunkingSplitter) (\(.data.config.chunkSize) chars)"'

# Count total chunks across all nodes
curl -X GET https://my-rag-pipeline.flows.graphorlm.com/chunking \
  -H "Authorization: Bearer YOUR_API_TOKEN" | \
  jq '[.[] | .data.result.total_chunks // 0] | add'

PHP

<?php
function listChunkingNodes($flowName, $apiToken) {
    $url = "https://{$flowName}.flows.graphorlm.com/chunking";
    
    $options = [
        'http' => [
            'header' => "Authorization: Bearer {$apiToken}",
            'method' => 'GET'
        ]
    ];
    
    $context = stream_context_create($options);
    $result = file_get_contents($url, false, $context);
    
    if ($result === FALSE) {
        throw new Exception('Failed to retrieve chunking nodes');
    }
    
    return json_decode($result, true);
}

function analyzeChunkingNodes($chunkingNodes) {
    $embeddingModels = [];
    $splitterTypes = [];
    $statusCounts = [
        'updated' => 0,
        'processing' => 0, 
        'waiting' => 0,
        'error' => 0,
        'needs_update' => 0
    ];
    $totalChunks = 0;
    
    echo "🧩 Chunking Nodes Analysis\n";
    echo "Total chunking nodes: " . count($chunkingNodes) . "\n";
    echo str_repeat("-", 50) . "\n";
    
    foreach ($chunkingNodes as $node) {
        $data = $node['data'] ?? [];
        $config = $data['config'] ?? [];
        $result = $data['result'] ?? [];
        
        $embeddingModel = $config['embeddingModel'] ?? 'Unknown';
        $embeddingModels[$embeddingModel] = ($embeddingModels[$embeddingModel] ?? 0) + 1;
        
        $splitterType = $config['chunkingSplitter'] ?? 'Unknown';
        $splitterTypes[$splitterType] = ($splitterTypes[$splitterType] ?? 0) + 1;
        
        echo "\n🔧 Node: " . ($data['name'] ?? 'Unnamed') . " ({$node['id']})\n";
        echo "   Embedding Model: {$embeddingModel}\n";
        echo "   Splitter Type: {$splitterType}\n";
        echo "   Chunk Size: " . ($config['chunkSize'] ?? 'Not set') . "\n";
        echo "   Chunk Overlap: " . ($config['chunkOverlap'] ?? 0) . "\n";
        
        $elementsToRemove = $config['elementsToRemove'] ?? [];
        if (!empty($elementsToRemove)) {
            echo "   Elements to Remove: " . implode(', ', $elementsToRemove) . "\n";
        }
        
        if (!empty($result)) {
            if ($result['processing'] ?? false) {
                $statusCounts['processing']++;
                echo "   🔄 Status: Processing\n";
            } elseif ($result['waiting'] ?? false) {
                $statusCounts['waiting']++;
                echo "   ⏳ Status: Waiting\n";
            } elseif ($result['has_error'] ?? false) {
                $statusCounts['error']++;
                echo "   ❌ Status: Error\n";
            } elseif ($result['updated'] ?? false) {
                $statusCounts['updated']++;
                echo "   ✅ Status: Updated\n";
            } else {
                $statusCounts['needs_update']++;
                echo "   ⚠️  Status: Needs Update\n";
            }
            
            if (isset($result['total_chunks'])) {
                $chunks = $result['total_chunks'];
                $totalChunks += $chunks;
                echo "   📄 Total chunks: {$chunks}\n";
            }
        }
    }
    
    echo "\n📊 Summary:\n";
    echo "   Total chunks across all nodes: {$totalChunks}\n";
    
    echo "\n🤖 Embedding Models:\n";
    foreach ($embeddingModels as $model => $count) {
        echo "   {$model}: {$count} node(s)\n";
    }
    
    echo "\n⚡ Splitter Types:\n";
    foreach ($splitterTypes as $splitter => $count) {
        echo "   {$splitter}: {$count} node(s)\n";
    }
    
    echo "\n📈 Node Status:\n";
    foreach ($statusCounts as $status => $count) {
        if ($count > 0) {
            $statusLabel = ucwords(str_replace('_', ' ', $status));
            echo "   {$statusLabel}: {$count}\n";
        }
    }
}

// Usage
try {
    $chunkingNodes = listChunkingNodes('my-rag-pipeline', 'YOUR_API_TOKEN');
    analyzeChunkingNodes($chunkingNodes);
    
} catch (Exception $e) {
    echo "Error: " . $e->getMessage() . "\n";
}
?>

Error Responses

Common Error Codes

Status Code | Description | Example Response
401 | Unauthorized - Invalid or missing API token | {"detail": "Invalid authentication credentials"}
404 | Not Found - Flow not found | {"detail": "Flow not found"}
500 | Internal Server Error - Server error | {"detail": "Failed to retrieve chunking nodes"}

Error Response Format

{
  "detail": "Error message describing what went wrong"
}

Example Error Responses

Invalid API Token

{
  "detail": "Invalid authentication credentials"
}

Flow Not Found

{
  "detail": "Flow not found"
}

Server Error

{
  "detail": "Failed to retrieve chunking nodes"
}
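Because every error body shares the {"detail": "..."} shape, clients can surface one consistent message regardless of status code. A minimal sketch (the `error_detail` helper is illustrative, not part of any SDK), which also tolerates non-JSON bodies such as proxy-generated error pages:

```python
import json

def error_detail(body, fallback="Unknown error"):
    """Pull the `detail` message out of an error response body.

    Falls back gracefully when the body is missing, not valid JSON,
    or JSON of an unexpected shape.
    """
    try:
        payload = json.loads(body)
    except (TypeError, ValueError):
        return fallback
    return payload.get("detail", fallback) if isinstance(payload, dict) else fallback
```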

Use Cases

Chunking Node Management

Use this endpoint to:
  • Configuration Review: Examine chunking settings like embedding models and splitter types
  • Performance Monitoring: Check processing status and chunk generation metrics
  • Flow Optimization: Analyze chunking configurations for optimal performance
  • Debugging: Identify issues with chunking node configurations or processing

Integration Examples

Chunking Performance Monitor

class ChunkingPerformanceMonitor {
  constructor(flowName, apiToken) {
    this.flowName = flowName;
    this.apiToken = apiToken;
  }

  async getPerformanceReport() {
    try {
      const nodes = await this.listChunkingNodes();
      const report = {
        totalNodes: nodes.length,
        activeNodes: 0,
        processingNodes: 0,
        errorNodes: 0,
        totalChunks: 0,
        averageChunkSize: 0,
        embeddingModels: {},
        splitterTypes: {},
        performance: []
      };

      let totalChunkSize = 0;
      let nodeCount = 0;

      for (const node of nodes) {
        const config = node.data.config || {};
        const result = node.data.result || {};
        
        // Track embedding models
        const embeddingModel = config.embeddingModel || 'Unknown';
        report.embeddingModels[embeddingModel] = (report.embeddingModels[embeddingModel] || 0) + 1;
        
        // Track splitter types
        const splitterType = config.chunkingSplitter || 'Unknown';
        report.splitterTypes[splitterType] = (report.splitterTypes[splitterType] || 0) + 1;
        
        // Calculate performance metrics
        if (config.chunkSize) {
          totalChunkSize += config.chunkSize;
          nodeCount++;
        }
        
        if (result.total_chunks) {
          report.totalChunks += result.total_chunks;
        }
        
        // Track node status
        if (result.processing) {
          report.processingNodes++;
        } else if (result.has_error) {
          report.errorNodes++;
        } else if (result.updated) {
          report.activeNodes++;
        }
        
        // Individual node performance
        report.performance.push({
          nodeId: node.id,
          nodeName: node.data.name,
          embeddingModel: config.embeddingModel,
          splitterType: config.chunkingSplitter,
          chunkSize: config.chunkSize,
          chunkOverlap: config.chunkOverlap,
          totalChunks: result.total_chunks || 0,
          status: result.processing ? 'Processing' :
                 result.has_error ? 'Error' :
                 result.updated ? 'Active' : 'Inactive'
        });
      }

      if (nodeCount > 0) {
        report.averageChunkSize = Math.round(totalChunkSize / nodeCount);
      }

      return report;
    } catch (error) {
      throw new Error(`Performance report failed: ${error.message}`);
    }
  }

  async listChunkingNodes() {
    const response = await fetch(`https://${this.flowName}.flows.graphorlm.com/chunking`, {
      headers: { 'Authorization': `Bearer ${this.apiToken}` }
    });

    if (!response.ok) {
      throw new Error(`HTTP ${response.status}: ${response.statusText}`);
    }

    return await response.json();
  }

  async generateReport() {
    const report = await this.getPerformanceReport();
    
    console.log('🧩 Chunking Performance Report');
    console.log('==============================');
    console.log(`Total Nodes: ${report.totalNodes}`);
    console.log(`Active Nodes: ${report.activeNodes}`);
    console.log(`Processing Nodes: ${report.processingNodes}`);
    console.log(`Error Nodes: ${report.errorNodes}`);
    console.log(`Total Chunks: ${report.totalChunks}`);
    console.log(`Average Chunk Size: ${report.averageChunkSize} characters`);
    
    console.log('\n🤖 Embedding Models:');
    for (const [model, count] of Object.entries(report.embeddingModels)) {
      console.log(`  ${model}: ${count} node(s)`);
    }
    
    console.log('\n⚡ Splitter Types:');
    for (const [splitter, count] of Object.entries(report.splitterTypes)) {
      console.log(`  ${splitter}: ${count} node(s)`);
    }
    
    console.log('\n📊 Node Performance:');
    report.performance.forEach(node => {
      console.log(`  ${node.nodeName} (${node.nodeId}):`);
      console.log(`    Model: ${node.embeddingModel}, Splitter: ${node.splitterType}`);
      console.log(`    Chunk Size: ${node.chunkSize}, Overlap: ${node.chunkOverlap}`);
      console.log(`    Total Chunks: ${node.totalChunks}, Status: ${node.status}`);
    });

    return report;
  }
}

// Usage
const monitor = new ChunkingPerformanceMonitor('my-rag-pipeline', 'YOUR_API_TOKEN');
monitor.generateReport().catch(console.error);

Configuration Validator

import requests
from typing import List, Dict, Any

class ChunkingConfigValidator:
    def __init__(self, flow_name: str, api_token: str):
        self.flow_name = flow_name
        self.api_token = api_token
        self.base_url = f"https://{flow_name}.flows.graphorlm.com"
    
    def get_chunking_nodes(self) -> List[Dict[str, Any]]:
        """Retrieve all chunking nodes from the flow"""
        response = requests.get(
            f"{self.base_url}/chunking",
            headers={"Authorization": f"Bearer {self.api_token}"}
        )
        response.raise_for_status()
        return response.json()
    
    def validate_configurations(self) -> Dict[str, Any]:
        """Validate chunking node configurations"""
        nodes = self.get_chunking_nodes()
        
        validation_report = {
            "summary": {
                "total_nodes": len(nodes),
                "valid_configs": 0,
                "invalid_configs": 0,
                "warnings": 0
            },
            "nodes": [],
            "issues": []
        }
        
        for node in nodes:
            node_info = {
                "id": node["id"],
                "name": node["data"]["name"],
                "config": node["data"]["config"],
                "is_valid": True,
                "warnings": [],
                "errors": []
            }
            
            config = node["data"]["config"]
            
            # Validate embedding model
            if not config.get("embeddingModel"):
                node_info["errors"].append("Missing embedding model")
                node_info["is_valid"] = False
            
            # Validate chunk size
            chunk_size = config.get("chunkSize")
            splitter_type = config.get("chunkingSplitter")
            
            if splitter_type in ["character", "token"] and not chunk_size:
                node_info["errors"].append(f"Chunk size required for {splitter_type} splitter")
                node_info["is_valid"] = False
            elif chunk_size and chunk_size <= 0:
                node_info["errors"].append("Chunk size must be greater than 0")
                node_info["is_valid"] = False
            elif chunk_size and chunk_size > 8000:
                node_info["warnings"].append("Large chunk size may affect performance")
            
            # Validate chunk overlap
            chunk_overlap = config.get("chunkOverlap", 0)
            if chunk_size and chunk_overlap >= chunk_size:
                node_info["errors"].append("Chunk overlap cannot be greater than or equal to chunk size")
                node_info["is_valid"] = False
            elif chunk_size and chunk_overlap > chunk_size * 0.5:
                node_info["warnings"].append("High chunk overlap may cause redundancy")
            
            # Validate splitter type
            valid_splitters = ["character", "token", "semantic", "title", "element"]
            if splitter_type not in valid_splitters:
                node_info["errors"].append(f"Invalid splitter type: {splitter_type}")
                node_info["is_valid"] = False
            
            # Count valid/invalid configs
            if node_info["is_valid"]:
                validation_report["summary"]["valid_configs"] += 1
            else:
                validation_report["summary"]["invalid_configs"] += 1
            
            if node_info["warnings"]:
                validation_report["summary"]["warnings"] += len(node_info["warnings"])
            
            # Add issues to global issues list
            for error in node_info["errors"]:
                validation_report["issues"].append({
                    "type": "error",
                    "node_id": node["id"],
                    "node_name": node_info["name"],
                    "message": error
                })
            
            for warning in node_info["warnings"]:
                validation_report["issues"].append({
                    "type": "warning",
                    "node_id": node["id"],
                    "node_name": node_info["name"],
                    "message": warning
                })
            
            validation_report["nodes"].append(node_info)
        
        return validation_report
    
    def print_validation_report(self, report: Dict[str, Any]):
        """Print a formatted validation report"""
        summary = report["summary"]
        
        print("🔍 Chunking Configuration Validation Report")
        print("=" * 50)
        print(f"Flow: {self.flow_name}")
        print(f"Total Nodes: {summary['total_nodes']}")
        print(f"Valid Configurations: {summary['valid_configs']}")
        print(f"Invalid Configurations: {summary['invalid_configs']}")
        print(f"Warnings: {summary['warnings']}")
        
        if summary['invalid_configs'] == 0 and summary['warnings'] == 0:
            print("\n✅ All chunking configurations are valid!")
            return
        
        print(f"\n📋 Node Details:")
        print("-" * 30)
        for node in report["nodes"]:
            status_icon = "✅" if node["is_valid"] else "❌"
            warning_icon = "⚠️" if node["warnings"] else ""
            
            print(f"\n{status_icon} {warning_icon} {node['name']} ({node['id']})")
            
            config = node["config"]
            print(f"   Embedding Model: {config.get('embeddingModel', 'Not set')}")
            print(f"   Splitter: {config.get('chunkingSplitter', 'Not set')}")
            print(f"   Chunk Size: {config.get('chunkSize', 'Not set')}")
            print(f"   Chunk Overlap: {config.get('chunkOverlap', 0)}")
            
            for error in node["errors"]:
                print(f"   ❌ Error: {error}")
            
            for warning in node["warnings"]:
                print(f"   ⚠️  Warning: {warning}")

# Usage
validator = ChunkingConfigValidator("my-rag-pipeline", "YOUR_API_TOKEN")
try:
    report = validator.validate_configurations()
    validator.print_validation_report(report)
except Exception as e:
    print(f"Validation failed: {e}")

Best Practices

Configuration Management

  • Optimal Chunk Size: Choose chunk sizes between 500-2000 characters for most use cases
  • Appropriate Overlap: Use 10-20% overlap to maintain context between chunks
  • Embedding Model Selection: Choose embedding models based on your language and domain requirements
  • Splitter Selection: Use "semantic" splitters for better content preservation when possible
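These guidelines can be checked programmatically before deploying a flow. The sketch below encodes the rule-of-thumb thresholds from this section (500-2000 characters, 10-20% overlap); they are recommendations, not API-enforced limits, and the function name is illustrative:

```python
def check_chunking_guidelines(chunk_size, chunk_overlap):
    """Return warnings for settings outside this section's rules of thumb."""
    warnings = []
    if not 500 <= chunk_size <= 2000:
        warnings.append(f"chunk size {chunk_size} is outside the 500-2000 character range")
    ratio = chunk_overlap / chunk_size
    if not 0.10 <= ratio <= 0.20:
        warnings.append(f"overlap is {ratio:.0%} of chunk size (10-20% recommended)")
    return warnings
```

For example, the sample configuration earlier on this page (chunkSize 1000, chunkOverlap 200) passes both checks.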

Performance Optimization

  • Monitor Processing: Regularly check node status to ensure chunking is completing successfully
  • Batch Processing: For large documents, consider the processing time implications
  • Resource Management: Balance chunk size and overlap with processing performance
  • Error Handling: Implement proper error handling for chunking failures

Monitoring and Maintenance

  • Regular Health Checks: Monitor chunking nodes to ensure they’re processing correctly
  • Configuration Validation: Verify that chunking settings are appropriate for your content
  • Performance Tracking: Monitor chunk generation metrics and processing times
  • Update Coordination: Coordinate chunking updates with downstream processing nodes

Troubleshooting

Next Steps

After retrieving chunking node information, you might want to: