Dataset Endpoints Overview
Comprehensive guide to managing dataset nodes in flows via the GraphorLM REST API
Dataset endpoints allow you to manage the data components of your flows in GraphorLM. Dataset nodes are the entry points that connect your uploaded documents to the RAG pipeline, determining which files are processed and how they’re configured.
What are Dataset Nodes?
Dataset nodes are fundamental components in GraphorLM flows that:
- Connect Sources to Flows: Link your uploaded documents to processing pipelines
- Control Data Input: Determine which files are included in each dataset
- Enable Configuration: Allow customization of how documents are processed
- Support Flow Logic: Act as the starting point for RAG pipelines
Available Endpoints
GraphorLM provides comprehensive REST API endpoints for dataset management:
List Dataset Nodes
Retrieve all dataset nodes from a specific flow with their configurations and status
Update Dataset
Modify dataset node configurations to change which files are included
Core Concepts
Dataset Node Structure
Each dataset node contains:
Key Components
| Component | Description |
|---|---|
| ID | Unique identifier for the dataset node |
| Config | File selection and processing settings |
| Result | Status and metadata from the last processing run |
| Position | Visual placement in the flow editor |
File Management
Dataset nodes manage file relationships through:
- File Selection: Choose which uploaded sources to include
- Dynamic Updates: Add or remove files without recreating the node
- Validation: Ensure all specified files exist as sources
- Status Tracking: Monitor processing state and updates needed
Common Workflows
Setting Up a New Dataset
- List Available Nodes: Use the List Dataset Nodes endpoint to see existing datasets
- Update Configuration: Use the Update Dataset endpoint to configure file selection
- Deploy Flow: Deploy the flow to apply changes and make it executable
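A minimal sketch of this workflow in Python with `requests`. The base URL, HTTP methods, and the `config.files` key are assumptions drawn from the sections below; consult the individual endpoint pages for the authoritative details:

```python
import requests

API_TOKEN = "YOUR_API_TOKEN"                            # see the API Tokens guide
FLOW_NAME = "my-rag-flow"                               # illustrative flow name
BASE_URL = f"https://{FLOW_NAME}.flows.graphorlm.com"   # assumed URL pattern (see URL Structure)
HEADERS = {"Authorization": f"Bearer {API_TOKEN}"}

# 1. List the dataset nodes that already exist in the flow
nodes = requests.get(f"{BASE_URL}/datasets", headers=HEADERS).json()
node_id = nodes[0]["id"]        # assumes a list of node objects; adapt to the real shape

# 2. Update the node's file selection (config shape is illustrative)
requests.post(
    f"{BASE_URL}/datasets/{node_id}",
    headers=HEADERS,
    json={"config": {"files": ["report.pdf", "notes.md"]}},
).raise_for_status()

# 3. Deploy the flow so the change becomes executable
# (path is an assumption; see the Deploy Flow endpoint for the real one)
requests.post(f"{BASE_URL}/deploy", headers=HEADERS).raise_for_status()
```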
Managing Dataset Files
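Adding or removing files is a read-modify-write operation against the node's config: send the full updated list rather than a partial diff. A sketch under the same assumptions as above:

```python
import requests

def set_dataset_files(flow_name: str, node_id: str, files: list, token: str) -> dict:
    """Replace a dataset node's file selection with the given list (atomic update)."""
    base = f"https://{flow_name}.flows.graphorlm.com"    # assumed URL pattern
    resp = requests.post(                                # method assumed; see Update Dataset
        f"{base}/datasets/{node_id}",
        headers={"Authorization": f"Bearer {token}"},
        json={"config": {"files": files}},               # illustrative config shape
    )
    resp.raise_for_status()
    return resp.json()

# To add a file, include it in the full list along with the existing files:
set_dataset_files("my-rag-flow", "dataset-1",
                  ["report.pdf", "notes.md", "new.docx"], "YOUR_API_TOKEN")
```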
Monitoring Dataset Status
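To track processing state, one can scan the listed nodes for a stale result. The `result.updated` field name below is an assumption; see Response Formats:

```python
import requests

def nodes_needing_update(flow_name: str, token: str) -> list:
    """Return IDs of dataset nodes whose last result flags a needed update."""
    base = f"https://{flow_name}.flows.graphorlm.com"    # assumed URL pattern
    nodes = requests.get(
        f"{base}/datasets",
        headers={"Authorization": f"Bearer {token}"},
    ).json()                                             # assumes a list of node objects
    return [n["id"] for n in nodes if not n.get("result", {}).get("updated", True)]
```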
Authentication
All dataset endpoints require authentication via API tokens:
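For example, with Python's `requests` (the Bearer scheme shown here is the common convention; confirm the exact header format in the API Tokens guide):

```python
import requests

headers = {"Authorization": "Bearer YOUR_API_TOKEN"}     # token from the API Tokens guide
response = requests.get(
    "https://my-rag-flow.flows.graphorlm.com/datasets",  # assumed URL pattern
    headers=headers,
)
```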
Learn how to generate and manage API tokens in the API Tokens guide.
URL Structure
Dataset endpoints follow a consistent URL pattern:
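A plausible rendering of that pattern (the host below is illustrative; the individual endpoint pages give the canonical URLs):

```
https://{flow_name}.flows.graphorlm.com/datasets             # List Dataset Nodes
https://{flow_name}.flows.graphorlm.com/datasets/{node_id}   # Update Dataset
```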
Where:
- `{flow_name}`: The name of your deployed flow
- `{node_id}`: The specific dataset node identifier (for update operations)
Response Formats
Dataset Node Object
All endpoints return dataset nodes with this structure:
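An illustrative shape, assembled from the components listed above (field names beyond those are assumptions; the List Dataset Nodes page documents the authoritative schema):

```python
# Illustrative only; see List Dataset Nodes for the authoritative schema
node = {
    "id": "dataset-1",                     # unique identifier for the node
    "position": {"x": 100, "y": 200},      # visual placement in the flow editor
    "config": {"files": ["report.pdf"]},   # file selection and processing settings
    "result": {"updated": True},           # status/metadata from the last processing run
}
```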
Success Responses
Update operations return confirmation:
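An assumed shape for that confirmation (field names are illustrative):

```python
# Illustrative confirmation payload; field names are assumptions
confirmation = {
    "success": True,
    "message": "Dataset node updated successfully",
    "node_id": "dataset-1",
}
```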
Error Handling
Common error scenarios and responses:
| Error Type | HTTP Status | Description |
|---|---|---|
| Authentication | 401 | Invalid or missing API token |
| Flow Not Found | 404 | Flow doesn’t exist or isn’t accessible |
| Node Not Found | 404 | Dataset node doesn’t exist in the flow |
| Files Not Found | 400 | Specified files don’t exist as sources |
| Invalid Config | 400 | Configuration validation failed |
| Server Error | 500 | Internal processing error |
Error Response Format
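Error responses typically carry a JSON body; an assumed shape:

```python
# Illustrative error body; the exact shape may differ per endpoint
error_body = {
    "detail": "Dataset node 'dataset-1' not found in flow 'my-rag-flow'"
}
```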
Integration Examples
Dataset Management Class
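A sketch of a thin client that wraps the two endpoints; the URL pattern, HTTP methods, and config shape are the same assumptions used throughout this page:

```python
import requests

class DatasetManager:
    """Minimal client for dataset-node endpoints (URLs and payload shapes assumed)."""

    def __init__(self, flow_name: str, token: str):
        self.base = f"https://{flow_name}.flows.graphorlm.com"
        self.headers = {"Authorization": f"Bearer {token}"}

    def list_nodes(self) -> list:
        """Fetch all dataset nodes in the flow."""
        resp = requests.get(f"{self.base}/datasets", headers=self.headers)
        resp.raise_for_status()
        return resp.json()                          # assumes a list of node objects

    def update_node(self, node_id: str, files: list) -> dict:
        """Replace a node's file selection with the given list."""
        resp = requests.post(                       # method assumed; see Update Dataset
            f"{self.base}/datasets/{node_id}",
            headers=self.headers,
            json={"config": {"files": files}},      # illustrative config shape
        )
        resp.raise_for_status()
        return resp.json()
```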
Batch Operations
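Building on the class above, a batch helper that applies several updates and keeps going when an individual node fails:

```python
import requests

def batch_update(manager: DatasetManager, updates: dict) -> dict:
    """Apply {node_id: [files, ...]} updates, collecting per-node outcomes."""
    results = {}
    for node_id, files in updates.items():
        try:
            results[node_id] = manager.update_node(node_id, files)
        except requests.HTTPError as err:           # record failures, continue the batch
            results[node_id] = {"error": str(err)}
    return results

batch_update(DatasetManager("my-rag-flow", "YOUR_API_TOKEN"), {
    "dataset-1": ["report.pdf"],
    "dataset-2": ["notes.md", "faq.txt"],
})
```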
Best Practices
Configuration Management
- Version Control: Keep track of dataset configurations over time
- Validation: Always verify files exist before updating configurations
- Atomic Updates: Send the entire file list in each update rather than making partial modifications
- Documentation: Document the purpose of each dataset node
Performance Optimization
- File Organization: Group related files in the same dataset nodes
- Batch Operations: Update multiple nodes efficiently using batch processing
- Monitoring: Regularly check dataset status to identify issues early
- Caching: Cache dataset node information to reduce API calls
Error Prevention
- File Validation: Use the List Sources endpoint to verify file availability
- Status Monitoring: Check node update status before flow deployment
- Graceful Degradation: Handle missing files and failed updates appropriately
- Retry Logic: Implement retry mechanisms for transient failures
Relationship with Other Endpoints
Dataset endpoints work closely with other GraphorLM APIs:
Source Management
- Use Sources endpoints to upload and manage files
- Validate file availability before updating dataset configurations
Flow Management
- Use Flow endpoints to deploy updated configurations
- Monitor flow status after dataset changes
Flow Execution
- Use Run Flow endpoint to execute flows with updated datasets
- Verify results reflect the new dataset configurations
Use Cases
Content Management System
Dynamic Content Updates
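For example, appending newly uploaded sources to a node's existing selection with the read-modify-write pattern (reusing the `DatasetManager` sketch above; field names remain assumptions):

```python
def append_files(manager: DatasetManager, node_id: str, new_files: list) -> dict:
    """Merge new files into the node's current selection and send the full list back."""
    nodes = {n["id"]: n for n in manager.list_nodes()}   # assumes a list of node objects
    current = nodes[node_id].get("config", {}).get("files", [])
    return manager.update_node(node_id, sorted(set(current) | set(new_files)))
```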
Quality Assurance
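For example, cross-checking a node's configured files against the uploaded sources before deployment (populate `available` from the List Sources endpoint; see Sources Overview):

```python
def missing_files(manager: DatasetManager, node_id: str, available: set) -> set:
    """Return configured files that do not exist among uploaded sources."""
    nodes = {n["id"]: n for n in manager.list_nodes()}
    configured = set(nodes[node_id].get("config", {}).get("files", []))
    return configured - available                   # non-empty means the config is invalid
```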
Migration and Maintenance
Upgrading Dataset Configurations
When migrating or updating your flow configurations:
- Backup Current State: List all current dataset configurations
- Plan Updates: Design new file organization structure
- Test Changes: Update configurations in a development environment
- Apply Updates: Use batch operations to update production datasets
- Verify Results: Check that all nodes are properly updated
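A sketch of the backup step, dumping every node's current configuration to a JSON file before any changes are applied (again reusing the `DatasetManager` sketch):

```python
import json

def backup_configs(manager: DatasetManager, path: str = "dataset_backup.json") -> None:
    """Save all dataset-node configurations so they can be restored if a migration fails."""
    with open(path, "w") as fh:
        json.dump(manager.list_nodes(), fh, indent=2)
```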
Monitoring and Alerting
Set up monitoring for dataset health:
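For instance, a simple polling loop that reports stale nodes (the `result.updated` field is the same assumption as in the status sketch above; wire the `print` into your real alerting):

```python
import time

def watch_datasets(manager: DatasetManager, interval_s: int = 300) -> None:
    """Periodically flag dataset nodes whose last result indicates an update is needed."""
    while True:
        stale = [n["id"] for n in manager.list_nodes()
                 if not n.get("result", {}).get("updated", True)]
        if stale:
            print(f"Dataset nodes needing redeployment: {stale}")
        time.sleep(interval_s)
```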
Next Steps
Ready to start managing your dataset nodes? Here are the next steps:
List Dataset Nodes
Start by exploring existing dataset nodes in your flows
Update Dataset
Learn how to modify dataset configurations
Sources Overview
Manage the source files that power your datasets
Deploy Flow
Deploy your updated flows to make changes active
For more advanced usage patterns and integration examples, explore our comprehensive guides: