## Key Features
- Multiple Models: Access to 8 language models including GPT-4o, GPT-4.1, and high-speed Groq models
- Prompt Integration: Custom prompt templates for specialized response behaviors
- Temperature Control: Adjust creativity from deterministic (0.0) to creative (2.0) responses
- Streaming Support: Real-time response generation for interactive applications
- Quality Metrics: Built-in evaluation using DeepEval metrics
## Available Endpoints
### List LLM Nodes

`GET /{flow_name}/llm`

Retrieve all LLM nodes with their configurations and metrics.

### Update Configuration

`PATCH /{flow_name}/llm/{node_id}`

Modify LLM node settings, including model and temperature.

### List Prompts

`GET /{flow_name}/prompts`

Access available prompt templates for LLM customization.

Working calls against these endpoints are shown in the JavaScript and Python examples below.

## LLM Node Structure
### Configuration Parameters
| Parameter | Type | Description |
|---|---|---|
| model | string | Language model for response generation |
| promptId | string | Prompt template for instruction guidance |
| temperature | float (0.0-2.0) | Creativity and randomness control |
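For illustration, a `PATCH` request body that sets all three parameters might look like the following; the `promptId` value is hypothetical:

```json
{
  "model": "gpt-4o-mini",
  "promptId": "prompt-customer-support",
  "temperature": 0.3
}
```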
### Available Models
| Model | Context Window | Best For | Expected Latency |
|---|---|---|---|
| gpt-4o | 128K tokens | High accuracy, complex reasoning | 2-4 seconds |
| gpt-4o-mini | 128K tokens | Balanced quality and speed | 1-2 seconds |
| gpt-4.1 | 128K tokens | Latest capabilities | 2-5 seconds |
| gpt-4.1-mini | 128K tokens | Modern features, efficient | 1-2 seconds |
| gpt-4.1-nano | 128K tokens | Resource optimization | 0.8-1.5 seconds |
| gpt-3.5-turbo-0125 | 16K tokens | High-volume processing | 0.5-1 second |
| mixtral-8x7b-32768 | 32K tokens | Real-time processing | 0.5-1 second |
| llama-3.1-8b-instant | 8K tokens | Ultra-fast responses | 0.3-0.8 seconds |
## Common Configurations

### Maximum Accuracy

For technical documentation and critical applications.

### Balanced Performance

For general Q&A and customer support.

### High-Speed Processing

For real-time chat and instant responses.

### Creative Generation
For content creation and diverse outputs.
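As a minimal sketch, the four profiles can be expressed as `PATCH` request bodies (here as Python dicts). The model and temperature pairings follow the best practices below, but the exact values are illustrative assumptions, not prescribed settings:

```python
# Hypothetical request bodies for PATCH /{flow_name}/llm/{node_id}.
# Values are illustrative; tune them for your workload.
MAXIMUM_ACCURACY = {"model": "gpt-4o", "temperature": 0.0}
BALANCED_PERFORMANCE = {"model": "gpt-4o-mini", "temperature": 0.3}
HIGH_SPEED_PROCESSING = {"model": "llama-3.1-8b-instant", "temperature": 0.2}
CREATIVE_GENERATION = {"model": "gpt-4.1", "temperature": 0.8}
```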
## Quality Metrics

LLM nodes track response quality using DeepEval metrics:

- Contextual Precision: Accuracy of context usage
- Contextual Recall: Completeness of context utilization
- Answer Relevancy: Relevance of response to question
- Faithfulness: Adherence to context without hallucination
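The platform reports these scores for you, but they can also be reproduced locally with the open-source DeepEval library. A minimal sketch, assuming an evaluation model is configured (for example via an OpenAI API key):

```python
from deepeval import evaluate
from deepeval.metrics import AnswerRelevancyMetric, FaithfulnessMetric
from deepeval.test_case import LLMTestCase

# One test case: the user question, the node's answer, and the retrieved
# context that the answer should stay faithful to.
test_case = LLMTestCase(
    input="What temperature range do LLM nodes support?",
    actual_output="Temperature can be set between 0.0 and 2.0.",
    retrieval_context=["Temperature ranges from 0.0 (deterministic) to 2.0 (creative)."],
)

# Score answer relevancy and faithfulness; thresholds mark pass/fail.
evaluate(
    test_cases=[test_case],
    metrics=[AnswerRelevancyMetric(threshold=0.7), FaithfulnessMetric(threshold=0.7)],
)
```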
## JavaScript Example
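A minimal sketch with `fetch`, listing the LLM nodes in a flow and updating one. The base URL, flow name, auth header, and response field names are assumptions; adapt them to your deployment:

```javascript
// Hypothetical base URL and auth header; replace with your deployment's values.
const BASE_URL = "https://api.example.com";
const headers = {
  Authorization: `Bearer ${process.env.API_KEY}`,
  "Content-Type": "application/json",
};

// List all LLM nodes in the "support-bot" flow, with configs and metrics.
const nodes = await fetch(`${BASE_URL}/support-bot/llm`, { headers }).then(
  (res) => res.json(),
);

// Switch the first node to gpt-4o-mini with near-deterministic output.
// (The "id" field on each node is an assumed response shape.)
const updated = await fetch(`${BASE_URL}/support-bot/llm/${nodes[0].id}`, {
  method: "PATCH",
  headers,
  body: JSON.stringify({ model: "gpt-4o-mini", temperature: 0.1 }),
}).then((res) => res.json());

console.log(updated);
```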
## Python Example
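An equivalent sketch with `requests`, also covering the prompts endpoint. Again, the base URL, auth scheme, and response fields are assumptions:

```python
import os

import requests

# Hypothetical base URL and auth scheme; replace with your deployment's values.
BASE_URL = "https://api.example.com"
HEADERS = {"Authorization": f"Bearer {os.environ['API_KEY']}"}
FLOW = "support-bot"

# List available prompt templates and all LLM nodes in the flow.
prompts = requests.get(f"{BASE_URL}/{FLOW}/prompts", headers=HEADERS).json()
nodes = requests.get(f"{BASE_URL}/{FLOW}/llm", headers=HEADERS).json()

# Point the first node at a prompt template and a fast Groq model.
# (The "id" fields are assumed response shapes, shown for illustration.)
resp = requests.patch(
    f"{BASE_URL}/{FLOW}/llm/{nodes[0]['id']}",
    headers=HEADERS,
    json={
        "model": "llama-3.1-8b-instant",
        "promptId": prompts[0]["id"],
        "temperature": 0.2,
    },
)
resp.raise_for_status()
print(resp.json())
```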
## Best Practices

### Model Selection
- Use gpt-4o for maximum accuracy in critical applications
- Use gpt-4o-mini for balanced performance in general use cases
- Use Groq models (mixtral, llama) for real-time applications
- Use creative models (gpt-4.1) for content generation
### Temperature Settings
- 0.0-0.1: Deterministic responses for factual queries
- 0.2-0.4: Slight variation for natural conversations
- 0.5-0.8: Creative responses for content generation
- 0.9-2.0: Highly creative and diverse outputs
### Performance Optimization

- Monitor processing time and adjust model selection accordingly (see the sketch after this list)
- Enable streaming for better user experience with longer responses
- Use appropriate context window sizes for your content
- Track quality metrics to ensure response accuracy
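For example, because the list endpoint returns per-node metrics, a maintenance script could downgrade slow nodes to a faster Groq model. This is only a sketch: the base URL, auth scheme, metric field name, and latency threshold are all assumptions:

```python
import os

import requests

BASE_URL = "https://api.example.com"  # hypothetical; see the Python example above
HEADERS = {"Authorization": f"Bearer {os.environ['API_KEY']}"}
FLOW = "support-bot"
LATENCY_BUDGET_S = 1.0  # illustrative per-response latency budget

for node in requests.get(f"{BASE_URL}/{FLOW}/llm", headers=HEADERS).json():
    # "avgProcessingTime" is an assumed name for the tracked latency metric.
    if node.get("avgProcessingTime", 0.0) > LATENCY_BUDGET_S:
        requests.patch(
            f"{BASE_URL}/{FLOW}/llm/{node['id']}",
            headers=HEADERS,
            json={"model": "mixtral-8x7b-32768"},  # faster Groq model
        ).raise_for_status()
```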
## Troubleshooting

Common issues and solutions:

- Slow responses: Switch to faster models (Groq) or lower temperature
- Inconsistent quality: Use higher-quality models or lower temperature
- Resource usage: Use efficient models (nano, mini) for high-volume processing
- Poor context usage: Optimize prompt templates and verify context formatting
## Next Steps

- List LLM Nodes: Explore your current LLM configurations and performance
- Update Configuration: Optimize LLM settings for your use case
- Manage Prompts: Customize prompt templates for better responses
- Run Flows: Test your optimized LLM configurations

