Key Features
- Multiple Models: Access to 8 language models, including GPT-4o, GPT-4.1, and high-speed Groq models
- Prompt Integration: Custom prompt templates for specialized response behaviors
- Temperature Control: Adjust creativity from deterministic (0.0) to highly creative (2.0) responses
- Streaming Support: Real-time response generation for interactive applications
- Quality Metrics: Built-in evaluation using DeepEval metrics
 
Available Endpoints
List LLM Nodes
GET /{flow_name}/llm
Retrieve all LLM nodes with their configurations and metrics.

Update Configuration
PATCH /{flow_name}/llm/{node_id}
Modify an LLM node's settings, including model and temperature.

List Prompts
GET /{flow_name}/prompts
Access the available prompt templates for LLM customization.

LLM Node Structure
Configuration Parameters
| Parameter | Type | Description | 
|---|---|---|
| model | string | Language model for response generation | 
| promptId | string | Prompt template for instruction guidance | 
| temperature | float (0.0-2.0) | Creativity and randomness control | 
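As a sketch, a complete configuration object combining these three parameters might look like the following; the `promptId` value is a placeholder, not a real template ID:

```python
# Example LLM node configuration; promptId is a placeholder value.
config = {
    "model": "gpt-4o-mini",           # one of the models listed below
    "promptId": "example-prompt-id",  # placeholder prompt template ID
    "temperature": 0.3,               # float in the range [0.0, 2.0]
}

def validate_config(cfg: dict) -> None:
    """Check the documented parameter constraints."""
    assert isinstance(cfg["model"], str)
    assert isinstance(cfg["promptId"], str)
    assert 0.0 <= cfg["temperature"] <= 2.0
```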
Available Models
| Model | Context Window | Best For | Expected Latency | 
|---|---|---|---|
| gpt-4o | 128K tokens | High accuracy, complex reasoning | 2-4 seconds | 
| gpt-4o-mini | 128K tokens | Balanced quality and speed | 1-2 seconds | 
| gpt-4.1 | 128K tokens | Latest capabilities | 2-5 seconds | 
| gpt-4.1-mini | 128K tokens | Modern features, efficient | 1-2 seconds | 
| gpt-4.1-nano | 128K tokens | Resource optimization | 0.8-1.5 seconds | 
| gpt-3.5-turbo-0125 | 16K tokens | High-volume processing | 0.5-1 second | 
| mixtral-8x7b-32768 | 32K tokens | Real-time processing | 0.5-1 second | 
| llama-3.1-8b-instant | 8K tokens | Ultra-fast responses | 0.3-0.8 seconds | 
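The context windows in the table can be used to check whether an input will fit a given model. This is an illustrative helper, not part of the API; the token counts are taken directly from the table, and a real application would use an actual tokenizer to count tokens:

```python
# Context windows from the model table above (approximate, per the "K" values).
CONTEXT_WINDOWS = {
    "gpt-4o": 128_000,
    "gpt-4o-mini": 128_000,
    "gpt-4.1": 128_000,
    "gpt-4.1-mini": 128_000,
    "gpt-4.1-nano": 128_000,
    "gpt-3.5-turbo-0125": 16_000,
    "mixtral-8x7b-32768": 32_000,
    "llama-3.1-8b-instant": 8_000,
}

def fits_context(model: str, token_count: int) -> bool:
    """Return True if an input of token_count tokens fits the model's window."""
    return token_count <= CONTEXT_WINDOWS[model]
```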
Common Configurations
Maximum Accuracy
For technical documentation and critical applications.

Balanced Performance
For general Q&A and customer support.

High-Speed Processing
For real-time chat and instant responses.

Creative Generation
For content creation and diverse outputs.

Quality Metrics
LLM nodes track response quality using DeepEval metrics:
- Contextual Precision: Accuracy of context usage
- Contextual Recall: Completeness of context utilization
- Answer Relevancy: Relevance of the response to the question
- Faithfulness: Adherence to context without hallucination
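As an illustrative sketch, the four common configurations described earlier might map to model and temperature pairs like these; the specific values are assumptions drawn from this document's model-selection and temperature guidance, not from the API itself:

```python
# Illustrative presets; values follow the best-practices guidance below,
# not an official configuration.
PRESETS = {
    "maximum_accuracy": {"model": "gpt-4o", "temperature": 0.0},
    "balanced": {"model": "gpt-4o-mini", "temperature": 0.3},
    "high_speed": {"model": "llama-3.1-8b-instant", "temperature": 0.2},
    "creative": {"model": "gpt-4.1", "temperature": 0.8},
}
```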
 
JavaScript Example
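A minimal JavaScript sketch, assuming a placeholder base URL (`https://api.example.com`); the helper only builds the request descriptor so it can be inspected or passed to `fetch` without a live server:

```javascript
// Placeholder host; substitute your actual API base URL.
const BASE_URL = "https://api.example.com";

// Build the request descriptor for PATCH /{flow_name}/llm/{node_id}.
function buildUpdateRequest(flowName, nodeId, config) {
  if (config.temperature !== undefined &&
      (config.temperature < 0.0 || config.temperature > 2.0)) {
    throw new RangeError("temperature must be between 0.0 and 2.0");
  }
  return {
    url: `${BASE_URL}/${flowName}/llm/${nodeId}`,
    options: {
      method: "PATCH",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(config),
    },
  };
}

// Usage with fetch (not executed here):
// const { url, options } = buildUpdateRequest("support-flow", "llm-1",
//   { model: "gpt-4o-mini", temperature: 0.3 });
// const res = await fetch(url, options);
```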
Python Example
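A comparable Python sketch using only the standard library; the base URL is again a placeholder, and the helpers build requests without sending them so the shapes can be verified offline:

```python
import json
import urllib.request

BASE_URL = "https://api.example.com"  # placeholder host

def list_llm_nodes_request(flow_name: str) -> urllib.request.Request:
    """Build (but do not send) the GET request that lists LLM nodes."""
    return urllib.request.Request(f"{BASE_URL}/{flow_name}/llm", method="GET")

def update_llm_node_request(flow_name: str, node_id: str, **config) -> urllib.request.Request:
    """Build the PATCH request that updates an LLM node's configuration."""
    temp = config.get("temperature")
    if temp is not None and not 0.0 <= temp <= 2.0:
        raise ValueError("temperature must be between 0.0 and 2.0")
    return urllib.request.Request(
        f"{BASE_URL}/{flow_name}/llm/{node_id}",
        data=json.dumps(config).encode(),
        headers={"Content-Type": "application/json"},
        method="PATCH",
    )

# To actually send a request:
# with urllib.request.urlopen(list_llm_nodes_request("support-flow")) as resp:
#     nodes = json.load(resp)
```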
Best Practices
Model Selection
- Use gpt-4o for maximum accuracy in critical applications
- Use gpt-4o-mini for balanced performance in general use cases
- Use Groq models (mixtral, llama) for real-time applications
- Use creative models (gpt-4.1) for content generation
 
Temperature Settings
- 0.0-0.1: Deterministic responses for factual queries
- 0.2-0.4: Slight variation for natural conversations
- 0.5-0.8: Creative responses for content generation
- 0.9-2.0: Highly creative and diverse outputs
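These bands can be encoded as a small helper, sketched here purely to make the ranges concrete:

```python
def temperature_band(temperature: float) -> str:
    """Map a temperature value to the usage band described above."""
    if not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature must be between 0.0 and 2.0")
    if temperature <= 0.1:
        return "deterministic"
    if temperature <= 0.4:
        return "natural conversation"
    if temperature <= 0.8:
        return "creative"
    return "highly creative"
```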
 
Performance Optimization
- Monitor processing time and adjust model selection accordingly
- Enable streaming for better user experience with longer responses
- Use appropriate context window sizes for your content
- Track quality metrics to ensure response accuracy
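The first practice, monitoring processing time, can be sketched as a timing wrapper; `generate` here stands in for whatever function actually invokes the LLM node, and the latency budget is an arbitrary example value:

```python
import time

def timed_call(generate, *args, budget_seconds: float = 2.0):
    """Run generate(*args) and return (result, elapsed_seconds, over_budget).

    over_budget signals that a faster model may be a better fit.
    """
    start = time.perf_counter()
    result = generate(*args)
    elapsed = time.perf_counter() - start
    return result, elapsed, elapsed > budget_seconds
```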
 
Troubleshooting
Common issues and solutions:
- Slow responses: Switch to faster models (Groq) or lower the temperature
- Inconsistent quality: Use higher-quality models or lower the temperature
- Resource usage: Use efficient models (nano, mini) for high-volume processing
- Poor context usage: Optimize prompt templates and verify context formatting
 

