Overview
The Update LLM Configuration endpoint allows you to modify LLM node settings within a flow. LLM nodes handle response generation: they take retrieved context and transform it into coherent, accurate answers using language models with configurable parameters.

- Method: PATCH
- URL: https://{flow_name}.flows.graphorlm.com/llm/{node_id}
- Authentication: Required (API Token)
Authentication
All requests must include a valid API token in the Authorization header. Learn how to generate API tokens in the API Tokens guide.
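For example:

```
Authorization: Bearer YOUR_API_TOKEN
```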
Request Format
Headers
Header | Value | Required |
---|---|---|
Authorization | Bearer YOUR_API_TOKEN | Yes |
Content-Type | application/json | Yes |
URL Parameters
Parameter | Type | Description |
---|---|---|
flow_name | string | Name of the flow containing the LLM node |
node_id | string | Unique identifier of the LLM node to update |
Request Body
Configuration Parameters
Parameter | Type | Required | Default | Description |
---|---|---|---|---|
model | string | No | - | LLM model to use for response generation |
promptId | string | No | - | ID of the prompt template for instruction guidance |
temperature | float | No | 0.0 | Controls response creativity and randomness (0.0-2.0) |
Available Models
OpenAI Models
Model | Context Window | Best For | Performance Tier |
---|---|---|---|
gpt-4o | 128,000 tokens | Complex reasoning, high-quality responses | Premium |
gpt-4o-mini | 128,000 tokens | Fast responses with good quality | Balanced |
gpt-4.1 | 128,000 tokens | Latest capabilities, enhanced reasoning | Premium |
gpt-4.1-mini | 128,000 tokens | Efficient processing with modern features | Balanced |
gpt-4.1-nano | 128,000 tokens | Ultra-fast responses, lightweight processing | Efficient |
gpt-3.5-turbo-0125 | 16,385 tokens | Quick responses, resource-efficient | Efficient |
Groq Models (High-Speed Processing)
Model | Context Window | Best For | Performance Tier |
---|---|---|---|
mixtral-8x7b-32768 | 32,768 tokens | High-throughput processing | High-Speed |
llama-3.1-8b-instant | 8,192 tokens | Ultra-fast responses | High-Speed |
Temperature Control
Range | Behavior | Use Cases |
---|---|---|
0.0 | Deterministic, consistent responses | Technical documentation, factual Q&A |
0.1-0.3 | Slightly varied, mostly consistent | Customer support, structured responses |
0.4-0.7 | Balanced creativity and consistency | General conversation, explanations |
0.8-1.2 | Creative, diverse responses | Content generation, brainstorming |
1.3-2.0 | Highly creative, unpredictable | Creative writing, experimental responses |
Example Request
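A request body that updates the model, temperature, and prompt template in one call (all fields are optional, so you can send only the ones you want to change; the prompt ID shown here is illustrative and must exist in your flow):

```json
{
  "model": "gpt-4o-mini",
  "temperature": 0.2,
  "promptId": "default_retrieval_prompt"
}
```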
Response Format
Success Response (200 OK)
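An illustrative response body (the exact message text and node ID will vary):

```json
{
  "success": true,
  "message": "LLM node updated successfully",
  "node_id": "llm-node-123"
}
```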
Response Structure
Field | Type | Description |
---|---|---|
success | boolean | Whether the update was successful |
message | string | Descriptive message about the update result |
node_id | string | ID of the updated LLM node |
Code Examples
JavaScript/Node.js
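A minimal sketch using the built-in fetch API (Node.js 18+, ES module with top-level await); the flow name, node ID, and token are placeholders:

```javascript
const response = await fetch('https://my-flow.flows.graphorlm.com/llm/llm-node-123', {
  method: 'PATCH',
  headers: {
    'Authorization': 'Bearer YOUR_API_TOKEN',
    'Content-Type': 'application/json',
  },
  // All configuration fields are optional; send only what you want to change
  body: JSON.stringify({ model: 'gpt-4o-mini', temperature: 0.2 }),
});

if (!response.ok) {
  throw new Error(`Update failed with status ${response.status}`);
}

console.log(await response.json());
```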
Python
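A minimal sketch with the requests library; the same placeholders apply:

```python
import requests

url = "https://my-flow.flows.graphorlm.com/llm/llm-node-123"
headers = {
    "Authorization": "Bearer YOUR_API_TOKEN",
    "Content-Type": "application/json",
}
# All configuration fields are optional; send only what you want to change
payload = {
    "model": "gpt-4o-mini",
    "temperature": 0.2,
}

response = requests.patch(url, headers=headers, json=payload)
response.raise_for_status()
print(response.json())  # e.g. {'success': True, ...}
```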
cURL
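The equivalent request with curl:

```bash
curl -X PATCH "https://my-flow.flows.graphorlm.com/llm/llm-node-123" \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4o-mini", "temperature": 0.2}'
```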
PHP
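A sketch using PHP's cURL extension:

```php
<?php
$ch = curl_init("https://my-flow.flows.graphorlm.com/llm/llm-node-123");

curl_setopt_array($ch, [
    CURLOPT_CUSTOMREQUEST  => "PATCH",
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_HTTPHEADER     => [
        "Authorization: Bearer YOUR_API_TOKEN",
        "Content-Type: application/json",
    ],
    // All configuration fields are optional; send only what you want to change
    CURLOPT_POSTFIELDS     => json_encode([
        "model"       => "gpt-4o-mini",
        "temperature" => 0.2,
    ]),
]);

$result = json_decode(curl_exec($ch), true);
curl_close($ch);

print_r($result);
```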
Configuration Strategies
Maximum Accuracy Strategy
Optimal for: Technical documentation, factual Q&A, compliance requirements
- Deterministic responses for consistent results
- Premium model quality with advanced reasoning
- Zero creativity for maximum factual accuracy
- Expected latency: 2-4 seconds
- Context capacity: 128,000 tokens
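A request body sketch for this strategy (values follow the model and temperature tables above):

```json
{
  "model": "gpt-4o",
  "temperature": 0.0
}
```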
Balanced Performance Strategy
Optimal for: General Q&A, customer support, mixed content types
- Good quality with efficiency balance
- Slight response variation while maintaining consistency
- Versatile processing for diverse use cases
- Expected latency: 1-2 seconds
- Context capacity: 128,000 tokens
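One possible configuration:

```json
{
  "model": "gpt-4o-mini",
  "temperature": 0.2
}
```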
High-Throughput Strategy
Optimal for: Real-time chat, high-volume processing, instant responses
- Ultra-fast processing with Groq acceleration
- High throughput capacity for concurrent requests
- Real-time response generation for interactive applications
- Expected latency: 0.5-1 second
- Context capacity: 32,768 tokens
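For example (the 32,768-token context capacity above corresponds to mixtral-8x7b-32768):

```json
{
  "model": "mixtral-8x7b-32768",
  "temperature": 0.1
}
```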
Creative Generation Strategy
Optimal for: Content creation, brainstorming, diverse outputs
- Enhanced creativity with latest model capabilities
- Diverse response generation for varied outputs
- Advanced reasoning with creative flexibility
- Expected latency: 2-5 seconds
- Context capacity: 128,000 tokens
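For example, pairing a premium model with a creative temperature:

```json
{
  "model": "gpt-4.1",
  "temperature": 0.9
}
```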
Resource-Efficient Strategy
Optimal for: Budget-conscious applications, simple Q&A, high-scale deployment
- Optimized resource usage with minimal processing overhead
- Fast response times with good quality retention
- High scalability for large-scale deployments
- Expected latency: 0.8-1.5 seconds
- Context capacity: 128,000 tokens
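One possible configuration:

```json
{
  "model": "gpt-4.1-nano",
  "temperature": 0.2
}
```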
Strategy Selection Matrix
Use Case | Accuracy Priority | Speed Priority | Resource Efficiency | Recommended Strategy |
---|---|---|---|---|
Technical Documentation | High | Medium | Medium | Maximum Accuracy |
Customer Support | Medium | High | Medium | High-Throughput |
General Q&A | Medium | Medium | High | Balanced Performance |
Content Creation | Medium | Low | Low | Creative Generation |
Real-time Chat | Low | Very High | High | High-Throughput |
Budget Applications | Medium | Medium | Very High | Resource-Efficient |
Error Responses
Common Error Codes
Status Code | Description | Example Response |
---|---|---|
400 | Bad Request - Invalid configuration | {"detail": "Invalid temperature value"} |
401 | Unauthorized - Invalid or missing API token | {"detail": "Invalid authentication credentials"} |
404 | Not Found - Flow or node not found | {"detail": "LLM node with id 'invalid-id' not found"} |
422 | Unprocessable Entity - Validation error | {"detail": "Unknown model: invalid-model"} |
500 | Internal Server Error - Server error | {"detail": "Failed to update LLM node"} |
Error Response Format
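All error responses share the same shape:

```json
{
  "detail": "Description of what went wrong"
}
```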
Example Error Responses
Invalid Model
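Returned with 422 Unprocessable Entity when the model name is not recognized:

```json
{
  "detail": "Unknown model: invalid-model"
}
```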
Invalid Temperature
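Returned with 400 Bad Request when the temperature falls outside the 0.0-2.0 range:

```json
{
  "detail": "Invalid temperature value"
}
```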
Node Not Found
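Returned with 404 Not Found when the node ID does not exist in the flow:

```json
{
  "detail": "LLM node with id 'invalid-id' not found"
}
```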
Invalid Prompt ID
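The exact message is not listed in the table above, but an unknown prompt ID produces the same format; an illustrative example:

```json
{
  "detail": "Prompt with id 'invalid-prompt' not found"
}
```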
Best Practices
Model Selection Guidelines
- Premium Quality: Use gpt-4o or gpt-4.1 for complex reasoning and highest accuracy
- Balanced Approach: Choose gpt-4o-mini or gpt-4.1-mini for versatile applications
- Speed Optimization: Select mixtral-8x7b-32768 or llama-3.1-8b-instant for real-time processing
- Resource Efficiency: Opt for gpt-4.1-nano or gpt-3.5-turbo-0125 for high-volume deployment
Temperature Configuration
- Factual Content (0.0-0.1): Technical documentation, compliance, precise answers
- Professional Responses (0.1-0.3): Customer support, structured explanations
- Conversational (0.3-0.5): General Q&A, interactive applications
- Creative Content (0.5-1.0): Content generation, brainstorming, diverse outputs
- Experimental (1.0-2.0): Research, creative writing, novel approaches
Prompt Template Selection
- Default RAG: Use default_retrieval_prompt for general-purpose applications
- Technical Focus: Select technical_documentation_assistant for specialized content
- Customer Support: Choose customer_support_agent for service applications
- Creative Content: Opt for creative_content_generator for diverse outputs
Performance Optimization
- Context Management: Choose models with appropriate context windows for your content
- Latency Requirements: Balance model quality with response time needs
- Throughput Planning: Consider concurrent request patterns when selecting models
- Resource Monitoring: Track processing patterns and adjust configurations accordingly
Troubleshooting
Node Not Found Error
Solution: Verify that:
- The node ID is correct and exists in the specified flow
- The node is indeed an LLM type node
- You have access to the flow and node
- The flow name in the URL matches exactly
Invalid Model Configuration
Solution: If model configuration fails:
- Check that the model name is exactly as specified in available models
- Verify the GROQ_API_KEY environment variable is set (required for Groq models)
- Ensure the model is supported in your deployment region
- Confirm model availability hasn’t changed
Temperature Parameter Issues
Solution: For temperature configuration problems:
- Ensure temperature is between 0.0 and 2.0
- Use appropriate precision (e.g., 0.2, not 0.2000001)
- Consider that higher temperatures increase response variation
- Test temperature effects with your specific use case
Prompt Template Errors
Solution: If prompt template assignment fails:
- Verify the prompt ID exists in your flow’s available prompts
- Check that the prompt template is properly formatted
- Ensure the prompt includes necessary placeholders (e.g., {context})
- Confirm prompt template compatibility with your use case
Processing Performance Issues
Solution: For slow or inconsistent processing:
- Consider switching to faster models for better latency
- Adjust temperature to reduce processing complexity
- Monitor context window usage and optimize input size
- Check for concurrent request limits and throttling
Response Quality Problems
Solution: If response quality is poor:
- Lower temperature for more consistent, factual responses
- Switch to higher-quality models (gpt-4o, gpt-4.1)
- Review and optimize prompt template instructions
- Ensure context provided to LLM is relevant and well-formatted
Connection Issues
Solution: For connectivity problems:
- Check your internet connection
- Verify the flow URL is accessible
- Ensure your firewall allows HTTPS traffic to *.flows.graphorlm.com
- Try accessing the endpoint from a different network