Overview
The Update LLM Configuration endpoint allows you to modify LLM node settings within a flow. LLM nodes handle the critical task of response generation, taking retrieved context and transforming it into coherent, accurate answers using advanced language models with configurable parameters for optimal performance.

- Method: PATCH
- URL: https://{flow_name}.flows.graphorlm.com/llm/{node_id}
- Authentication: Required (API Token)
Authentication
All requests must include a valid API token in the Authorization header. Learn how to generate API tokens in the API Tokens guide.
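The header takes the following form, with YOUR_API_TOKEN as a placeholder for your actual token:

```
Authorization: Bearer YOUR_API_TOKEN
```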
Request Format
Headers
| Header | Value | Required |
|---|---|---|
| Authorization | Bearer YOUR_API_TOKEN | Yes |
| Content-Type | application/json | Yes |
URL Parameters
| Parameter | Type | Description |
|---|---|---|
| flow_name | string | Name of the flow containing the LLM node |
| node_id | string | Unique identifier of the LLM node to update |
Request Body
Configuration Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| model | string | No | - | LLM model to use for response generation |
| promptId | string | No | - | ID of the prompt template for instruction guidance |
| temperature | float | No | 0.0 | Temperature control for response creativity (0.0-2.0) |
Available Models
OpenAI Models
| Model | Context Window | Best For | Performance Tier |
|---|---|---|---|
| gpt-4o | 128,000 tokens | Complex reasoning, high-quality responses | Premium |
| gpt-4o-mini | 128,000 tokens | Fast responses with good quality | Balanced |
| gpt-4.1 | 128,000 tokens | Latest capabilities, enhanced reasoning | Premium |
| gpt-4.1-mini | 128,000 tokens | Efficient processing with modern features | Balanced |
| gpt-4.1-nano | 128,000 tokens | Ultra-fast responses, lightweight processing | Efficient |
| gpt-3.5-turbo-0125 | 16,385 tokens | Quick responses, resource-efficient | Efficient |
Groq Models (High-Speed Processing)
| Model | Context Window | Best For | Performance Tier |
|---|---|---|---|
| mixtral-8x7b-32768 | 32,768 tokens | High-throughput processing | High-Speed |
| llama-3.1-8b-instant | 8,192 tokens | Ultra-fast responses | High-Speed |
Temperature Control
| Range | Behavior | Use Cases |
|---|---|---|
| 0.0 | Deterministic, consistent responses | Technical documentation, factual Q&A |
| 0.1-0.3 | Slightly varied, mostly consistent | Customer support, structured responses |
| 0.4-0.7 | Balanced creativity and consistency | General conversation, explanations |
| 0.8-1.2 | Creative, diverse responses | Content generation, brainstorming |
| 1.3-2.0 | Highly creative, unpredictable | Creative writing, experimental responses |
Example Request
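The following request updates the model, prompt, and temperature of an LLM node. The flow name (my-flow) and node ID (llm-node-123) are placeholders; the promptId value assumes the default RAG prompt listed under Best Practices below.

```bash
curl -X PATCH "https://my-flow.flows.graphorlm.com/llm/llm-node-123" \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "promptId": "default_retrieval_prompt",
    "temperature": 0.2
  }'
```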
Response Format
Success Response (200 OK)
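Based on the response structure below, a successful update returns a body along these lines (the exact message text may vary):

```json
{
  "success": true,
  "message": "LLM node configuration updated successfully",
  "node_id": "llm-node-123"
}
```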
Response Structure
| Field | Type | Description |
|---|---|---|
| success | boolean | Whether the update was successful |
| message | string | Descriptive message about the update result |
| node_id | string | ID of the updated LLM node |
Code Examples
JavaScript/Node.js
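A minimal sketch using the built-in fetch API; the flow name and node ID are placeholders:

```javascript
// Update an LLM node's model and temperature (placeholders: my-flow, llm-node-123)
const response = await fetch('https://my-flow.flows.graphorlm.com/llm/llm-node-123', {
  method: 'PATCH',
  headers: {
    'Authorization': 'Bearer YOUR_API_TOKEN',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    model: 'gpt-4o-mini',
    temperature: 0.3
  })
});

if (!response.ok) {
  throw new Error(`Update failed with status ${response.status}`);
}

const result = await response.json();
console.log(result); // e.g. { success: true, message: "...", node_id: "llm-node-123" }
```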
Python
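An equivalent sketch using the requests library; the flow name and node ID are again placeholders:

```python
import requests

# Update an LLM node's model and temperature (placeholders: my-flow, llm-node-123)
url = "https://my-flow.flows.graphorlm.com/llm/llm-node-123"
headers = {
    "Authorization": "Bearer YOUR_API_TOKEN",
    "Content-Type": "application/json",
}
payload = {
    "model": "gpt-4o-mini",
    "temperature": 0.3,
}

response = requests.patch(url, headers=headers, json=payload)
response.raise_for_status()  # raise on 4xx/5xx errors
print(response.json())
```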
cURL
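The same update from the command line:

```bash
curl -X PATCH "https://my-flow.flows.graphorlm.com/llm/llm-node-123" \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4o-mini", "temperature": 0.3}'
```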
Configuration Strategies
Maximum Accuracy Strategy
Optimal for: Technical documentation, factual Q&A, compliance requirements

- Deterministic responses for consistent results
- Premium model quality with advanced reasoning
- Zero creativity for maximum factual accuracy
- Expected latency: 2-4 seconds
- Context capacity: 128,000 tokens
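Expressed as a request body, this strategy might look like the following; the model choice is illustrative, and gpt-4.1 fits equally well:

```json
{
  "model": "gpt-4o",
  "temperature": 0.0
}
```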
Balanced Performance Strategy
Optimal for: General Q&A, customer support, mixed content types

- Good quality with efficiency balance
- Slight response variation while maintaining consistency
- Versatile processing for diverse use cases
- Expected latency: 1-2 seconds
- Context capacity: 128,000 tokens
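A corresponding request body, with an illustrative model choice and a temperature from the slightly varied range:

```json
{
  "model": "gpt-4o-mini",
  "temperature": 0.3
}
```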
High-Throughput Strategy
Optimal for: Real-time chat, high-volume processing, instant responses

- Ultra-fast processing with Groq acceleration
- High throughput capacity for concurrent requests
- Real-time response generation for interactive applications
- Expected latency: 0.5-1 second
- Context capacity: 32,768 tokens
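A corresponding request body; the 32,768-token context capacity points to the Mixtral model:

```json
{
  "model": "mixtral-8x7b-32768",
  "temperature": 0.2
}
```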
Creative Generation Strategy
Optimal for: Content creation, brainstorming, diverse outputs

- Enhanced creativity with latest model capabilities
- Diverse response generation for varied outputs
- Advanced reasoning with creative flexibility
- Expected latency: 2-5 seconds
- Context capacity: 128,000 tokens
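A corresponding request body, pairing a premium model with a temperature from the creative range:

```json
{
  "model": "gpt-4.1",
  "temperature": 1.0
}
```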
Resource-Efficient Strategy
Optimal for: Budget-conscious applications, simple Q&A, high-scale deployment

- Optimized resource usage with minimal processing overhead
- Fast response times with good quality retention
- High scalability for large-scale deployments
- Expected latency: 0.8-1.5 seconds
- Context capacity: 128,000 tokens
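A corresponding request body; gpt-4.1-nano matches the 128,000-token context capacity listed above:

```json
{
  "model": "gpt-4.1-nano",
  "temperature": 0.3
}
```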
Strategy Selection Matrix
| Use Case | Accuracy Priority | Speed Priority | Resource Efficiency | Recommended Strategy |
|---|---|---|---|---|
| Technical Documentation | High | Medium | Medium | Maximum Accuracy |
| Customer Support | Medium | High | Medium | High-Throughput |
| General Q&A | Medium | Medium | High | Balanced Performance |
| Content Creation | Medium | Low | Low | Creative Generation |
| Real-time Chat | Low | Very High | High | High-Throughput |
| Budget Applications | Medium | Medium | Very High | Resource-Efficient |
Error Responses
Common Error Codes
| Status Code | Description | Example Response |
|---|---|---|
| 400 | Bad Request - Invalid configuration | {"detail": "Invalid temperature value"} |
| 401 | Unauthorized - Invalid or missing API token | {"detail": "Invalid authentication credentials"} |
| 404 | Not Found - Flow or node not found | {"detail": "LLM node with id 'invalid-id' not found"} |
| 422 | Unprocessable Entity - Validation error | {"detail": "Unknown model: invalid-model"} |
| 500 | Internal Server Error - Server error | {"detail": "Failed to update LLM node"} |
Error Response Format
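All errors share the shape shown in the table above: a JSON object with a single detail field describing the problem.

```json
{
  "detail": "Error message describing what went wrong"
}
```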
Example Error Responses
Invalid Model
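Returned with status 422 when the model name is not one of the available models:

```json
{
  "detail": "Unknown model: invalid-model"
}
```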
Invalid Temperature
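Returned with status 400 when the temperature falls outside the 0.0-2.0 range:

```json
{
  "detail": "Invalid temperature value"
}
```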
Node Not Found
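Returned with status 404 when the node ID does not match an LLM node in the flow:

```json
{
  "detail": "LLM node with id 'invalid-id' not found"
}
```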
Invalid Prompt ID
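The exact wording for this case is not documented in the table above; a 422 response of roughly this shape is expected (the prompt ID shown is a placeholder):

```json
{
  "detail": "Prompt with id 'invalid-prompt' not found"
}
```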
Best Practices
Model Selection Guidelines
- Premium Quality: Use `gpt-4o` or `gpt-4.1` for complex reasoning and highest accuracy
- Balanced Approach: Choose `gpt-4o-mini` or `gpt-4.1-mini` for versatile applications
- Speed Optimization: Select `mixtral-8x7b-32768` or `llama-3.1-8b-instant` for real-time processing
- Resource Efficiency: Opt for `gpt-4.1-nano` or `gpt-3.5-turbo-0125` for high-volume deployment
Temperature Configuration
- Factual Content (0.0-0.1): Technical documentation, compliance, precise answers
- Professional Responses (0.1-0.3): Customer support, structured explanations
- Conversational (0.3-0.5): General Q&A, interactive applications
- Creative Content (0.5-1.0): Content generation, brainstorming, diverse outputs
- Experimental (1.0-2.0): Research, creative writing, novel approaches
Prompt Template Selection
- Default RAG: Use `default_retrieval_prompt` for general-purpose applications
- Technical Focus: Select `technical_documentation_assistant` for specialized content
- Customer Support: Choose `customer_support_agent` for service applications
- Creative Content: Opt for `creative_content_generator` for diverse outputs
Performance Optimization
- Context Management: Choose models with appropriate context windows for your content
- Latency Requirements: Balance model quality with response time needs
- Throughput Planning: Consider concurrent request patterns when selecting models
- Resource Monitoring: Track processing patterns and adjust configurations accordingly
Troubleshooting
Node Not Found Error
Solution: Verify that:
- The node ID is correct and exists in the specified flow
- The node is indeed an LLM type node
- You have access to the flow and node
- The flow name in the URL matches exactly
Invalid Model Configuration
Solution: If model configuration fails:
- Check that the model name is exactly as specified in available models
- Verify that the `GROQ_API_KEY` environment variable is set when using Groq models
- Ensure the model is supported in your deployment region
- Confirm model availability hasn’t changed
Temperature Parameter Issues
Solution: For temperature configuration problems:
- Ensure temperature is between 0.0 and 2.0
- Use appropriate precision (e.g., 0.2, not 0.2000001)
- Consider that higher temperatures increase response variation
- Test temperature effects with your specific use case
Prompt Template Errors
Solution: If prompt template assignment fails:
- Verify the prompt ID exists in your flow’s available prompts
- Check that the prompt template is properly formatted
- Ensure the prompt includes necessary placeholders (e.g., `{context}`)
- Confirm prompt template compatibility with your use case
Processing Performance Issues
Solution: For slow or inconsistent processing:
- Consider switching to faster models for better latency
- Adjust temperature to reduce processing complexity
- Monitor context window usage and optimize input size
- Check for concurrent request limits and throttling
Response Quality Problems
Solution: If response quality is poor:
- Lower temperature for more consistent, factual responses
- Switch to higher-quality models (gpt-4o, gpt-4.1)
- Review and optimize prompt template instructions
- Ensure context provided to LLM is relevant and well-formatted
Connection Issues
Solution: For connectivity problems:
- Check your internet connection
- Verify the flow URL is accessible
- Ensure your firewall allows HTTPS traffic to *.flows.graphorlm.com
- Try accessing the endpoint from a different network
Next Steps
After updating your LLM configuration, you might want to:

- List LLM Nodes: View your updated LLM nodes and verify configuration changes
- List Prompts: Explore available prompt templates for further optimization
- Run Flow: Execute your flow with the updated LLM configuration
- Flow Overview: Learn about all available flow management endpoints

