Overview
The Update LLM Configuration endpoint allows you to modify LLM node settings within a flow. LLM nodes handle response generation: they take retrieved context and transform it into coherent, accurate answers using a configurable language model.

- Method: PATCH
- URL: https://{flow_name}.flows.graphorlm.com/llm/{node_id}
- Authentication: Required (API Token)
 
Authentication
All requests must include a valid API token in the `Authorization` header. Learn how to generate API tokens in the API Tokens guide.
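The header takes the following form:

```
Authorization: Bearer YOUR_API_TOKEN
```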
Request Format
Headers
| Header | Value | Required | 
|---|---|---|
| Authorization | Bearer YOUR_API_TOKEN | Yes |
| Content-Type | application/json | Yes |
URL Parameters
| Parameter | Type | Description | 
|---|---|---|
| flow_name | string | Name of the flow containing the LLM node |
| node_id | string | Unique identifier of the LLM node to update |
Request Body
Configuration Parameters
| Parameter | Type | Required | Default | Description | 
|---|---|---|---|---|
| model | string | No | - | LLM model to use for response generation |
| promptId | string | No | - | ID of the prompt template for instruction guidance |
| temperature | float | No | 0.0 | Temperature control for response creativity (0.0-2.0) |
Available Models
OpenAI Models
| Model | Context Window | Best For | Performance Tier | 
|---|---|---|---|
| gpt-4o | 128,000 tokens | Complex reasoning, high-quality responses | Premium |
| gpt-4o-mini | 128,000 tokens | Fast responses with good quality | Balanced |
| gpt-4.1 | 128,000 tokens | Latest capabilities, enhanced reasoning | Premium |
| gpt-4.1-mini | 128,000 tokens | Efficient processing with modern features | Balanced |
| gpt-4.1-nano | 128,000 tokens | Ultra-fast responses, lightweight processing | Efficient |
| gpt-3.5-turbo-0125 | 16,385 tokens | Quick responses, resource-efficient | Efficient |
Groq Models (High-Speed Processing)
| Model | Context Window | Best For | Performance Tier | 
|---|---|---|---|
| mixtral-8x7b-32768 | 32,768 tokens | High-throughput processing | High-Speed |
| llama-3.1-8b-instant | 8,192 tokens | Ultra-fast responses | High-Speed |
Temperature Control
| Range | Behavior | Use Cases | 
|---|---|---|
| 0.0 | Deterministic, consistent responses | Technical documentation, factual Q&A |
| 0.1-0.3 | Slightly varied, mostly consistent | Customer support, structured responses |
| 0.4-0.7 | Balanced creativity and consistency | General conversation, explanations |
| 0.8-1.2 | Creative, diverse responses | Content generation, brainstorming |
| 1.3-2.0 | Highly creative, unpredictable | Creative writing, experimental responses |
Example Request
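A PATCH to https://my-flow.flows.graphorlm.com/llm/llm-node-123 (flow name and node ID are placeholders) with a body like the following updates all three configuration parameters; the promptId value shown is illustrative:

```json
{
  "model": "gpt-4o",
  "promptId": "default_retrieval_prompt",
  "temperature": 0.2
}
```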
Response Format
Success Response (200 OK)
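Based on the response structure below, a successful update returns a body along these lines (the exact message text may differ):

```json
{
  "success": true,
  "message": "LLM node updated successfully",
  "node_id": "llm-node-123"
}
```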
Response Structure
| Field | Type | Description | 
|---|---|---|
| success | boolean | Whether the update was successful |
| message | string | Descriptive message about the update result |
| node_id | string | ID of the updated LLM node |
Code Examples
JavaScript/Node.js
Python
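A minimal sketch using the requests library; the flow name, node ID, token, and payload values are placeholders, and error handling is kept to a single raise_for_status call:

```python
import requests

FLOW_NAME = "my-flow"          # placeholder: your flow name
NODE_ID = "llm-node-123"       # placeholder: ID of the LLM node to update
API_TOKEN = "YOUR_API_TOKEN"   # see the API Tokens guide

url = f"https://{FLOW_NAME}.flows.graphorlm.com/llm/{NODE_ID}"
headers = {
    "Authorization": f"Bearer {API_TOKEN}",
    "Content-Type": "application/json",
}
payload = {
    "model": "gpt-4o-mini",  # any model from the tables above
    "temperature": 0.2,      # must be between 0.0 and 2.0
}

response = requests.patch(url, headers=headers, json=payload)
response.raise_for_status()
print(response.json())  # e.g. {'success': True, 'message': '...', 'node_id': 'llm-node-123'}
```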
cURL
PHP
Configuration Strategies
Maximum Accuracy Strategy
Optimal for: Technical documentation, factual Q&A, compliance requirements

- Deterministic responses for consistent results
- Premium model quality with advanced reasoning
- Zero creativity for maximum factual accuracy
- Expected latency: 2-4 seconds
- Context capacity: 128,000 tokens
 
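One way to express this strategy as a request body (model and temperature are suggestions matching the description above):

```json
{
  "model": "gpt-4o",
  "temperature": 0.0
}
```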
Balanced Performance Strategy
Optimal for: General Q&A, customer support, mixed content types

- Good quality with efficiency balance
- Slight response variation while maintaining consistency
- Versatile processing for diverse use cases
- Expected latency: 1-2 seconds
- Context capacity: 128,000 tokens
 
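An illustrative configuration for this strategy (values are suggestions, not requirements):

```json
{
  "model": "gpt-4o-mini",
  "temperature": 0.3
}
```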
High-Throughput Strategy
Optimal for: Real-time chat, high-volume processing, instant responses

- Ultra-fast processing with Groq acceleration
- High throughput capacity for concurrent requests
- Real-time response generation for interactive applications
- Expected latency: 0.5-1 second
- Context capacity: 32,768 tokens
 
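A plausible request body for this strategy, using the Groq model whose context window matches the capacity listed above:

```json
{
  "model": "mixtral-8x7b-32768",
  "temperature": 0.1
}
```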
Creative Generation Strategy
Optimal for: Content creation, brainstorming, diverse outputs

- Enhanced creativity with latest model capabilities
- Diverse response generation for varied outputs
- Advanced reasoning with creative flexibility
- Expected latency: 2-5 seconds
- Context capacity: 128,000 tokens
 
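An illustrative configuration, pairing a premium model with a temperature in the creative range:

```json
{
  "model": "gpt-4.1",
  "temperature": 0.8
}
```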
Resource-Efficient Strategy
Optimal for: Budget-conscious applications, simple Q&A, high-scale deployment

- Optimized resource usage with minimal processing overhead
- Fast response times with good quality retention
- High scalability for large-scale deployments
- Expected latency: 0.8-1.5 seconds
- Context capacity: 128,000 tokens
 
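A sample request body for this strategy (model and temperature are suggestions based on the efficient tier above):

```json
{
  "model": "gpt-4.1-nano",
  "temperature": 0.2
}
```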
Strategy Selection Matrix
| Use Case | Accuracy Priority | Speed Priority | Resource Efficiency | Recommended Strategy | 
|---|---|---|---|---|
| Technical Documentation | High | Medium | Medium | Maximum Accuracy | 
| Customer Support | Medium | High | Medium | High-Throughput | 
| General Q&A | Medium | Medium | High | Balanced Performance | 
| Content Creation | Medium | Low | Low | Creative Generation | 
| Real-time Chat | Low | Very High | High | High-Throughput | 
| Budget Applications | Medium | Medium | Very High | Resource-Efficient | 
Error Responses
Common Error Codes
| Status Code | Description | Example Response | 
|---|---|---|
| 400 | Bad Request - Invalid configuration | {"detail": "Invalid temperature value"} | 
| 401 | Unauthorized - Invalid or missing API token | {"detail": "Invalid authentication credentials"} | 
| 404 | Not Found - Flow or node not found | {"detail": "LLM node with id 'invalid-id' not found"} | 
| 422 | Unprocessable Entity - Validation error | {"detail": "Unknown model: invalid-model"} | 
| 500 | Internal Server Error - Server error | {"detail": "Failed to update LLM node"} | 
Error Response Format
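Error bodies use the single detail field shown in the examples above:

```json
{
  "detail": "Description of the error"
}
```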
Example Error Responses
Invalid Model
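Returned with status 422 when the model name does not match any available model:

```json
{
  "detail": "Unknown model: invalid-model"
}
```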
Invalid Temperature
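Returned with status 400 when the temperature falls outside the 0.0-2.0 range:

```json
{
  "detail": "Invalid temperature value"
}
```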
Node Not Found
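Returned with status 404 when the node ID does not exist in the specified flow:

```json
{
  "detail": "LLM node with id 'invalid-id' not found"
}
```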
Invalid Prompt ID
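The exact message may vary; when the promptId does not match a prompt available in the flow, expect a validation error along these lines:

```json
{
  "detail": "Prompt with id 'invalid-prompt' not found"
}
```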
Best Practices
Model Selection Guidelines
- Premium Quality: Use `gpt-4o` or `gpt-4.1` for complex reasoning and highest accuracy
- Balanced Approach: Choose `gpt-4o-mini` or `gpt-4.1-mini` for versatile applications
- Speed Optimization: Select `mixtral-8x7b-32768` or `llama-3.1-8b-instant` for real-time processing
- Resource Efficiency: Opt for `gpt-4.1-nano` or `gpt-3.5-turbo-0125` for high-volume deployment
Temperature Configuration
- Factual Content (0.0-0.1): Technical documentation, compliance, precise answers
 - Professional Responses (0.1-0.3): Customer support, structured explanations
 - Conversational (0.3-0.5): General Q&A, interactive applications
 - Creative Content (0.5-1.0): Content generation, brainstorming, diverse outputs
 - Experimental (1.0-2.0): Research, creative writing, novel approaches
 
Prompt Template Selection
- Default RAG: Use `default_retrieval_prompt` for general-purpose applications
- Technical Focus: Select `technical_documentation_assistant` for specialized content
- Customer Support: Choose `customer_support_agent` for service applications
- Creative Content: Opt for `creative_content_generator` for diverse outputs
Performance Optimization
- Context Management: Choose models with appropriate context windows for your content
 - Latency Requirements: Balance model quality with response time needs
 - Throughput Planning: Consider concurrent request patterns when selecting models
 - Resource Monitoring: Track processing patterns and adjust configurations accordingly
 
Troubleshooting
Node Not Found Error
Solution: Verify that:
- The node ID is correct and exists in the specified flow
 - The node is indeed an LLM type node
 - You have access to the flow and node
 - The flow name in the URL matches exactly
 
Invalid Model Configuration
Solution: If model configuration fails:
- Check that the model name is exactly as specified in available models
- Verify that the `GROQ_API_KEY` environment variable is set (Groq models require it)
- Ensure the model is supported in your deployment region
 - Confirm model availability hasn’t changed
 
Temperature Parameter Issues
Solution: For temperature configuration problems:
- Ensure temperature is between 0.0 and 2.0
 - Use appropriate precision (e.g., 0.2, not 0.2000001)
 - Consider that higher temperatures increase response variation
 - Test temperature effects with your specific use case
 
Prompt Template Errors
Solution: If prompt template assignment fails:
- Verify the prompt ID exists in your flow’s available prompts
 - Check that the prompt template is properly formatted
- Ensure the prompt includes the necessary placeholders (e.g., `{context}`)
- Confirm prompt template compatibility with your use case
 
Processing Performance Issues
Solution: For slow or inconsistent processing:
- Consider switching to faster models for better latency
 - Adjust temperature to reduce processing complexity
 - Monitor context window usage and optimize input size
 - Check for concurrent request limits and throttling
 
Response Quality Problems
Solution: If response quality is poor:
- Lower temperature for more consistent, factual responses
 - Switch to higher-quality models (gpt-4o, gpt-4.1)
 - Review and optimize prompt template instructions
 - Ensure context provided to LLM is relevant and well-formatted
 
Connection Issues
Solution: For connectivity problems:
- Check your internet connection
 - Verify the flow URL is accessible
 - Ensure your firewall allows HTTPS traffic to *.flows.graphorlm.com
 - Try accessing the endpoint from a different network
 

