LLM nodes are the final component in a RAG pipeline: they generate the natural language response from the retrieved context. Each node combines a language model with a prompt template to produce contextually aware answers.

Key Features

  • Multiple Models: Access to 8 language models including GPT-4o, GPT-4.1, and high-speed Groq models
  • Prompt Integration: Custom prompt templates for specialized response behaviors
  • Temperature Control: Adjust creativity from deterministic (0.0) to creative (2.0) responses
  • Streaming Support: Real-time response generation for interactive applications
  • Quality Metrics: Built-in evaluation using DeepEval metrics

Available Endpoints

Method    Endpoint          Description
GET       /llm              List all LLM nodes in the flow
PATCH     /llm/{nodeId}     Update an LLM node's configuration

LLM Node Structure

{
  "id": "llm-1748287628685",
  "type": "llm",
  "position": { "x": 500, "y": 300 },
  "data": {
    "name": "Response Generator",
    "config": {
      "model": "gpt-4o",
      "promptId": "default_retrieval_prompt", 
      "temperature": 0.0
    },
    "result": {
      "total_responses": 1247,
      "avg_response_length": 342,
      "avg_processing_time": 2.8,
      "streaming_enabled": true
    }
  }
}

Configuration Parameters

Parameter      Type               Description
model          string             Language model for response generation
promptId       string             Prompt template for instruction guidance
temperature    float (0.0-2.0)    Creativity and randomness control
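
Client code can validate a config object against this table before sending an update. The sketch below is illustrative and not part of the GraphorLM API; it only checks the three documented fields:

# Minimal client-side check of an LLM node config before sending it.
# Field names and the 0.0-2.0 temperature range come from the table above;
# the helper itself is illustrative, not part of the API.
def validate_llm_config(config: dict) -> list[str]:
    errors = []
    if not isinstance(config.get("model"), str) or not config["model"]:
        errors.append("model must be a non-empty string")
    if not isinstance(config.get("promptId"), str) or not config["promptId"]:
        errors.append("promptId must be a non-empty string")
    temperature = config.get("temperature")
    if not isinstance(temperature, (int, float)) or not 0.0 <= temperature <= 2.0:
        errors.append("temperature must be a number between 0.0 and 2.0")
    return errors

print(validate_llm_config({"model": "gpt-4o", "promptId": "default_retrieval_prompt", "temperature": 0.0}))  # []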

Available Models

Model                   Context Window    Best For                            Expected Latency
gpt-4o                  128K tokens       High accuracy, complex reasoning    2-4 seconds
gpt-4o-mini             128K tokens       Balanced quality and speed          1-2 seconds
gpt-4.1                 128K tokens       Latest capabilities                 2-5 seconds
gpt-4.1-mini            128K tokens       Modern features, efficient          1-2 seconds
gpt-4.1-nano            128K tokens       Resource optimization               0.8-1.5 seconds
gpt-3.5-turbo-0125      16K tokens        High-volume processing              0.5-1 second
mixtral-8x7b-32768      32K tokens        Real-time processing                0.5-1 second
llama-3.1-8b-instant    8K tokens         Ultra-fast responses                0.3-0.8 seconds
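
The context window determines how much retrieved context a node can accept alongside the prompt. The sketch below is a rough, illustrative check using the approximate window sizes from the table and a ~4 characters-per-token heuristic; swap in a real tokenizer for accurate counts:

# Approximate context windows (tokens), taken from the table above.
CONTEXT_WINDOWS = {
    "gpt-4o": 128_000,
    "gpt-4o-mini": 128_000,
    "gpt-4.1": 128_000,
    "gpt-4.1-mini": 128_000,
    "gpt-4.1-nano": 128_000,
    "gpt-3.5-turbo-0125": 16_000,
    "mixtral-8x7b-32768": 32_768,
    "llama-3.1-8b-instant": 8_000,
}

def fits_context(model: str, prompt_text: str, retrieved_chunks: list[str]) -> bool:
    """Rough check that the prompt plus retrieved context fits the model's window."""
    total_chars = len(prompt_text) + sum(len(chunk) for chunk in retrieved_chunks)
    estimated_tokens = total_chars // 4  # heuristic, not a real tokenizer
    return estimated_tokens <= CONTEXT_WINDOWS[model]

# 50 chunks of ~1,000 characters fit gpt-4o but not llama-3.1-8b-instant.
chunks = ["x" * 1000] * 50
print(fits_context("gpt-4o", "Answer using the context.", chunks))                 # True
print(fits_context("llama-3.1-8b-instant", "Answer using the context.", chunks))   # False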

Common Configurations

Maximum Accuracy

For technical documentation and critical applications:
{
  "model": "gpt-4o",
  "promptId": "default_retrieval_prompt",
  "temperature": 0.0
}

Balanced Performance

For general Q&A and customer support:
{
  "model": "gpt-4o-mini",
  "promptId": "default_retrieval_prompt",
  "temperature": 0.2
}

High-Speed Processing

For real-time chat and instant responses:
{
  "model": "mixtral-8x7b-32768",
  "promptId": "customer_support_agent",
  "temperature": 0.1
}

Creative Generation

For content creation and diverse outputs:
{
  "model": "gpt-4.1",
  "promptId": "creative_content_generator",
  "temperature": 0.8
}
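
These presets can be stored in code and pushed through the node update endpoint used in the API examples below. A minimal sketch in Python; the preset names are illustrative, while the endpoint and payload shape match the Python example later on this page:

import requests

# The four documented presets, keyed by illustrative names.
PRESETS = {
    "maximum_accuracy": {"model": "gpt-4o", "promptId": "default_retrieval_prompt", "temperature": 0.0},
    "balanced": {"model": "gpt-4o-mini", "promptId": "default_retrieval_prompt", "temperature": 0.2},
    "high_speed": {"model": "mixtral-8x7b-32768", "promptId": "customer_support_agent", "temperature": 0.1},
    "creative": {"model": "gpt-4.1", "promptId": "creative_content_generator", "temperature": 0.8},
}

def apply_preset(flow_name: str, api_token: str, node_id: str, preset: str) -> dict:
    """Apply one of the presets above to an LLM node via PATCH /llm/{node_id}."""
    response = requests.patch(
        f"https://{flow_name}.flows.graphorlm.com/llm/{node_id}",
        headers={"Authorization": f"Bearer {api_token}", "Content-Type": "application/json"},
        json={"config": PRESETS[preset]},
    )
    response.raise_for_status()
    return response.json()

# apply_preset("my-flow", "YOUR_API_TOKEN", "llm-node-id", "maximum_accuracy")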

Quality Metrics

LLM nodes track response quality using DeepEval metrics:
  • Contextual Precision: Accuracy of context usage
  • Contextual Recall: Completeness of context utilization
  • Answer Relevancy: Relevance of response to question
  • Faithfulness: Adherence to context without hallucination
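
These metrics are reported on node results. If you want to reproduce them offline, the open-source DeepEval library exposes metrics with the same names. A minimal sketch, assuming deepeval is installed and an evaluation model (e.g. an OpenAI key) is configured; the question, answer, and context are illustrative:

from deepeval.metrics import AnswerRelevancyMetric, FaithfulnessMetric
from deepeval.test_case import LLMTestCase

# Illustrative question, generated answer, and retrieved context.
test_case = LLMTestCase(
    input="What temperature should I use for factual queries?",
    actual_output="Use a temperature between 0.0 and 0.1 for deterministic, factual responses.",
    retrieval_context=["Temperature 0.0-0.1 produces deterministic responses for factual queries."],
)

relevancy = AnswerRelevancyMetric(threshold=0.7)
faithfulness = FaithfulnessMetric(threshold=0.7)

for metric in (relevancy, faithfulness):
    metric.measure(test_case)
    print(type(metric).__name__, metric.score)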

JavaScript Example

class LLMManager {
  constructor(flowName, apiToken) {
    this.flowName = flowName;
    this.apiToken = apiToken;
    this.baseUrl = `https://${flowName}.flows.graphorlm.com`;
  }
  
  async updateLLMNode(nodeId, config) {
    const response = await fetch(`${this.baseUrl}/llm/${nodeId}`, {
      method: 'PATCH',
      headers: {
        'Authorization': `Bearer ${this.apiToken}`,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({ config })
    });
    
    if (!response.ok) {
      throw new Error(`Update failed: ${response.status}`);
    }
    
    return await response.json();
  }
  
  async getLLMNodes() {
    const response = await fetch(`${this.baseUrl}/llm`, {
      headers: { 'Authorization': `Bearer ${this.apiToken}` }
    });
    
    if (!response.ok) {
      throw new Error(`Failed to get nodes: ${response.status}`);
    }
    
    return await response.json();
  }
}

// Usage
const manager = new LLMManager('my-flow', 'YOUR_API_TOKEN');

// Update for high accuracy
await manager.updateLLMNode('llm-node-id', {
  model: 'gpt-4o',
  promptId: 'default_retrieval_prompt',
  temperature: 0.0
});

Python Example

import requests

class LLMManager:
    def __init__(self, flow_name, api_token):
        self.flow_name = flow_name
        self.api_token = api_token
        self.base_url = f"https://{flow_name}.flows.graphorlm.com"
    
    def update_llm_node(self, node_id, config):
        response = requests.patch(
            f"{self.base_url}/llm/{node_id}",
            headers={
                "Authorization": f"Bearer {self.api_token}",
                "Content-Type": "application/json"
            },
            json={"config": config}
        )
        response.raise_for_status()
        return response.json()
    
    def get_llm_nodes(self):
        response = requests.get(
            f"{self.base_url}/llm",
            headers={"Authorization": f"Bearer {self.api_token}"}
        )
        response.raise_for_status()
        return response.json()

# Usage
manager = LLMManager("my-flow", "YOUR_API_TOKEN")

# Update for fast processing
manager.update_llm_node("llm-node-id", {
    "model": "mixtral-8x7b-32768",
    "promptId": "customer_support_agent", 
    "temperature": 0.1
})

Best Practices

Model Selection

  • Use gpt-4o for maximum accuracy in critical applications
  • Use gpt-4o-mini for balanced performance in general use cases
  • Use Groq models (mixtral, llama) for real-time applications
  • Use creative models (gpt-4.1) for content generation

Temperature Settings

  • 0.0-0.1: Deterministic responses for factual queries
  • 0.2-0.4: Slight variation for natural conversations
  • 0.5-0.8: Creative responses for content generation
  • 0.9-2.0: Highly creative and diverse outputs
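
If temperature is set programmatically, these bands can be encoded as named defaults. The sketch below is illustrative; the style names are not part of the API, and the clamp keeps values inside the valid 0.0-2.0 range:

# Illustrative defaults, one per band described above.
TEMPERATURE_DEFAULTS = {
    "factual": 0.0,         # 0.0-0.1: deterministic responses
    "conversational": 0.3,  # 0.2-0.4: slight variation
    "creative": 0.7,        # 0.5-0.8: creative responses
    "exploratory": 1.0,     # 0.9-2.0: highly diverse outputs
}

def temperature_for(style: str) -> float:
    # Unknown styles fall back to 0.0; all values are clamped to the documented range.
    return min(max(TEMPERATURE_DEFAULTS.get(style, 0.0), 0.0), 2.0)

print(temperature_for("conversational"))  # 0.3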

Performance Optimization

  • Monitor processing time and adjust model selection accordingly
  • Enable streaming for better user experience with longer responses
  • Use appropriate context window sizes for your content
  • Track quality metrics to ensure response accuracy
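
Processing time can be tracked from the result data returned by the list endpoint. A minimal sketch, assuming GET /llm returns node objects shaped like the structure shown earlier; the 3-second threshold is an arbitrary example:

import requests

def find_slow_llm_nodes(flow_name: str, api_token: str, max_seconds: float = 3.0) -> list[dict]:
    """Return LLM nodes whose avg_processing_time exceeds max_seconds."""
    response = requests.get(
        f"https://{flow_name}.flows.graphorlm.com/llm",
        headers={"Authorization": f"Bearer {api_token}"},
    )
    response.raise_for_status()
    slow = []
    for node in response.json():  # assumed: a list of node objects as shown above
        result = node.get("data", {}).get("result") or {}
        if result.get("avg_processing_time", 0) > max_seconds:
            slow.append({"id": node["id"], "avg_processing_time": result["avg_processing_time"]})
    return slow

# Nodes flagged here are candidates for a faster model (e.g. gpt-4o-mini or a Groq model).
# print(find_slow_llm_nodes("my-flow", "YOUR_API_TOKEN"))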

Troubleshooting

Common issues and solutions:
  • Slow responses: Switch to faster models (Groq) or lower temperature
  • Inconsistent quality: Use higher-quality models or lower temperature
  • Resource usage: Use efficient models (nano, mini) for high-volume processing
  • Poor context usage: Optimize prompt templates and verify context formatting

Next Steps