LLM nodes are the final component in a RAG pipeline: they generate natural language responses from retrieved context. Each node combines a language model with a prompt template to produce contextually aware answers.

Key Features

  • Multiple Models: Access to 8 language models including GPT-4o, GPT-4.1, and high-speed Groq models
  • Prompt Integration: Custom prompt templates for specialized response behaviors
  • Temperature Control: Adjust creativity from deterministic (0.0) to creative (2.0) responses
  • Streaming Support: Real-time response generation for interactive applications
  • Quality Metrics: Built-in evaluation using DeepEval metrics

Available Endpoints

LLM nodes are managed through two endpoints, both used in the examples on this page:

  • GET /llm: List the LLM nodes in a flow
  • PATCH /llm/{node_id}: Update an LLM node's configuration

LLM Node Structure

{
  "id": "llm-1748287628685",
  "type": "llm",
  "position": { "x": 500, "y": 300 },
  "data": {
    "name": "Response Generator",
    "config": {
      "model": "gpt-4o",
      "promptId": "default_retrieval_prompt", 
      "temperature": 0.0
    },
    "result": {
      "total_responses": 1247,
      "avg_response_length": 342,
      "avg_processing_time": 2.8,
      "streaming_enabled": true
    }
  }
}
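
To inspect these fields for your own flow, you can list the LLM nodes and read each node's config and result block. A minimal sketch, assuming the GET /llm endpoint used in the examples below returns a list of node objects in the shape shown above (flow name and token are placeholders):

import requests

FLOW_NAME = "my-flow"          # placeholder flow name
API_TOKEN = "YOUR_API_TOKEN"

response = requests.get(
    f"https://{FLOW_NAME}.flows.graphorlm.com/llm",
    headers={"Authorization": f"Bearer {API_TOKEN}"},
)
response.raise_for_status()

for node in response.json():
    config = node["data"]["config"]
    result = node["data"].get("result", {})
    print(f"{node['id']}: model={config['model']}, "
          f"temperature={config['temperature']}, "
          f"avg_processing_time={result.get('avg_processing_time')}s")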

Configuration Parameters

Parameter   | Type            | Description
model       | string          | Language model for response generation
promptId    | string          | Prompt template for instruction guidance
temperature | float (0.0-2.0) | Creativity and randomness control
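
These parameters map directly onto the config object shown above. As an illustration only (this helper is not part of the API), they can be wrapped in a small dataclass that rejects out-of-range temperatures before a request is sent:

from dataclasses import dataclass, asdict

@dataclass
class LLMNodeConfig:
    model: str
    promptId: str
    temperature: float = 0.0

    def __post_init__(self):
        # Temperature must stay within the documented 0.0-2.0 range.
        if not 0.0 <= self.temperature <= 2.0:
            raise ValueError("temperature must be between 0.0 and 2.0")

config = asdict(LLMNodeConfig(model="gpt-4o", promptId="default_retrieval_prompt"))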

Available Models

Model                | Context Window | Best For                         | Expected Latency
gpt-4o               | 128K tokens    | High accuracy, complex reasoning | 2-4 seconds
gpt-4o-mini          | 128K tokens    | Balanced quality and speed       | 1-2 seconds
gpt-4.1              | 128K tokens    | Latest capabilities              | 2-5 seconds
gpt-4.1-mini         | 128K tokens    | Modern features, efficient       | 1-2 seconds
gpt-4.1-nano         | 128K tokens    | Resource optimization            | 0.8-1.5 seconds
gpt-3.5-turbo-0125   | 16K tokens     | High-volume processing           | 0.5-1 second
mixtral-8x7b-32768   | 32K tokens     | Real-time processing             | 0.5-1 second
llama-3.1-8b-instant | 8K tokens      | Ultra-fast responses             | 0.3-0.8 seconds
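
If you select models programmatically, the matrix above can be captured in a small lookup table. The helper below is illustrative only; the latency figures are the documented expectations, not guarantees:

MODELS = [
    # (name, context window in tokens, upper bound of expected latency in seconds)
    ("gpt-4.1",              128_000, 5.0),
    ("gpt-4o",               128_000, 4.0),
    ("gpt-4o-mini",          128_000, 2.0),
    ("gpt-4.1-mini",         128_000, 2.0),
    ("gpt-4.1-nano",         128_000, 1.5),
    ("gpt-3.5-turbo-0125",    16_000, 1.0),
    ("mixtral-8x7b-32768",    32_000, 1.0),
    ("llama-3.1-8b-instant",   8_000, 0.8),
]

def pick_model(max_latency_seconds: float, min_context_tokens: int = 0) -> str:
    """Return the first model (listed roughly by capability) that fits the
    latency budget and context-window requirement."""
    for name, context, latency in MODELS:
        if latency <= max_latency_seconds and context >= min_context_tokens:
            return name
    raise ValueError("No model satisfies the given constraints")

print(pick_model(max_latency_seconds=1.0, min_context_tokens=32_000))  # mixtral-8x7b-32768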

Common Configurations

Maximum Accuracy

For technical documentation and critical applications:
{
  "model": "gpt-4o",
  "promptId": "default_retrieval_prompt",
  "temperature": 0.0
}

Balanced Performance

For general Q&A and customer support:
{
  "model": "gpt-4o-mini",
  "promptId": "default_retrieval_prompt",
  "temperature": 0.2
}

High-Speed Processing

For real-time chat and instant responses:
{
  "model": "mixtral-8x7b-32768",
  "promptId": "customer_support_agent",
  "temperature": 0.1
}

Creative Generation

For content creation and diverse outputs:
{
  "model": "gpt-4.1",
  "promptId": "creative_content_generator",
  "temperature": 0.8
}
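
These presets can also be applied programmatically. The sketch below collects them into a dictionary and pushes one to a node; the PATCH endpoint and {"config": ...} payload mirror the Python example further down, and the flow name, node ID, and token are placeholders:

import requests

PRESETS = {
    "maximum_accuracy":      {"model": "gpt-4o",             "promptId": "default_retrieval_prompt",   "temperature": 0.0},
    "balanced_performance":  {"model": "gpt-4o-mini",        "promptId": "default_retrieval_prompt",   "temperature": 0.2},
    "high_speed_processing": {"model": "mixtral-8x7b-32768", "promptId": "customer_support_agent",     "temperature": 0.1},
    "creative_generation":   {"model": "gpt-4.1",            "promptId": "creative_content_generator", "temperature": 0.8},
}

def apply_preset(flow_name, api_token, node_id, preset_name):
    # Send the chosen preset as the node's new config.
    response = requests.patch(
        f"https://{flow_name}.flows.graphorlm.com/llm/{node_id}",
        headers={"Authorization": f"Bearer {api_token}"},
        json={"config": PRESETS[preset_name]},
    )
    response.raise_for_status()
    return response.json()

apply_preset("my-flow", "YOUR_API_TOKEN", "llm-node-id", "balanced_performance")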

Quality Metrics

LLM nodes track response quality using DeepEval metrics:
  • Contextual Precision: Accuracy of context usage
  • Contextual Recall: Completeness of context utilization
  • Answer Relevancy: Relevance of response to question
  • Faithfulness: Adherence to context without hallucination
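
To reproduce these scores outside the platform, you can run the same DeepEval metrics directly. A minimal sketch, assuming DeepEval's LLMTestCase and metric classes (names may differ between library versions) and an LLM judge configured in your environment, for example an OpenAI API key:

from deepeval.test_case import LLMTestCase
from deepeval.metrics import (
    ContextualPrecisionMetric,
    ContextualRecallMetric,
    AnswerRelevancyMetric,
    FaithfulnessMetric,
)

# One question/answer pair plus the retrieved context it was generated from.
test_case = LLMTestCase(
    input="What is the maximum temperature setting?",
    actual_output="The temperature parameter ranges from 0.0 to 2.0.",
    expected_output="Temperature can be set between 0.0 and 2.0.",
    retrieval_context=["temperature: float (0.0-2.0), creativity and randomness control"],
)

for metric in (
    ContextualPrecisionMetric(),
    ContextualRecallMetric(),
    AnswerRelevancyMetric(),
    FaithfulnessMetric(),
):
    metric.measure(test_case)
    print(f"{metric.__class__.__name__}: {metric.score:.2f}")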

JavaScript Example

class LLMManager {
  constructor(flowName, apiToken) {
    this.flowName = flowName;
    this.apiToken = apiToken;
    this.baseUrl = `https://${flowName}.flows.graphorlm.com`;
  }
  
  async updateLLMNode(nodeId, config) {
    const response = await fetch(`${this.baseUrl}/llm/${nodeId}`, {
      method: 'PATCH',
      headers: {
        'Authorization': `Bearer ${this.apiToken}`,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({ config })
    });
    
    if (!response.ok) {
      throw new Error(`Update failed: ${response.status}`);
    }
    
    return await response.json();
  }
  
  async getLLMNodes() {
    const response = await fetch(`${this.baseUrl}/llm`, {
      headers: { 'Authorization': `Bearer ${this.apiToken}` }
    });
    
    if (!response.ok) {
      throw new Error(`Failed to get nodes: ${response.status}`);
    }
    
    return await response.json();
  }
}

// Usage
const manager = new LLMManager('my-flow', 'YOUR_API_TOKEN');

// Update for high accuracy
await manager.updateLLMNode('llm-node-id', {
  model: 'gpt-4o',
  promptId: 'default_retrieval_prompt',
  temperature: 0.0
});

Python Example

import requests

class LLMManager:
    def __init__(self, flow_name, api_token):
        self.flow_name = flow_name
        self.api_token = api_token
        self.base_url = f"https://{flow_name}.flows.graphorlm.com"
    
    def update_llm_node(self, node_id, config):
        response = requests.patch(
            f"{self.base_url}/llm/{node_id}",
            headers={
                "Authorization": f"Bearer {self.api_token}",
                "Content-Type": "application/json"
            },
            json={"config": config}
        )
        response.raise_for_status()
        return response.json()
    
    def get_llm_nodes(self):
        response = requests.get(
            f"{self.base_url}/llm",
            headers={"Authorization": f"Bearer {self.api_token}"}
        )
        response.raise_for_status()
        return response.json()

# Usage
manager = LLMManager("my-flow", "YOUR_API_TOKEN")

# Update for fast processing
manager.update_llm_node("llm-node-id", {
    "model": "mixtral-8x7b-32768",
    "promptId": "customer_support_agent", 
    "temperature": 0.1
})

Best Practices

Model Selection

  • Use gpt-4o for maximum accuracy in critical applications
  • Use gpt-4o-mini for balanced performance in general use cases
  • Use Groq models (mixtral, llama) for real-time applications
  • Use creative models (gpt-4.1) for content generation

Temperature Settings

  • 0.0-0.1: Deterministic responses for factual queries
  • 0.2-0.4: Slight variation for natural conversations
  • 0.5-0.8: Creative responses for content generation
  • 0.9-2.0: Highly creative and diverse outputs

Performance Optimization

  • Monitor processing time and adjust model selection accordingly
  • Enable streaming for better user experience with longer responses
  • Use appropriate context window sizes for your content
  • Track quality metrics to ensure response accuracy

Troubleshooting

Common issues and solutions:
  • Slow responses: Switch to a faster model (Groq models or the mini/nano variants)
  • Inconsistent quality: Use higher-quality models or lower temperature
  • Resource usage: Use efficient models (nano, mini) for high-volume processing
  • Poor context usage: Optimize prompt templates and verify context formatting

Next Steps