# n8n: Multi-Model AI Routing (GPT/Claude/Gemini)
Claude 4 for every WhatsApp reply. Quality: excellent. Customers: happy. Monthly invoice: $2,500.
The problem: Seventy percent of queries are simple FAQs. "What are your hours?" "Where are you located?" "How much does it cost?" Gemini 2.0 Flash handles these at $0.001 per query. You're paying $0.024 for Claude 4.
Take a worked example: 10,000 queries/month, 7,000 of them FAQs. Claude 4 for everything: $240. Multi-model routing (Gemini for FAQs, GPT-4o for general, Claude 4 for complex): $54.40. Savings: 77%. Quality: nearly identical (9.0/10 vs 9.1/10).
The n8n workflow: classify intent via keyword matching, route to appropriate model via MoltFlow AI endpoint, check confidence score, fallback to next-tier model if below 70%, send response via webhooks. This guide provides the complete importable workflow template. Works with 500+ n8n integrations for CRM sync, analytics, escalation.
## Why Multi-Model Orchestration Matters
Running a single AI model for all queries is like hiring a surgeon to apply band-aids. It works, but the economics are terrible. Here is why smart teams route between models.
### Cost Optimization
Not all queries carry the same complexity. Match the model to the task and the savings compound fast.
- Simple FAQ ("What are your hours?") -- Gemini 2.0 Flash at $0.001 per query
- General inquiry ("Tell me about your enterprise features") -- GPT-4o at $0.012 per query
- Complex troubleshooting ("Why is my webhook triggering twice on message.any events?") -- Claude 4 at $0.024 per query
At 10,000 queries per month with 70% falling into the simple category, you are looking at roughly $190 in monthly savings compared to running everything through Claude 4. That is more than $2,200 per year back in your pocket.
### Quality Assurance Through Fallback
What if the cheap model gets it wrong? That is where fallback chains come in. Gemini 2.0 tries first. If the confidence score drops below 70% or an error occurs, the query escalates to GPT-4o. If GPT-4o also struggles, Claude 4 handles it as the last resort.
You get the cost savings of cheap models with the quality guarantee of premium ones. The fallback adds a fraction of a cent to the average cost but keeps your response quality at 9.0 out of 10 -- nearly identical to running Claude 4 exclusively.
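As a quick sanity check, here is the expected cost per FAQ-class query under that cascade, using this guide's per-query prices and the escalation rates from the cost section below (5% of Gemini answers escalate, 2% of those escalate again -- illustrative figures, not guarantees):

```javascript
// Expected cost per FAQ-class query with the fallback chain.
// Escalation rates are the illustrative figures used throughout this guide.
const GEMINI = 0.001, GPT4O = 0.012, CLAUDE = 0.024;
const pEscalate1 = 0.05; // Gemini falls below the 70% confidence bar
const pEscalate2 = 0.02; // GPT-4o also falls below the bar

const expectedCost =
  GEMINI +
  pEscalate1 * GPT4O +
  pEscalate1 * pEscalate2 * CLAUDE;

console.log(expectedCost.toFixed(5)); // ~0.00162 -- still a fraction of a cent
```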
### Latency Optimization
Different models have different speed profiles. GPT-4o averages 1.2 seconds response time, making it ideal for real-time chat. Claude 4 averages 1.8 seconds but excels at technical accuracy. Gemini 2.0 sits at 1.5 seconds and handles batch processing efficiently with its massive context window.
Route time-sensitive queries to the fastest model and complex analysis to the most accurate. Your users get faster responses on simple questions and better answers on hard ones.
### Specialization
Each model has strengths. GPT-4o excels at creative responses and multi-language support. Claude 4 dominates technical accuracy and long-context conversations. Gemini 2.0 shines with document-heavy RAG tasks thanks to its 1M token context window.
Instead of forcing one model to do everything adequately, let each model do what it does best.
## Architecture: Intent-Based Routing
The core idea is simple. Classify the incoming message, pick the right model, generate a response, and verify quality. Here is the flow:
```
WhatsApp Message
       |
       v
Classify Intent (keyword matching or AI)
       |
       v
Route to Model:
  - simple_faq         --> Gemini 2.0 Flash
  - technical          --> GPT-4o
  - sensitive/complex  --> Claude 4
       |
       v
Check Confidence Score
       |
       v
If < 70% --> Fallback Chain:
  Gemini --> GPT-4o --> Claude 4
       |
       v
Send Response via MoltFlow
```

The classification step is where the magic happens. You have three strategies to choose from, each with different tradeoffs.
### Strategy 1: Keyword Matching
The fastest and cheapest approach. Zero API calls for classification -- just string matching.
```javascript
function classifyIntent(message) {
  const faqKeywords = ['hours', 'location', 'price', 'cost', 'open', 'close',
                       'address', 'phone', 'email', 'directions'];
  const technicalKeywords = ['error', 'bug', 'not working', 'webhook', 'api',
                             'integration', 'configure', 'setup', 'debug'];
  const sensitiveKeywords = ['refund', 'complaint', 'legal', 'cancel',
                             'medical', 'emergency', 'escalate', 'manager'];

  const lowerMsg = message.toLowerCase();

  if (sensitiveKeywords.some(kw => lowerMsg.includes(kw))) {
    return { intent: 'sensitive_topic', confidence: 0.9, model: 'claude-sonnet-4.5' };
  } else if (technicalKeywords.some(kw => lowerMsg.includes(kw))) {
    return { intent: 'technical_support', confidence: 0.7, model: 'gpt-4o' };
  } else if (faqKeywords.some(kw => lowerMsg.includes(kw))) {
    return { intent: 'simple_faq', confidence: 0.8, model: 'gemini-2.0-flash' };
  } else {
    return { intent: 'general', confidence: 0.5, model: 'gpt-4o' };
  }
}
```

Notice the priority order: sensitive topics first (you never want those going to a cheap model), then technical, then FAQ, then a general fallback. Keyword matching covers about 80% of messages accurately. It costs nothing and adds zero latency.
### Strategy 2: AI-Based Classification
For higher accuracy, use a lightweight model to classify intent. GPT-3.5-turbo costs about $0.0005 per classification -- negligible at scale.
Send a prompt like: "Classify this customer message into exactly one category: FAQ, Technical, Sensitive, or General. Respond with only the category name."
The AI classifier catches nuance that keywords miss. "My order never arrived and I want my money back" contains no keyword for "refund" but an AI model correctly identifies it as sensitive.
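A minimal sketch of that classifier call, hitting OpenAI's Chat Completions API directly (the endpoint and model name are standard; the `OPENAI_API_KEY` environment variable and the category list are assumptions for illustration):

```javascript
// Hypothetical AI-based intent classifier using gpt-3.5-turbo.
// Assumes OPENAI_API_KEY is available in the environment.
async function classifyWithAI(message) {
  const response = await fetch('https://api.openai.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${process.env.OPENAI_API_KEY}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      model: 'gpt-3.5-turbo',
      temperature: 0,  // deterministic labels
      max_tokens: 5,   // one-word answer keeps cost near $0.0005 per call
      messages: [{
        role: 'user',
        content: 'Classify this customer message into exactly one category: ' +
                 'FAQ, Technical, Sensitive, or General. ' +
                 'Respond with only the category name.\n\n' + message
      }]
    })
  });
  const data = await response.json();
  return data.choices[0].message.content.trim().toLowerCase(); // e.g. "sensitive"
}
```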
### Strategy 3: Hybrid Approach
Best of both worlds. Use keyword matching for obvious cases -- roughly 80-90% of traffic. When keywords return low confidence (no match or ambiguous match), fall back to AI classification for that specific message.
You get keyword speed for the majority of queries and AI accuracy for the edge cases. Total classification cost stays under $5/month for 10,000 queries.
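Stitched together, the hybrid router might look like this -- a sketch reusing `classifyIntent` and `classifyWithAI` from above, where the 0.6 cutoff is an assumption to tune against your own traffic:

```javascript
// Hybrid classification: keywords first, AI only for low-confidence cases.
async function classifyHybrid(message) {
  const keywordResult = classifyIntent(message);

  // High-confidence keyword hit: done, zero API cost.
  if (keywordResult.confidence >= 0.6) return keywordResult;

  // Ambiguous message: spend ~$0.0005 on an AI label instead.
  const category = await classifyWithAI(message);
  const modelByCategory = {
    faq: 'gemini-2.0-flash',
    technical: 'gpt-4o',
    sensitive: 'claude-sonnet-4.5',
    general: 'gpt-4o'
  };
  return {
    intent: category,
    confidence: 0.85, // assumed trust in the AI label
    model: modelByCategory[category] ?? 'gpt-4o'
  };
}
```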
## n8n Workflow: Intent-Based Model Selection
Let's build this in n8n, node by node. You will have a working workflow you can import and customize.
### Node 1: Webhook Trigger
This receives the WhatsApp message from MoltFlow's webhook system. The incoming payload looks like:
```json
{
  "event": "message",
  "session": "support-bot",
  "payload": {
    "from": "[email protected]",
    "body": "What are your business hours?",
    "timestamp": "2026-03-01T14:30:00Z"
  }
}
```

### Node 2: Function (Classify Intent)
A Code node that analyzes the message and decides which model to use.
```javascript
const message = $json.payload.body;
const lowerMsg = message.toLowerCase();

// Keyword-based classification with priority routing
const sensitiveKeywords = ['refund', 'complaint', 'legal', 'cancel', 'manager'];
const technicalKeywords = ['error', 'bug', 'not working', 'webhook', 'api', 'configure'];
const faqKeywords = ['hours', 'location', 'price', 'cost', 'open', 'address'];

let intent, model, maxCost;

if (sensitiveKeywords.some(kw => lowerMsg.includes(kw))) {
  intent = 'sensitive';
  model = 'claude-sonnet-4.5';
  maxCost = 0.024;
} else if (technicalKeywords.some(kw => lowerMsg.includes(kw))) {
  intent = 'technical';
  model = 'gpt-4o';
  maxCost = 0.015;
} else if (faqKeywords.some(kw => lowerMsg.includes(kw))) {
  intent = 'simple_faq';
  model = 'gemini-2.0-flash';
  maxCost = 0.002;
} else {
  intent = 'general';
  model = 'gpt-4o';
  maxCost = 0.015;
}

return {
  ...($json),
  intent,
  model,
  maxCost,
  classifiedAt: new Date().toISOString()
};
```

### Node 3: HTTP Request (MoltFlow AI Generate)
Send the message to MoltFlow's AI endpoint with the dynamically selected model.
- URL: `https://apiv2.waiflow.app/api/v2/ai/generate`
- Method: `POST`
- Headers: `Authorization: Bearer {{$env.MOLTFLOW_API_TOKEN}}`
- Body:

```json
{
  "session_name": "support-bot",
  "message": "={{$node['WhatsApp Webhook'].json.payload.body}}",
  "model": "={{$node['Classify Intent'].json.model}}",
  "use_rag": true,
  "max_tokens": 300
}
```

The `model` field is dynamic -- it uses whatever the classification node decided. The `use_rag` flag pulls from your knowledge base for grounded answers.
### Node 4: Switch (Check Confidence)
After receiving the AI response, check the confidence score.
- If `confidence >= 0.7`: route to the Send Response node
- If `confidence < 0.7`: route to the Fallback Chain
This is the quality gate. If the cheap model is not confident enough, escalate automatically.
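If you would rather keep the gate in code (for logging or custom thresholds), a Code-node equivalent is only a few lines -- a sketch that assumes the `/ai/generate` response includes a numeric `confidence` field between 0 and 1:

```javascript
// Confidence gate as a Code node: flag low-confidence replies for fallback.
const CONFIDENCE_THRESHOLD = 0.7;
const confidence = $json.confidence ?? 0; // treat a missing score as "not confident"

return {
  json: {
    ...$json,
    needsFallback: confidence < CONFIDENCE_THRESHOLD
  }
};
```

Downstream, a simple IF node on `needsFallback` takes the place of the Switch.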
### Importable Workflow JSON
Copy this JSON and import it directly into n8n via Settings > Import from File:
```json
{
  "name": "Multi-Model WhatsApp Routing",
  "nodes": [
    {
      "name": "WhatsApp Webhook",
      "type": "n8n-nodes-base.webhook",
      "parameters": {
        "httpMethod": "POST",
        "path": "whatsapp-multimodel",
        "responseMode": "lastNode"
      },
      "position": [250, 300]
    },
    {
      "name": "Classify Intent",
      "type": "n8n-nodes-base.code",
      "parameters": {
        "jsCode": "const msg = $json.payload.body.toLowerCase();\nconst sensitive = ['refund','complaint','legal','cancel'];\nconst technical = ['error','bug','not working','webhook','api'];\nconst faq = ['hours','location','price','cost','open'];\n\nlet model = 'gpt-4o';\nlet intent = 'general';\n\nif (sensitive.some(k => msg.includes(k))) { model = 'claude-sonnet-4.5'; intent = 'sensitive'; }\nelse if (technical.some(k => msg.includes(k))) { model = 'gpt-4o'; intent = 'technical'; }\nelse if (faq.some(k => msg.includes(k))) { model = 'gemini-2.0-flash'; intent = 'simple_faq'; }\n\nreturn { json: { ...$json, model, intent } };"
      },
      "position": [450, 300]
    },
    {
      "name": "Generate Response",
      "type": "n8n-nodes-base.httpRequest",
      "parameters": {
        "url": "https://apiv2.waiflow.app/api/v2/ai/generate",
        "method": "POST",
        "authentication": "headerAuth",
        "bodyParametersJson": "={ \"session_name\": \"support-bot\", \"message\": \"{{$node['WhatsApp Webhook'].json.payload.body}}\", \"model\": \"{{$node['Classify Intent'].json.model}}\", \"use_rag\": true, \"max_tokens\": 300 }"
      },
      "position": [650, 300],
      "credentials": {
        "httpHeaderAuth": { "name": "MoltFlow API" }
      }
    },
    {
      "name": "Check Confidence",
      "type": "n8n-nodes-base.switch",
      "parameters": {
        "conditions": {
          "number": [
            {
              "value1": "={{$json.confidence}}",
              "operation": "largerEqual",
              "value2": 0.7
            }
          ]
        }
      },
      "position": [850, 300]
    },
    {
      "name": "Send Response",
      "type": "n8n-nodes-base.httpRequest",
      "parameters": {
        "url": "https://apiv2.waiflow.app/api/v2/messages",
        "method": "POST",
        "authentication": "headerAuth",
        "bodyParametersJson": "={ \"session_name\": \"support-bot\", \"chatId\": \"{{$node['WhatsApp Webhook'].json.payload.from}}\", \"text\": \"{{$json.reply}}\" }"
      },
      "position": [1050, 200],
      "credentials": {
        "httpHeaderAuth": { "name": "MoltFlow API" }
      }
    }
  ],
  "connections": {
    "WhatsApp Webhook": { "main": [[{ "node": "Classify Intent", "type": "main", "index": 0 }]] },
    "Classify Intent": { "main": [[{ "node": "Generate Response", "type": "main", "index": 0 }]] },
    "Generate Response": { "main": [[{ "node": "Check Confidence", "type": "main", "index": 0 }]] },
    "Check Confidence": {
      "main": [
        [{ "node": "Send Response", "type": "main", "index": 0 }],
        [{ "node": "Try GPT-4o Fallback", "type": "main", "index": 0 }]
      ]
    }
  }
}
```

Before importing: Add your MoltFlow API key to n8n credentials as "MoltFlow API" (Header Auth with `X-API-Key`). Replace `support-bot` with your actual session name. Note that the second Check Confidence output points at a "Try GPT-4o Fallback" node that is not defined here -- you will wire that up in the fallback-chain section below.
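To smoke-test the import before wiring up MoltFlow, you can post a fake webhook payload straight at the n8n endpoint. A sketch (the n8n host is a placeholder, the chat ID format is illustrative, and the payload shape matches the Node 1 example above; run it in any environment with `fetch` and top-level await, such as Node 18+):

```javascript
// Simulate a MoltFlow webhook delivery against the imported workflow.
const testPayload = {
  event: 'message',
  session: 'support-bot',
  payload: {
    from: '[email protected]', // placeholder chat ID
    body: 'What are your business hours?',
    timestamp: new Date().toISOString()
  }
};

await fetch('https://your-n8n-instance.com/webhook/whatsapp-multimodel', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify(testPayload)
});
```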
## Advanced Pattern: Fallback Chain with Quality Escalation
The basic routing workflow is good. The fallback chain makes it production-grade. Here is the strategy: try the cheapest model first, escalate on failure, and track costs across the entire chain.
### When to Use Fallback Chains
Fallback chains shine in three situations:
- Ambiguous queries -- the user asks something that could be simple or complex
- Low-confidence responses -- the cheap model generates an answer but is not sure about it
- Production environments -- where a wrong answer costs more than the extra API call
If you are running a support bot where incorrect answers create support tickets, the fallback chain pays for itself immediately.
### Implementation: Gemini to GPT-4o to Claude 4
The cascade works like a series of safety nets. Each model tries to handle the query. If it succeeds with sufficient confidence, the chain stops. If not, the next model takes over.
#### Node: Try Gemini (First Attempt)
```javascript
try {
  const response = await fetch('https://apiv2.waiflow.app/api/v2/ai/generate', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${$env.MOLTFLOW_API_TOKEN}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      message: $json.payload.body,
      session_name: 'support-bot',
      model: 'gemini-2.0-flash',
      use_rag: true,
      max_tokens: 200
    })
  });
  const data = await response.json();

  if (data.confidence >= 0.7) {
    return {
      json: { success: true, reply: data.reply, model: 'gemini-2.0-flash', cost: 0.001 }
    };
  } else {
    return {
      json: { success: false, error: 'Low confidence', geminiCost: 0.001 }
    };
  }
} catch (error) {
  return {
    json: { success: false, error: error.message, geminiCost: 0.001 }
  };
}
```

#### Node: Switch (Check Gemini Success)
- If `success === true` -- route to Send Response
- If `success === false` -- route to Try GPT-4o

#### Node: Try GPT-4o (Second Attempt)
```javascript
try {
  const response = await fetch('https://apiv2.waiflow.app/api/v2/ai/generate', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${$env.MOLTFLOW_API_TOKEN}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      message: $json.payload.body,
      session_name: 'support-bot',
      model: 'gpt-4o',
      use_rag: true,
      max_tokens: 300
    })
  });
  const data = await response.json();

  if (data.confidence >= 0.7) {
    return {
      json: {
        success: true,
        reply: data.reply,
        model: 'gpt-4o',
        cost: $json.geminiCost + 0.012
      }
    };
  } else {
    return {
      json: { success: false, error: 'Low confidence', chainCost: $json.geminiCost + 0.012 }
    };
  }
} catch (error) {
  return {
    json: { success: false, error: error.message, chainCost: $json.geminiCost + 0.012 }
  };
}
```

#### Node: Try Claude 4 (Final Fallback)
```javascript
try {
  const response = await fetch('https://apiv2.waiflow.app/api/v2/ai/generate', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${$env.MOLTFLOW_API_TOKEN}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      message: $json.payload.body,
      session_name: 'support-bot',
      model: 'claude-sonnet-4.5',
      use_rag: true,
      max_tokens: 500
    })
  });
  const data = await response.json();

  return {
    json: {
      success: true,
      reply: data.reply,
      model: 'claude-sonnet-4.5',
      cost: $json.chainCost + 0.024
    }
  };
} catch (error) {
  // All models failed - escalate to human
  return {
    json: {
      success: false,
      reply: 'Our team will get back to you shortly. A human agent has been notified.',
      model: 'human_escalation',
      cost: $json.chainCost
    }
  };
}
```

### Cost Tracking Across the Chain
After the final response is determined, log the total cost of the chain:
```javascript
// Cost tracking node - runs after the final response.
// Note: in n8n, referencing a node that never executed can throw, so wire
// this node so every branch passes through it, or guard these lookups.
const geminiCost = 0.001; // Gemini is always attempted first, so always charged
const gptCost = $node["Try GPT-4o"]?.json ? 0.012 : 0;
const claudeCost = $node["Try Claude 4"]?.json ? 0.024 : 0;

const totalCost = geminiCost + gptCost + claudeCost;
const finalModel = $json.model;
const attemptCount = 1 + (gptCost > 0 ? 1 : 0) + (claudeCost > 0 ? 1 : 0);

return {
  json: {
    reply: $json.reply,
    model_used: finalModel,
    total_cost: totalCost,
    attempts: attemptCount,
    timestamp: new Date().toISOString()
  }
};
```

This gives you full visibility into which queries escalate and how much each response actually costs. Over time, you use this data to tune your keyword lists and confidence thresholds.
## Cost Optimization with Multi-Model Routing
Let's run the numbers on a realistic scenario: 10,000 WhatsApp queries per month.
### Single-Model Approach (Claude 4 for Everything)
| Metric | Value |
|---|---|
| Queries | 10,000 |
| Cost per query | $0.024 avg |
| Monthly total | $240 |
| Quality score | 9.1/10 |
| Avg latency | 1.8s |
Great quality, but expensive. And 70% of those queries did not need a premium model.
### Multi-Model with Intent Routing
| Query Type | Volume | Model | Cost/Query | Subtotal |
|---|---|---|---|---|
| Simple FAQ | 7,000 | Gemini 2.0 Flash | $0.001 | $7.00 |
| General | 2,500 | GPT-4o | $0.012 | $30.00 |
| Complex | 500 | Claude 4 | $0.024 | $12.00 |
| Total | 10,000 | | | $49.00 |
That is an 80% reduction -- from $240 to $49 per month. Quality drops slightly to 8.5/10, but only because Gemini handles the simple FAQs. For "What are your hours?" that difference is invisible to the customer.
### Multi-Model with Fallback Chain
Now add the fallback safety net:
| Scenario | Volume | Cost | Notes |
|---|---|---|---|
| Base routing | 10,000 | $49.00 | Same as above |
| Gemini failures (5%) | 350 escalate to GPT-4o | +$4.20 | Low confidence on ambiguous FAQs |
| GPT-4o failures (2%) | 50 escalate to Claude 4 | +$1.20 | Edge cases needing premium model |
| Total | 10,000 | $54.40 | |
77% savings compared to single-model. Quality score: 9.0/10 -- nearly identical to running Claude 4 exclusively. Average latency: 1.4 seconds (slightly higher for fallback cases due to the extra ~0.5s per escalation).
### ROI Calculation
| Metric | Value |
|---|---|
| Monthly savings | $240 - $54.40 = $185.60 |
| Annual savings | $2,227.20 |
| Setup time | ~4 hours (one-time) |
| Break-even | Less than 1 week |
For businesses handling 50,000+ queries per month, the annual savings exceed $11,000. The n8n workflow takes an afternoon to build and runs indefinitely.
## Production Best Practices
Deploying multi-model routing is straightforward. Keeping it reliable in production requires a few extra considerations.
### Monitor Per-Model Success Rates
Track how often each model succeeds without fallback. If Gemini's success rate drops from 95% to 85%, that signals your keyword lists need updating or your query mix has shifted.
Review these metrics weekly. Adjust routing thresholds based on real data, not assumptions.
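A sketch of that weekly review, run over the log entries produced by the logging node later in this section (field names like `initial_model` and `fallback_count` match that node; the `logs` array is whatever your analytics store returns):

```javascript
// Compute per-model success rate: how often each model answered
// without triggering a fallback.
function successRates(logs) {
  const stats = {};
  for (const entry of logs) {
    const model = entry.initial_model;
    stats[model] ??= { total: 0, noFallback: 0 };
    stats[model].total += 1;
    if (entry.fallback_count === 0) stats[model].noFallback += 1;
  }
  return Object.fromEntries(
    Object.entries(stats).map(([model, s]) =>
      [model, (100 * s.noFallback / s.total).toFixed(1) + '%'])
  );
}

// e.g. { 'gemini-2.0-flash': '95.0%', 'gpt-4o': '98.0%' }
```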
### Set Daily Budget Limits
Prevent runaway costs with a budget cap:
```javascript
// Budget check - runs before model selection
const DAILY_BUDGET = 50; // $50/day maximum
const today = new Date().toISOString().split('T')[0];

// getCostFromDatabase is your own helper -- e.g. a query against the
// cost log populated by the tracking node above.
const currentSpend = await getCostFromDatabase(today);

if (currentSpend >= DAILY_BUDGET) {
  return {
    json: {
      reply: "Our AI assistant is temporarily unavailable. Please contact support directly.",
      reason: "daily_budget_reached",
      budget: DAILY_BUDGET,
      spent: currentSpend
    }
  };
}

// Budget OK - proceed with model routing
return { json: { ...$json, budgetRemaining: DAILY_BUDGET - currentSpend } };
```

A $50 daily budget caps your monthly spend at $1,500 maximum, even if query volume spikes unexpectedly.
### A/B Test Routing Strategies
Split your traffic to find the optimal routing approach:
- Group A (50%): Keyword matching only
- Group B (50%): Hybrid approach (keywords + AI classification for ambiguous cases)
Measure cost, quality score, and customer satisfaction for each group over 2-4 weeks. The winner becomes your default. Most businesses find that keyword matching alone covers 90% of cases, making the hybrid approach unnecessary overhead.
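For the split itself, hash the chat ID so each customer stays in the same group across messages. A sketch (assumes the `crypto` built-in is allowed in your n8n Code node environment; the 50/50 split and group names are arbitrary):

```javascript
// Deterministic A/B bucketing inside the classification Code node.
const crypto = require('crypto');

function abGroup(chatId) {
  // Hashing keeps a given customer in the same bucket on every message.
  const digest = crypto.createHash('sha256').update(chatId).digest();
  return digest[0] < 128 ? 'A_keywords_only' : 'B_hybrid'; // ~50/50 split
}

const group = abGroup($json.payload.from);
return { json: { ...$json, abGroup: group } };
```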
### Handle Rate Limits
Each AI provider enforces different rate limits:
| Provider | Rate Limit | Tier |
|---|---|---|
| OpenAI (GPT-4o) | 10,000 RPM | Paid |
| Anthropic (Claude 4) | 4,000 RPM | Standard |
| Google (Gemini 2.0) | 60 RPM | Free |
| Google (Gemini 2.0) | 1,000 RPM | Paid |
Google's free tier is severely limited. If you are routing 7,000 queries per month to Gemini, you need the paid tier. At 60 RPM on the free tier, you can only handle about 1 query per second -- fine for low-volume bots, insufficient for production.
Build retry logic into each fallback node. If a model returns a 429 (rate limit) error, treat it like a low-confidence response and escalate to the next model in the chain.
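A minimal version of that retry logic, wrapped around the generate call (a sketch: the single retry with backoff and the 429 handling are the point, and the helper name is ours, not part of any API):

```javascript
// Call a model once, retry once on rate limit, then signal escalation.
async function tryModelWithRetry(body, retries = 1) {
  for (let attempt = 0; attempt <= retries; attempt++) {
    const response = await fetch('https://apiv2.waiflow.app/api/v2/ai/generate', {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${$env.MOLTFLOW_API_TOKEN}`,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify(body)
    });

    if (response.status === 429) {
      // Rate limited: brief backoff, then retry; give up after the last attempt.
      await new Promise(r => setTimeout(r, 1000 * (attempt + 1)));
      continue;
    }
    return await response.json();
  }
  // Retries exhausted: treat like a low-confidence response and escalate.
  return { confidence: 0, error: 'rate_limited' };
}
```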
### Log Everything
Comprehensive logging turns your routing system into a learning system:
```javascript
// Logging node - runs after every response
const crypto = require('crypto');

// Privacy: hash the message so raw customer text never reaches the log store.
const hashMessage = (text) => crypto.createHash('sha256').update(text).digest('hex');

const logEntry = {
  timestamp: new Date().toISOString(),
  message_hash: hashMessage($json.payload.body),
  intent: $json.intent,
  initial_model: $json.model,
  final_model: $json.model_used,
  fallback_count: $json.attempts - 1,
  cost_usd: $json.total_cost,
  latency_ms: Date.now() - new Date($json.classifiedAt).getTime(),
  confidence: $json.confidence,
  success: $json.success
};

// Push to your analytics database
await fetch('https://your-analytics-endpoint.com/ai-logs', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify(logEntry)
});

return { json: logEntry };
```

After a month of logs, you will know exactly which intents cause fallbacks, which models underperform, and where your keyword lists have gaps. Use that data to continuously improve routing accuracy.
## Complete n8n Workflow Template
Here is the full orchestration workflow with all nodes, including the fallback chain. Import this into n8n and configure your credentials.
```json
{
  "name": "MoltFlow Multi-Model Orchestration",
  "nodes": [
    {
      "name": "WhatsApp Webhook",
      "type": "n8n-nodes-base.webhook",
      "parameters": {
        "httpMethod": "POST",
        "path": "moltflow-multimodel",
        "responseMode": "lastNode"
      },
      "position": [250, 300]
    },
    {
      "name": "Classify Intent",
      "type": "n8n-nodes-base.code",
      "parameters": {
        "jsCode": "const msg = $json.payload.body.toLowerCase();\nconst sensitive = ['refund','complaint','legal','cancel','manager'];\nconst technical = ['error','bug','not working','webhook','api','configure'];\nconst faq = ['hours','location','price','cost','open','address'];\n\nlet model = 'gpt-4o';\nlet intent = 'general';\nlet maxCost = 0.015;\n\nif (sensitive.some(k => msg.includes(k))) {\n  model = 'claude-sonnet-4.5'; intent = 'sensitive'; maxCost = 0.024;\n} else if (technical.some(k => msg.includes(k))) {\n  model = 'gpt-4o'; intent = 'technical'; maxCost = 0.015;\n} else if (faq.some(k => msg.includes(k))) {\n  model = 'gemini-2.0-flash'; intent = 'simple_faq'; maxCost = 0.002;\n}\n\nreturn { json: { ...$json, model, intent, maxCost, classifiedAt: new Date().toISOString() } };"
      },
      "position": [450, 300]
    },
    {
      "name": "Try Gemini",
      "type": "n8n-nodes-base.httpRequest",
      "parameters": {
        "url": "https://apiv2.waiflow.app/api/v2/ai/generate",
        "method": "POST",
        "authentication": "headerAuth",
        "bodyParametersJson": "={ \"session_name\": \"support-bot\", \"message\": \"{{$node['WhatsApp Webhook'].json.payload.body}}\", \"model\": \"gemini-2.0-flash\", \"use_rag\": true, \"max_tokens\": 200 }"
      },
      "position": [650, 200],
      "credentials": {
        "httpHeaderAuth": { "name": "MoltFlow API" }
      }
    },
    {
      "name": "Check Gemini Confidence",
      "type": "n8n-nodes-base.switch",
      "parameters": {
        "conditions": {
          "number": [
            {
              "value1": "={{$json.confidence}}",
              "operation": "largerEqual",
              "value2": 0.7
            }
          ]
        }
      },
      "position": [850, 200]
    },
    {
      "name": "Try GPT-4o",
      "type": "n8n-nodes-base.httpRequest",
      "parameters": {
        "url": "https://apiv2.waiflow.app/api/v2/ai/generate",
        "method": "POST",
        "authentication": "headerAuth",
        "bodyParametersJson": "={ \"session_name\": \"support-bot\", \"message\": \"{{$node['WhatsApp Webhook'].json.payload.body}}\", \"model\": \"gpt-4o\", \"use_rag\": true, \"max_tokens\": 300 }"
      },
      "position": [1050, 300],
      "credentials": {
        "httpHeaderAuth": { "name": "MoltFlow API" }
      }
    },
    {
      "name": "Try Claude 4",
      "type": "n8n-nodes-base.httpRequest",
      "parameters": {
        "url": "https://apiv2.waiflow.app/api/v2/ai/generate",
        "method": "POST",
        "authentication": "headerAuth",
        "bodyParametersJson": "={ \"session_name\": \"support-bot\", \"message\": \"{{$node['WhatsApp Webhook'].json.payload.body}}\", \"model\": \"claude-sonnet-4.5\", \"use_rag\": true, \"max_tokens\": 500 }"
      },
      "position": [1250, 400],
      "credentials": {
        "httpHeaderAuth": { "name": "MoltFlow API" }
      }
    },
    {
      "name": "Send Response",
      "type": "n8n-nodes-base.httpRequest",
      "parameters": {
        "url": "https://apiv2.waiflow.app/api/v2/messages",
        "method": "POST",
        "authentication": "headerAuth",
        "bodyParametersJson": "={ \"session_name\": \"support-bot\", \"chatId\": \"{{$node['WhatsApp Webhook'].json.payload.from}}\", \"text\": \"{{$json.reply}}\" }"
      },
      "position": [1450, 300],
      "credentials": {
        "httpHeaderAuth": { "name": "MoltFlow API" }
      }
    }
  ],
  "connections": {
    "WhatsApp Webhook": { "main": [[{ "node": "Classify Intent", "type": "main", "index": 0 }]] },
    "Classify Intent": { "main": [[{ "node": "Try Gemini", "type": "main", "index": 0 }]] },
    "Try Gemini": { "main": [[{ "node": "Check Gemini Confidence", "type": "main", "index": 0 }]] },
    "Check Gemini Confidence": {
      "main": [
        [{ "node": "Send Response", "type": "main", "index": 0 }],
        [{ "node": "Try GPT-4o", "type": "main", "index": 0 }]
      ]
    },
    "Try GPT-4o": { "main": [[{ "node": "Try Claude 4", "type": "main", "index": 0 }]] },
    "Try Claude 4": { "main": [[{ "node": "Send Response", "type": "main", "index": 0 }]] }
  }
}
```

Setup instructions:
- Import the JSON into n8n (Settings > Import from File or paste into the editor)
- Add your MoltFlow API key as an n8n credential named "MoltFlow API" (Header Auth, name: `X-API-Key`)
- Replace `support-bot` with your actual MoltFlow session name in all HTTP Request nodes
- Configure the webhook URL in MoltFlow Dashboard > Webhooks, pointing to your n8n instance
- Activate the workflow and send a test message
The workflow handles the complete flow: receive message, classify intent, try the cheapest appropriate model, fallback on low confidence, and send the response back through WhatsApp.
## What's Next?
Multi-model orchestration is the single biggest lever for reducing AI costs in production. You keep premium quality where it matters and save 40-80% on everything else. For a typical business handling 10,000 queries per month, that translates to over $2,200 in annual savings -- with a setup time measured in hours, not weeks.
Ready to implement this? Start with our step-by-step guide, Connect MoltFlow to n8n, for the foundational webhook setup. Then import this guide's workflow template for cost-optimized multi-model routing, and extend it with REST API calls for more advanced model-selection logic.
Want to go deeper? These guides build on what you learned here:
- AI Model Comparison for WhatsApp Bots -- detailed benchmarks for GPT-4o, Claude 4, and Gemini across response quality, latency, and cost
- Connect MoltFlow to n8n -- if you are new to n8n + MoltFlow, start here for the basic integration setup
- Build a Lead Pipeline with n8n -- practical n8n workflow patterns for lead capture and CRM integration
MoltFlow features to explore:
- Set Up Webhooks -- Configure webhook events for AI message handling
- REST API Reference -- Full API for multi-model AI calls
Ready to optimize your AI costs with smart model routing? MoltFlow supports GPT, Claude, and Gemini out of the box. Import the workflow template above, configure your API key, and start saving today.
> Try MoltFlow Free — 100 messages/month