# WhatsApp Knowledge Base with OpenClaw RAG
## Your Customers Won't Read Your Help Center
Your customer messages "How do I reset my password?" on WhatsApp. You could send them a link to your help center—which they'll never visit. Or you could give them an instant, accurate answer pulled directly from your documentation. With sources cited. In seconds.
That's Retrieval-Augmented Generation (RAG). And it's easier to build than you think.
Today we're building a knowledge base bot that lives on WhatsApp using OpenClaw's agent framework and MoltFlow's WhatsApp API. Customers ask questions. The bot searches your docs, retrieves relevant sections, and crafts natural responses. No more "check our FAQ page" cop-outs. Real answers. Right now.
## What Is RAG and Why Should You Care?
RAG stands for Retrieval-Augmented Generation. Here's the problem it solves: LLMs are great at generating text, but they don't know about your product. If you ask GPT-4 "How does MoltFlow's webhook retry logic work?", it'll hallucinate an answer because that information wasn't in its training data.
RAG fixes this:
- Retrieval: Search your documentation for relevant chunks (using semantic similarity, not keyword matching)
- Augmentation: Feed those chunks to the LLM as context
- Generation: LLM crafts an answer based on actual docs, not guesses
The result? Accurate answers grounded in your real documentation. The LLM becomes a smart interface to your knowledge base, not a creative writer.
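To make the three steps concrete, here is a minimal, self-contained sketch using the OpenAI Python SDK and NumPy: three toy doc snippets stand in for a real knowledge base, and a cosine-similarity loop stands in for a vector database (the doc strings are invented for illustration).

```python
from openai import OpenAI
import numpy as np

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Toy "documentation" chunks (invented for illustration)
docs = [
    "To reset your password, go to Settings > Account > Change Password.",
    "Webhook deliveries that fail are retried automatically.",
    "Each API token has its own rate limit.",
]

def embed(texts):
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

doc_vecs = embed(docs)

question = "How do I change my password?"
q_vec = embed([question])[0]

# 1. Retrieval: cosine similarity between the question and every chunk
scores = doc_vecs @ q_vec / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q_vec))
context = docs[int(np.argmax(scores))]

# 2. Augmentation + 3. Generation: the retrieved chunk becomes context for the LLM
completion = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "Answer using ONLY the provided context."},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ],
)
print(completion.choices[0].message.content)
```

Everything that follows in this guide is this same loop, with a real vector store and WhatsApp as the front end.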
Why RAG beats a pure LLM:
- Fewer hallucinations: Answers come from your docs, not the model's imagination
- Always up-to-date: Update your docs, and answers change immediately—no retraining
- Source attribution: You can show customers exactly which doc the answer came from
Why RAG beats keyword search:
- Understands synonyms and paraphrasing ("reset password" matches "change your login credentials")
- Handles complex questions ("What's the rate limit for API calls after 9pm EST?" requires understanding limits + timezones)
- Returns natural language answers, not raw doc paragraphs
## Architecture: From Question to Answer in Milliseconds
Here's the flow:
- Customer sends question via WhatsApp → MoltFlow receives it
- MoltFlow webhook fires → your server gets the message
- Your server queries the knowledge base → semantic search finds relevant doc chunks
- OpenClaw agent generates answer → using retrieved chunks as context
- Response sent back via MoltFlow API → customer sees answer on WhatsApp
```text
Customer Question → WhatsApp → MoltFlow → Your Server
                                              ↓
                                    [Vector Search DB]
                                              ↓
                                  [OpenClaw Agent + LLM]
                                              ↓
                       Answer ← MoltFlow ← Your Server
```

The key piece is the vector search database. It stores your docs as embeddings (high-dimensional vectors that encode semantic meaning). When a question comes in, you convert it to a vector and find the closest matches. Those matches become context for the LLM.
Let's build it.
## Step 1: Prepare Your Knowledge Base
RAG needs content to retrieve from. Start with your existing documentation:
- Product docs (markdown, HTML)
- FAQ pages
- API reference
- Support articles
- Tutorial videos (transcribe them first)
For this example, let's say you have markdown files in a docs/ folder:
```text
docs/
├── getting-started.md
├── api-reference.md
├── webhooks.md
└── troubleshooting.md
```

Chunking strategy: Break documents into chunks of ~500 tokens. Too small, and you lose context. Too large, and irrelevant info bleeds in. 500 tokens is a sweet spot for most use cases.
Here's a simple chunker:
```python
import tiktoken

def chunk_document(content: str, chunk_size: int = 500, overlap: int = 50):
    """Split document into overlapping chunks."""
    encoding = tiktoken.get_encoding("cl100k_base")
    tokens = encoding.encode(content)

    chunks = []
    for i in range(0, len(tokens), chunk_size - overlap):
        chunk_tokens = tokens[i:i + chunk_size]
        chunk_text = encoding.decode(chunk_tokens)
        chunks.append(chunk_text)

    return chunks
```

Overlap ensures that concepts split across chunks still appear together in at least one chunk.
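A quick sanity check before ingesting everything, assuming one of the markdown files from the folder above:

```python
from pathlib import Path

content = Path("./docs/webhooks.md").read_text()
chunks = chunk_document(content, chunk_size=500, overlap=50)

print(f"{len(chunks)} chunks")
print(chunks[0][:200])  # preview the start of the first chunk
```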
## Step 2: Create Embeddings with OpenClaw
Embeddings are vector representations of text. Sentences with similar meanings have similar vectors. OpenClaw integrates with popular embedding models out of the box.
Install dependencies:
```bash
pip install openclaw-sdk chromadb
```

Now ingest your docs:
```python
from openclaw import KnowledgeBase
from pathlib import Path

# Initialize knowledge base (uses ChromaDB by default)
kb = KnowledgeBase(
    name="product_docs",
    embedding_model="text-embedding-3-small",  # OpenAI's latest
    persist_directory="./data/kb"
)

# Load and chunk all markdown files
docs_dir = Path("./docs")

for doc_path in docs_dir.glob("*.md"):
    content = doc_path.read_text()
    chunks = chunk_document(content)

    # Add chunks to knowledge base with metadata
    for i, chunk in enumerate(chunks):
        kb.add(
            text=chunk,
            metadata={
                "source": doc_path.name,
                "chunk_index": i,
                "doc_type": "markdown"
            }
        )

print(f"Ingested {len(kb)} chunks")
```

This creates embeddings for every chunk and stores them in ChromaDB (a lightweight vector database). ChromaDB handles the heavy lifting—indexing, similarity search, persistence.
Production tip: Use Pinecone or Weaviate for scale. ChromaDB is great for prototyping but might struggle with millions of chunks.
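If you are curious what the wrapper is doing underneath, the raw ChromaDB calls look roughly like this; a sketch for one document, not necessarily how OpenClaw implements it (ChromaDB falls back to its built-in embedding model here):

```python
import chromadb
from pathlib import Path

client = chromadb.PersistentClient(path="./data/kb")
collection = client.get_or_create_collection(name="product_docs")

# Embed and store the chunks for one document
chunks = chunk_document(Path("./docs/webhooks.md").read_text())  # chunker from Step 1
collection.add(
    documents=chunks,
    metadatas=[{"source": "webhooks.md", "chunk_index": i} for i in range(len(chunks))],
    ids=[f"webhooks.md-{i}" for i in range(len(chunks))],
)

# Semantic search: the three chunks closest in meaning to the question
results = collection.query(query_texts=["How do webhook retries work?"], n_results=3)
print(results["documents"][0])
```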
## Step 3: Build the RAG Agent
Now wire up the OpenClaw agent with RAG skills:
```python
from openclaw import Agent, RAGSkill

# Define RAG-powered support agent
knowledge_agent = Agent(
    name="WhatsApp Knowledge Bot",
    system_prompt="""You are a helpful product expert for our platform.

Answer customer questions using ONLY the provided documentation context.
If the answer isn't in the context, say "I don't see that information in our docs.
Let me connect you with a human who can help."

Always cite the source document at the end of your response.
Keep responses concise—WhatsApp users expect quick answers.
Use plain text only (no markdown, no emojis).""",
    skills=[
        RAGSkill(
            knowledge_base=kb,
            retrieval_count=3,    # Retrieve top 3 most relevant chunks
            score_threshold=0.7   # Only use chunks with >70% similarity
        )
    ],
    model="gpt-4o",
    temperature=0.3  # Lower temp = more factual, less creative
)
```

Key parameters:
- retrieval_count: How many chunks to retrieve. More chunks = more context, but also more noise and cost.
- score_threshold: Filters out low-relevance chunks. If no chunks exceed the threshold, the agent admits it doesn't know.
- temperature: Keep it low for factual Q&A. You want accurate answers, not creative writing.
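Conceptually, those two retrieval parameters boil down to a filter-and-truncate step over the scored search results. An illustrative sketch (not OpenClaw's actual internals):

```python
def select_context(scored_chunks, retrieval_count=3, score_threshold=0.7):
    """scored_chunks: list of (chunk_text, similarity_score) pairs from the vector store."""
    relevant = [(text, score) for text, score in scored_chunks if score >= score_threshold]
    relevant.sort(key=lambda pair: pair[1], reverse=True)
    top = relevant[:retrieval_count]
    if not top:
        return None  # nothing relevant enough: the agent should admit it doesn't know
    return "\n\n".join(text for text, _ in top)
```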
## Step 4: Connect to WhatsApp via MoltFlow
Wire up the webhook handler (same pattern as our customer support bot, but now using the RAG agent):
```python
from fastapi import FastAPI, Request
import httpx

app = FastAPI()

MOLTFLOW_API_TOKEN = "your_token_here"
MOLTFLOW_API_BASE = "https://apiv2.waiflow.app/api/v2"

@app.post("/webhook/whatsapp")
async def handle_knowledge_query(request: Request):
    """Receives customer questions, searches knowledge base, returns answers."""
    payload = await request.json()

    question = payload["data"]["body"]
    sender_id = payload["data"]["from"]
    session_name = payload["session"]

    # Agent processes question using RAG skill
    response = await knowledge_agent.process(
        input_text=question,
        context={"sender_id": sender_id}
    )

    # Send answer back to customer
    async with httpx.AsyncClient() as client:
        await client.post(
            f"{MOLTFLOW_API_BASE}/sessions/{session_name}/messages",
            headers={"Authorization": f"Bearer {MOLTFLOW_API_TOKEN}"},
            json={
                "chatId": sender_id,
                "text": response.text
            }
        )

    return {"status": "ok"}
```

That's it. Customer asks a question on WhatsApp. Agent searches the knowledge base. Relevant chunks feed into the LLM. Answer appears on WhatsApp.
## Step 5: Add Source Citations
Customers trust answers more when you show where they came from. Let's modify the agent to include source citations:
Update the system prompt:
system_prompt="""...
At the end of your response, add a line:
'Source: [document_name]'
If multiple documents were used, list them all.
"""And extract metadata from the RAG results:
```python
response = await knowledge_agent.process(
    input_text=question,
    context={"sender_id": sender_id}
)

# Get sources from retrieved chunks
sources = response.metadata.get("retrieved_chunks", [])
source_docs = set(chunk["source"] for chunk in sources)

# Append citation to response
cited_response = response.text
if source_docs:
    cited_response += f"\n\n📄 Source: {', '.join(source_docs)}"

# Send cited response
# ... (same as before)
```

Now answers look like:
```text
To reset your password, go to Settings > Account > Change Password.
Enter your current password, then your new password twice.
You'll be logged out of all devices and need to log back in.

📄 Source: troubleshooting.md
```

Customers can verify the answer by checking the source. Trust goes up. Support ticket volume goes down.
## Complete Python Example: End-to-End RAG Pipeline
Here's the full pipeline in one script:
```python
from openclaw import Agent, KnowledgeBase, RAGSkill
from fastapi import FastAPI, Request
import httpx
from pathlib import Path

# 1. Ingest docs into knowledge base
kb = KnowledgeBase(name="product_docs", persist_directory="./data/kb")

for doc_path in Path("./docs").glob("*.md"):
    content = doc_path.read_text()
    chunks = chunk_document(content)  # the chunker from Step 1
    for i, chunk in enumerate(chunks):
        kb.add(text=chunk, metadata={"source": doc_path.name, "chunk_index": i})

# 2. Create RAG agent
agent = Agent(
    name="Knowledge Bot",
    system_prompt="Answer using docs only. Cite sources. Plain text only.",
    skills=[RAGSkill(kb, retrieval_count=3, score_threshold=0.7)],
    model="gpt-4o",
    temperature=0.3
)

# 3. Set up FastAPI webhook handler
app = FastAPI()

@app.post("/webhook/whatsapp")
async def handle_message(request: Request):
    payload = await request.json()
    question = payload["data"]["body"]
    sender = payload["data"]["from"]
    session = payload["session"]

    # Process with RAG
    response = await agent.process(question, context={"sender_id": sender})

    # Extract and cite sources
    sources = response.metadata.get("retrieved_chunks", [])
    source_docs = set(c["source"] for c in sources)
    cited = response.text
    if source_docs:
        cited += f"\n\n📄 {', '.join(source_docs)}"

    # Reply via MoltFlow
    async with httpx.AsyncClient() as client:
        await client.post(
            f"https://apiv2.waiflow.app/api/v2/sessions/{session}/messages",
            headers={"Authorization": "Bearer YOUR_TOKEN"},
            json={"chatId": sender, "text": cited}
        )

    return {"ok": True}
```

Run it with:
```bash
uvicorn main:app --reload
```

Your WhatsApp knowledge base is live.
## Tips for Production RAG
### 1. Optimize Chunk Size
Experiment with chunk sizes. Technical docs might need larger chunks (800 tokens) to preserve code examples. FAQs work better with smaller chunks (300 tokens) because each Q&A is self-contained.
Test with real customer questions and see which chunk size produces the most accurate answers.
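One way to run that experiment, sketched with raw ChromaDB for the retrieval step and a couple of hand-written test cases (a question plus the doc you expect it to hit); adapt the cases to your own docs:

```python
import chromadb
from pathlib import Path

test_cases = [
    ("How do I reset my password?", "troubleshooting.md"),
    ("How do webhook retries work?", "webhooks.md"),
]

def retrieval_accuracy(chunk_size: int) -> float:
    """Fraction of test questions whose top retrieved chunk comes from the expected doc."""
    client = chromadb.EphemeralClient()  # throwaway in-memory index per experiment
    collection = client.create_collection(name=f"eval_{chunk_size}")
    for doc_path in Path("./docs").glob("*.md"):
        chunks = chunk_document(doc_path.read_text(), chunk_size=chunk_size)  # chunker from Step 1
        if not chunks:
            continue
        collection.add(
            documents=chunks,
            metadatas=[{"source": doc_path.name}] * len(chunks),
            ids=[f"{doc_path.name}-{i}" for i in range(len(chunks))],
        )
    hits = 0
    for question, expected_source in test_cases:
        result = collection.query(query_texts=[question], n_results=1)
        if result["metadatas"][0][0]["source"] == expected_source:
            hits += 1
    return hits / len(test_cases)

for size in (300, 500, 800):
    print(size, retrieval_accuracy(size))
```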
### 2. Handle "I Don't Know" Gracefully
Your agent will encounter questions your docs don't answer. That's fine. Train it to admit ignorance and escalate:
system_prompt="""...
If no retrieved context is relevant (score < 0.7), respond:
'I don't see that in our documentation. Let me connect you with our support team who can help.'
Then escalate.
"""Hallucinating a wrong answer is worse than admitting you don't know.
### 3. Update the Knowledge Base Continuously
Docs change. New features ship. Old bugs get fixed. Your knowledge base needs to stay in sync.
Build a simple sync script that runs daily:
```python
from datetime import datetime, timedelta
from pathlib import Path

from openclaw import KnowledgeBase

def sync_docs():
    """Re-index docs modified in the last 24 hours."""
    kb = KnowledgeBase(name="product_docs", persist_directory="./data/kb")
    docs_dir = Path("./docs")
    cutoff = datetime.now() - timedelta(days=1)

    for doc_path in docs_dir.glob("*.md"):
        mod_time = datetime.fromtimestamp(doc_path.stat().st_mtime)

        if mod_time > cutoff:
            # Remove old chunks for this doc
            kb.delete(filter={"source": doc_path.name})

            # Re-ingest updated doc
            content = doc_path.read_text()
            chunks = chunk_document(content)  # the chunker from Step 1
            for i, chunk in enumerate(chunks):
                kb.add(text=chunk, metadata={"source": doc_path.name, "chunk_index": i})

    print(f"Synced docs modified since {cutoff}")
```

Run this as a cron job. Customers always get answers from the latest docs.
### 4. Monitor Query Performance
Track which questions get answered and which don't. If the same question consistently gets an "I don't know" response, that's a signal to add that info to your docs.
Log failed queries:
```python
if response.metadata.get("score", 0) < 0.7:
    # Log to analytics
    print(f"Failed query: {question} (no relevant docs found)")
```

Over time, you'll identify gaps in your documentation and fill them.
## What's Next?
You've built a RAG-powered knowledge base that:
- Searches your documentation semantically
- Answers customer questions with accurate, cited responses
- Admits when it doesn't know and escalates gracefully
- Lives on WhatsApp where your customers already are
MoltFlow's OpenClaw integration makes RAG-powered support bots production-ready in hours, not weeks.
Continue learning:
- Build a WhatsApp Customer Support Bot with OpenClaw — Combine RAG with ticketing, escalation, and conversation memory
- AI Lead Qualification on WhatsApp with OpenClaw — Use conversational AI to qualify and route leads
- A2A Protocol: How AI Agents Communicate — Let your knowledge bot collaborate with other AI agents
Ready to implement a knowledge base bot? Follow our step-by-step guide: Build a WhatsApp Support Bot with OpenClaw
Sign up for MoltFlow and get your API credentials in under a minute. OpenClaw is open source—start building today.
> Try MoltFlow Free — 100 messages/month