# WhatsApp Knowledge Base with OpenClaw RAG
## Your Customers Won't Read Your Help Center
Your customer messages "How do I reset my password?" on WhatsApp. You could send them a link to your help center—which they'll never visit. Or you could give them an instant, accurate answer pulled directly from your documentation. With sources cited. In seconds.
That's Retrieval-Augmented Generation (RAG). And it's easier to build than you think.
Today we're building a knowledge base bot that lives on WhatsApp using OpenClaw's agent framework and MoltFlow's WhatsApp API. Customers ask questions. The bot searches your docs, retrieves relevant sections, and crafts natural responses. No more "check our FAQ page" cop-outs. Real answers. Right now.
## What Is RAG and Why Should You Care?
RAG stands for Retrieval-Augmented Generation. Here's the problem it solves: LLMs are great at generating text, but they don't know about your product. If you ask GPT-4 "How does MoltFlow's webhook retry logic work?", it'll hallucinate an answer because that information wasn't in its training data.
RAG fixes this:
- Retrieval: Search your documentation for relevant chunks (using semantic similarity, not keyword matching)
- Augmentation: Feed those chunks to the LLM as context
- Generation: LLM crafts an answer based on actual docs, not guesses
The result? Accurate answers grounded in your real documentation. The LLM becomes a smart interface to your knowledge base, not a creative writer.
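To make the three steps concrete, here is a minimal, self-contained sketch using the OpenAI Python SDK and NumPy: three toy doc snippets stand in for a real knowledge base, and a cosine-similarity loop stands in for a vector database (the doc strings are invented for illustration).

```python
from openai import OpenAI
import numpy as np

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Toy "documentation" chunks (invented for illustration)
docs = [
    "To reset your password, go to Settings > Account > Change Password.",
    "Webhook deliveries that fail are retried automatically.",
    "Each API token has its own rate limit.",
]

def embed(texts):
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

doc_vecs = embed(docs)

question = "How do I change my password?"
q_vec = embed([question])[0]

# 1. Retrieval: cosine similarity between the question and every chunk
scores = doc_vecs @ q_vec / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q_vec))
context = docs[int(np.argmax(scores))]

# 2. Augmentation + 3. Generation: the retrieved chunk becomes context for the LLM
completion = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "Answer using ONLY the provided context."},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ],
)
print(completion.choices[0].message.content)
```

Everything that follows in this guide is this same loop, with a real vector store and WhatsApp as the front end.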
Why RAG beats a pure LLM:
- Fewer hallucinations: Answers come from your docs, not the model's imagination
- Always up-to-date: Update your docs, and answers change immediately—no retraining
- Source attribution: You can show customers exactly which doc the answer came from
Why RAG beats keyword search:
- Understands synonyms and paraphrasing ("reset password" matches "change your login credentials")
- Handles complex questions ("What's the rate limit for API calls after 9pm EST?" requires understanding limits + timezones)
- Returns natural language answers, not raw doc paragraphs
## Architecture: From Question to Answer in Milliseconds
Here's the flow:
- Customer sends question via WhatsApp → MoltFlow receives it
- MoltFlow webhook fires → your server gets the message
- Your server queries the knowledge base → semantic search finds relevant doc chunks
- OpenClaw agent generates answer → using retrieved chunks as context
- Response sent back via MoltFlow API → customer sees answer on WhatsApp
```text
Customer Question → WhatsApp → MoltFlow → Your Server
                                              ↓
                                    [Vector Search DB]
                                              ↓
                                  [OpenClaw Agent + LLM]
                                              ↓
                       Answer ← MoltFlow ← Your Server
```

The key piece is the vector search database. It stores your docs as embeddings (high-dimensional vectors that encode semantic meaning). When a question comes in, you convert it to a vector and find the closest matches. Those matches become context for the LLM.
Let's build it.
## Step 1: Prepare Your Knowledge Base
RAG needs content to retrieve from. Start with your existing documentation:
- Product docs (markdown, HTML)
- FAQ pages
- API reference
- Support articles
- Tutorial videos (transcribe them first)
For this example, let's say you have markdown files in a docs/ folder:
```text
docs/
├── getting-started.md
├── api-reference.md
├── webhooks.md
└── troubleshooting.md
```

Chunking strategy: Break documents into chunks of ~500 tokens. Too small, and you lose context. Too large, and irrelevant info bleeds in. 500 tokens is a sweet spot for most use cases.
Here's a simple chunker:
```python
import tiktoken

def chunk_document(content: str, chunk_size: int = 500, overlap: int = 50):
    """Split document into overlapping chunks."""
    encoding = tiktoken.get_encoding("cl100k_base")
    tokens = encoding.encode(content)

    chunks = []
    for i in range(0, len(tokens), chunk_size - overlap):
        chunk_tokens = tokens[i:i + chunk_size]
        chunk_text = encoding.decode(chunk_tokens)
        chunks.append(chunk_text)

    return chunks
```

Overlap ensures that concepts split across chunks still appear together in at least one chunk.
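A quick sanity check before ingesting everything, assuming one of the markdown files from the folder above:

```python
from pathlib import Path

content = Path("./docs/webhooks.md").read_text()
chunks = chunk_document(content, chunk_size=500, overlap=50)

print(f"{len(chunks)} chunks")
print(chunks[0][:200])  # preview the start of the first chunk
```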
## Step 2: Create Embeddings with OpenClaw
Embeddings are vector representations of text. Sentences with similar meanings have similar vectors. OpenClaw integrates with popular embedding models out of the box.
Install dependencies:
```bash
pip install openclaw-sdk chromadb
```

Now ingest your docs:
```python
from openclaw import KnowledgeBase
from pathlib import Path

# Initialize knowledge base (uses ChromaDB by default)
kb = KnowledgeBase(
    name="product_docs",
    embedding_model="text-embedding-3-small",  # OpenAI's latest
    persist_directory="./data/kb"
)

# Load and chunk all markdown files
docs_dir = Path("./docs")

for doc_path in docs_dir.glob("*.md"):
    content = doc_path.read_text()
    chunks = chunk_document(content)

    # Add chunks to knowledge base with metadata
    for i, chunk in enumerate(chunks):
        kb.add(
            text=chunk,
            metadata={
                "source": doc_path.name,
                "chunk_index": i,
                "doc_type": "markdown"
            }
        )

print(f"Ingested {len(kb)} chunks")
```

This creates embeddings for every chunk and stores them in ChromaDB (a lightweight vector database). ChromaDB handles the heavy lifting—indexing, similarity search, persistence.
Production tip: Use Pinecone or Weaviate for scale. ChromaDB is great for prototyping but might struggle with millions of chunks.
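If you are curious what the wrapper is doing underneath, the raw ChromaDB calls look roughly like this; a sketch for one document, not necessarily how OpenClaw implements it (ChromaDB falls back to its built-in embedding model here):

```python
import chromadb
from pathlib import Path

client = chromadb.PersistentClient(path="./data/kb")
collection = client.get_or_create_collection(name="product_docs")

# Embed and store the chunks for one document
chunks = chunk_document(Path("./docs/webhooks.md").read_text())  # chunker from Step 1
collection.add(
    documents=chunks,
    metadatas=[{"source": "webhooks.md", "chunk_index": i} for i in range(len(chunks))],
    ids=[f"webhooks.md-{i}" for i in range(len(chunks))],
)

# Semantic search: the three chunks closest in meaning to the question
results = collection.query(query_texts=["How do webhook retries work?"], n_results=3)
print(results["documents"][0])
```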
## Step 3: Build the RAG Agent
Now wire up the OpenClaw agent with RAG skills:
```python
from openclaw import Agent, RAGSkill

# Define RAG-powered support agent
knowledge_agent = Agent(
    name="WhatsApp Knowledge Bot",
    system_prompt="""You are a helpful product expert for our platform.

Answer customer questions using ONLY the provided documentation context.
If the answer isn't in the context, say "I don't see that information in our docs.
Let me connect you with a human who can help."

Always cite the source document at the end of your response.
Keep responses concise—WhatsApp users expect quick answers.
Use plain text only (no markdown, no emojis).""",
    skills=[
        RAGSkill(
            knowledge_base=kb,
            retrieval_count=3,    # Retrieve top 3 most relevant chunks
            score_threshold=0.7   # Only use chunks with >70% similarity
        )
    ],
    model="gpt-4o",
    temperature=0.3  # Lower temp = more factual, less creative
)
```

Key parameters:
- retrieval_count: How many chunks to retrieve. More chunks = more context, but also more noise and cost.
- score_threshold: Filters out low-relevance chunks. If no chunks exceed the threshold, the agent admits it doesn't know.
- temperature: Keep it low for factual Q&A. You want accurate answers, not creative writing.
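Conceptually, those two retrieval parameters boil down to a filter-and-truncate step over the scored search results. An illustrative sketch (not OpenClaw's actual internals):

```python
def select_context(scored_chunks, retrieval_count=3, score_threshold=0.7):
    """scored_chunks: list of (chunk_text, similarity_score) pairs from the vector store."""
    relevant = [(text, score) for text, score in scored_chunks if score >= score_threshold]
    relevant.sort(key=lambda pair: pair[1], reverse=True)
    top = relevant[:retrieval_count]
    if not top:
        return None  # nothing relevant enough: the agent should admit it doesn't know
    return "\n\n".join(text for text, _ in top)
```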
## Step 4: Connect to WhatsApp via MoltFlow
Wire up the webhook handler (same pattern as our customer support bot, but now using the RAG agent):
```python
from fastapi import FastAPI, Request
import httpx

app = FastAPI()

MOLTFLOW_API_TOKEN = "your_token_here"
MOLTFLOW_API_BASE = "https://apiv2.waiflow.app/api/v2"

@app.post("/webhook/whatsapp")
async def handle_knowledge_query(request: Request):
    """Receives customer questions, searches knowledge base, returns answers."""
    payload = await request.json()

    question = payload["data"]["body"]
    sender_id = payload["data"]["from"]
    session_name = payload["session"]

    # Agent processes question using RAG skill
    response = await knowledge_agent.process(
        input_text=question,
        context={"sender_id": sender_id}
    )

    # Send answer back to customer
    async with httpx.AsyncClient() as client:
        await client.post(
            f"{MOLTFLOW_API_BASE}/sessions/{session_name}/messages",
            headers={"Authorization": f"Bearer {MOLTFLOW_API_TOKEN}"},
            json={
                "chatId": sender_id,
                "text": response.text
            }
        )

    return {"status": "ok"}
```

That's it. Customer asks a question on WhatsApp. Agent searches the knowledge base. Relevant chunks feed into the LLM. Answer appears on WhatsApp.
## Step 5: Add Source Citations
Customers trust answers more when you show where they came from. Let's modify the agent to include source citations:
Update the system prompt:
system_prompt="""...
At the end of your response, add a line:
'Source: [document_name]'
If multiple documents were used, list them all.
"""And extract metadata from the RAG results:
```python
response = await knowledge_agent.process(
    input_text=question,
    context={"sender_id": sender_id}
)

# Get sources from retrieved chunks
sources = response.metadata.get("retrieved_chunks", [])
source_docs = set(chunk["source"] for chunk in sources)

# Append citation to response
cited_response = response.text
if source_docs:
    cited_response += f"\n\n📄 Source: {', '.join(source_docs)}"

# Send cited response
# ... (same as before)
```

Now answers look like:
```text
To reset your password, go to Settings > Account > Change Password.
Enter your current password, then your new password twice.
You'll be logged out of all devices and need to log back in.

📄 Source: troubleshooting.md
```

Customers can verify the answer by checking the source. Trust goes up. Support ticket volume goes down.
## Complete Python Example: End-to-End RAG Pipeline
Here's the full pipeline in one script:
```python
from openclaw import Agent, KnowledgeBase, RAGSkill
from fastapi import FastAPI, Request
import httpx
from pathlib import Path

# 1. Ingest docs into knowledge base
kb = KnowledgeBase(name="product_docs", persist_directory="./data/kb")

for doc_path in Path("./docs").glob("*.md"):
    content = doc_path.read_text()
    chunks = chunk_document(content)  # the chunker from Step 1
    for i, chunk in enumerate(chunks):
        kb.add(text=chunk, metadata={"source": doc_path.name, "chunk_index": i})

# 2. Create RAG agent
agent = Agent(
    name="Knowledge Bot",
    system_prompt="Answer using docs only. Cite sources. Plain text only.",
    skills=[RAGSkill(kb, retrieval_count=3, score_threshold=0.7)],
    model="gpt-4o",
    temperature=0.3
)

# 3. Set up FastAPI webhook handler
app = FastAPI()

@app.post("/webhook/whatsapp")
async def handle_message(request: Request):
    payload = await request.json()
    question = payload["data"]["body"]
    sender = payload["data"]["from"]
    session = payload["session"]

    # Process with RAG
    response = await agent.process(question, context={"sender_id": sender})

    # Extract and cite sources
    sources = response.metadata.get("retrieved_chunks", [])
    source_docs = set(c["source"] for c in sources)
    cited = response.text
    if source_docs:
        cited += f"\n\n📄 {', '.join(source_docs)}"

    # Reply via MoltFlow
    async with httpx.AsyncClient() as client:
        await client.post(
            f"https://apiv2.waiflow.app/api/v2/sessions/{session}/messages",
            headers={"Authorization": "Bearer YOUR_TOKEN"},
            json={"chatId": sender, "text": cited}
        )

    return {"ok": True}
```

Run it with:
```bash
uvicorn main:app --reload
```

Your WhatsApp knowledge base is live.
## Tips for Production RAG
### 1. Optimize Chunk Size
Experiment with chunk sizes. Technical docs might need larger chunks (800 tokens) to preserve code examples. FAQs work better with smaller chunks (300 tokens) because each Q&A is self-contained.
Test with real customer questions and see which chunk size produces the most accurate answers.
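One way to run that experiment, sketched with raw ChromaDB for the retrieval step and a couple of hand-written test cases (a question plus the doc you expect it to hit); adapt the cases to your own docs:

```python
import chromadb
from pathlib import Path

test_cases = [
    ("How do I reset my password?", "troubleshooting.md"),
    ("How do webhook retries work?", "webhooks.md"),
]

def retrieval_accuracy(chunk_size: int) -> float:
    """Fraction of test questions whose top retrieved chunk comes from the expected doc."""
    client = chromadb.EphemeralClient()  # throwaway in-memory index per experiment
    collection = client.create_collection(name=f"eval_{chunk_size}")
    for doc_path in Path("./docs").glob("*.md"):
        chunks = chunk_document(doc_path.read_text(), chunk_size=chunk_size)  # chunker from Step 1
        if not chunks:
            continue
        collection.add(
            documents=chunks,
            metadatas=[{"source": doc_path.name}] * len(chunks),
            ids=[f"{doc_path.name}-{i}" for i in range(len(chunks))],
        )
    hits = 0
    for question, expected_source in test_cases:
        result = collection.query(query_texts=[question], n_results=1)
        if result["metadatas"][0][0]["source"] == expected_source:
            hits += 1
    return hits / len(test_cases)

for size in (300, 500, 800):
    print(size, retrieval_accuracy(size))
```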
### 2. Handle "I Don't Know" Gracefully
Your agent will encounter questions your docs don't answer. That's fine. Train it to admit ignorance and escalate:
system_prompt="""...
If no retrieved context is relevant (score < 0.7), respond:
'I don't see that in our documentation. Let me connect you with our support team who can help.'
Then escalate.
"""Hallucinating a wrong answer is worse than admitting you don't know.
### 3. Update the Knowledge Base Continuously
Docs change. New features ship. Old bugs get fixed. Your knowledge base needs to stay in sync.
Build a simple sync script that runs daily:
```python
from datetime import datetime, timedelta
from pathlib import Path

from openclaw import KnowledgeBase

def sync_docs():
    """Re-index docs modified in the last 24 hours."""
    kb = KnowledgeBase(name="product_docs", persist_directory="./data/kb")
    docs_dir = Path("./docs")
    cutoff = datetime.now() - timedelta(days=1)

    for doc_path in docs_dir.glob("*.md"):
        mod_time = datetime.fromtimestamp(doc_path.stat().st_mtime)

        if mod_time > cutoff:
            # Remove old chunks for this doc
            kb.delete(filter={"source": doc_path.name})

            # Re-ingest updated doc
            content = doc_path.read_text()
            chunks = chunk_document(content)  # the chunker from Step 1
            for i, chunk in enumerate(chunks):
                kb.add(text=chunk, metadata={"source": doc_path.name, "chunk_index": i})

    print(f"Synced docs modified since {cutoff}")
```

Run this as a cron job. Customers always get answers from the latest docs.
### 4. Monitor Query Performance
Track which questions get answered and which don't. If the same question consistently gets an "I don't know" response, that's a signal to add that info to your docs.
Log failed queries:
```python
if response.metadata.get("score", 0) < 0.7:
    # Log to analytics
    print(f"Failed query: {question} (no relevant docs found)")
```

Over time, you'll identify gaps in your documentation and fill them.
## What's Next?
You've built a RAG-powered knowledge base that:
- Searches your documentation semantically
- Answers customer questions with accurate, cited responses
- Admits when it doesn't know and escalates gracefully
- Lives on WhatsApp where your customers already are
MoltFlow's OpenClaw integration makes RAG-powered support bots production-ready in hours, not weeks.
Continue learning:
- Build a WhatsApp Customer Support Bot with OpenClaw — Combine RAG with ticketing, escalation, and conversation memory
- AI Lead Qualification on WhatsApp with OpenClaw — Use conversational AI to qualify and route leads
- A2A Protocol: How AI Agents Communicate — Let your knowledge bot collaborate with other AI agents
Ready to implement a knowledge base bot? Follow our step-by-step guide: Build a WhatsApp Support Bot with OpenClaw
Sign up for MoltFlow and get your API credentials in under a minute. OpenClaw is open source—start building today.
> Try MoltFlow Free — 100 messages/month