When building chatbots, one critical decision is whether and how to persist conversation history. I learned this the hard way after deploying three different chatbot implementations — each with a different persistence strategy. Here's what works, what doesn't, and why it matters more than you think.
The Problem: Stateless WebSockets Are a Trap
WebSockets feel magical during development. You open a connection, stream responses in real time, and the user experience is silky smooth. But the moment that connection drops — network blip, server restart, browser refresh — everything is gone.
For a simple "Ask Me Anything" chatbot about my background, losing conversation history is annoying but tolerable. For a data analytics agent executing SQL queries on behalf of users, it's a security and compliance nightmare.
Three Approaches I've Used
1. Ephemeral (No Persistence) — The Data Agent
My SQL analytics agent currently stores all state in-memory on the WebSocket consumer instance:
```python
class DataAgentConsumer(WebsocketConsumer):
    def _reset(self):
        self.question = None
        self.steps = []           # Lost on disconnect
        self.pending_step = None

    def disconnect(self, close_code):
        pass  # Everything is gone
```
What's lost: Every SQL query proposed and executed, user approval decisions, execution results, error messages.
When it's acceptable: Prototypes, demos, internal tools with no audit requirements, stateless Q&A bots.
When it's dangerous: Any system accessing sensitive data, multi-step workflows, regulated industries (healthcare, finance), production systems where users expect history.
2. Audit Trail (Fire-and-Forget Logging) — WebSocket Chat
My WebSocket-based chatbot writes to PostgreSQL after the streaming response completes. The database is completely bypassed during real-time interaction:
```python
# Stream first, persist later
for token in llm_stream:
    ws.send(token)  # Real-time to user

# Only after streaming completes:
WsChatMessage.objects.create(
    message=user_message,
    response=full_response,
    session_id=session_id,
)
```
Benefits:
- Zero perceived latency — database doesn't block the UX
- Audit trail for debugging and analytics
- Can reload conversation history later
Limitations:
- Can't resume mid-conversation after disconnect
- No intermediate state tracking (for multi-step agents)
Best for: Real-time streaming chatbots where the conversation is linear and resumption isn't critical.
3. State Buffer (Polling-Based Persistence) — Kafka Chat
My Kafka-based chatbot uses PostgreSQL as an intermediate state buffer between asynchronous message processing and synchronous HTTP polling:
```python
# Write a placeholder row immediately
ChatMessage.objects.create(
    message_id=message_id,
    message=user_message,
    response="Processing...",  # Placeholder
)

# Kafka consumer updates the row when the answer is ready
msg = ChatMessage.objects.get(message_id=message_id)
msg.response = llm_answer
msg.answered_at = timezone.now()
msg.save()

# Frontend polls until answered_at is set
```
Benefits:
- Full conversation history from the start
- Can track processing state (pending → completed)
- Survives server restarts (Kafka + DB provide durability)
Limitations:
- Database becomes a bottleneck under heavy polling
- 1-3 second latency from polling intervals
Best for: Event-driven architectures where decoupling is more important than latency, systems requiring guaranteed delivery.
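The poll-until-answered loop can be sketched as follows (a minimal sketch; `fetch_message` stands in for whatever HTTP call the frontend actually makes, and the names are illustrative):

```python
import time

def poll_for_answer(fetch_message, message_id, interval=1.0, timeout=30.0):
    """Poll until the row's answered_at is set, then return the response.

    fetch_message(message_id) is a stand-in for the frontend's HTTP
    call; it returns a dict like {"response": ..., "answered_at": ...}.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        msg = fetch_message(message_id)
        if msg.get("answered_at") is not None:
            return msg["response"]
        time.sleep(interval)  # this interval is the 1-3 second latency floor
    raise TimeoutError(f"message {message_id} not answered within {timeout}s")
```

Note that the polling interval is exactly where the 1-3 second latency mentioned above comes from: shrinking it trades latency for database load.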
What I Should Have Done: Hybrid Approach
After running all three in production, here's the pattern I recommend for multi-step conversational agents:
Log Critical Events, Not Everything
| Event Type | Persist? | Why |
|---|---|---|
| User question | ✅ Yes | Audit, analytics |
| SQL query proposed | ✅ Yes | Security audit |
| User approval/rejection | ✅ Yes | Human-in-loop compliance |
| SQL execution results | ⚠️ Metadata only | Row count, columns, exec time — not full rows |
| Errors | ✅ Yes | Debugging, pattern detection |
| LLM thinking messages | ❌ No | Ephemeral UX, bloats DB |
| Chart data | ❌ No | Regenerate on demand from SQL |
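The table above can be encoded as a small policy map so the persist/skip decision lives in one place. A sketch, with illustrative event names that are not from any real schema:

```python
# Persistence policy per event type: "full", "metadata", or "skip".
# Event names are illustrative, not from a real schema.
PERSISTENCE_POLICY = {
    "user_question": "full",      # audit, analytics
    "sql_proposed":  "full",      # security audit
    "user_approval": "full",      # human-in-loop compliance
    "sql_result":    "metadata",  # row count/columns/time, not full rows
    "error":         "full",      # debugging, pattern detection
    "llm_thinking":  "skip",      # ephemeral UX, bloats the DB
    "chart_data":    "skip",      # regenerate on demand from SQL
}

def to_persistable(event_type, payload):
    """Return what should be written for this event, or None to skip it."""
    policy = PERSISTENCE_POLICY.get(event_type, "full")  # fail safe: log unknowns
    if policy == "skip":
        return None
    if policy == "metadata":
        # Keep shape and timing, drop the actual result rows.
        keys = ("row_count", "columns", "execution_time_ms")
        return {k: payload[k] for k in keys if k in payload}
    return payload
```

Unknown event types default to full persistence; failing open on logging is safer than silently dropping an auditable event.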
Schema Design
Two tables are enough for most use cases:
```
-- Session-level tracking
DataAgentSession
  - session_id (PK)
  - question (original user question)
  - started_at, ended_at
  - user_ip (for abuse detection)
  - total_steps

-- Step-level tracking
DataAgentStep
  - session_id (FK)
  - step_number
  - proposed_sql
  - status (proposed | approved | rejected | error)
  - execution_time_ms
  - row_count
  - error_message
```
This gives you full auditability without storing megabytes of result data.
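As a concrete sketch, the two tables translate to roughly the DDL below. It is shown through Python's built-in `sqlite3` so it runs anywhere; in the actual stack these would be Django models, and the exact column types are assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE data_agent_session (
    session_id   TEXT PRIMARY KEY,
    question     TEXT NOT NULL,      -- original user question
    started_at   TEXT NOT NULL,
    ended_at     TEXT,
    user_ip      TEXT,               -- for abuse detection
    total_steps  INTEGER DEFAULT 0
);

CREATE TABLE data_agent_step (
    session_id        TEXT NOT NULL REFERENCES data_agent_session,
    step_number       INTEGER NOT NULL,
    proposed_sql      TEXT NOT NULL,
    status            TEXT CHECK (status IN
                        ('proposed', 'approved', 'rejected', 'error')),
    execution_time_ms INTEGER,
    row_count         INTEGER,
    error_message     TEXT,
    PRIMARY KEY (session_id, step_number)
);
""")
```

The composite primary key on `(session_id, step_number)` keeps steps ordered per session without a separate surrogate key.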
When to Persist, When to Skip
Always Persist If:
- You're executing database queries or API calls on behalf of users
- The system handles sensitive data (PII, healthcare, financial)
- You need to detect abuse or prompt injection attempts
- Compliance requires an audit trail (SOC2, HIPAA, GDPR)
- Multi-step workflows where users expect to resume
Ephemeral is OK If:
- Simple Q&A with no side effects
- Internal demo/prototype
- No sensitive data access
- Users don't expect conversation history
Performance Impact
I measured the overhead of adding persistence to the Data Agent:
| Metric | No Persistence | With Audit Log |
|---|---|---|
| DB writes per session | 0 | ~5 (1 session + 4 steps avg) |
| Latency added | 0ms | ~5ms per step (non-blocking) |
| Storage per session | 0 KB | ~2 KB (metadata only) |
The overhead is negligible compared to LLM inference time (1-3 seconds) and SQL execution (100-500ms).
Security Benefits
Persistence isn't just about user experience — it's critical for security:
Prompt Injection Detection
```sql
-- Find sessions with injection attempts
SELECT session_id, question, COUNT(*) AS attempts
FROM data_agent_session
WHERE question ILIKE '%ignore previous%'
   OR question ILIKE '%you are now%'
   OR question ILIKE '%system prompt%'
GROUP BY session_id, question
ORDER BY attempts DESC;
```
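The same patterns can be checked at request time, before the question ever reaches the LLM. A sketch — the pattern list is illustrative and deliberately crude, exactly like the `ILIKE` filters above, and should be treated as a signal to log, not a verdict:

```python
import re

# Crude substring patterns mirroring the ILIKE filters above.
INJECTION_PATTERNS = [
    re.compile(p, re.IGNORECASE)
    for p in (r"ignore previous", r"you are now", r"system prompt")
]

def looks_like_injection(question: str) -> bool:
    """Flag questions matching known prompt-injection phrasings."""
    return any(p.search(question) for p in INJECTION_PATTERNS)
```

Flagged questions can still be answered; the point is that the flag lands in the audit trail either way.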
SQL Injection Attempts
```sql
-- Detect malicious SQL patterns
SELECT session_id, proposed_sql
FROM data_agent_step
WHERE proposed_sql ILIKE '%DROP TABLE%'
   OR proposed_sql ILIKE '%DELETE FROM%'
   OR proposed_sql ILIKE '%auth_user%';
```
Abuse Detection
```sql
-- Find users running excessive queries
SELECT user_ip, COUNT(*) AS sessions, SUM(total_steps) AS total_steps
FROM data_agent_session
WHERE started_at > NOW() - INTERVAL '1 hour'
GROUP BY user_ip
HAVING COUNT(*) > 50;
```
The Mistake I Made
I launched the Data Agent without persistence because:
- It felt like over-engineering for a portfolio project
- The WebSocket implementation was "simple"
- I prioritized shipping over security
Then I realized: this agent executes user-provided SQL queries on a production database. Even with guardrails (a table whitelist, blocked DELETE/DROP, query timeouts), I have zero visibility into:
- What queries are being attempted
- Whether guardrails are working
- If someone is probing for vulnerabilities
That's unacceptable for any production system.
Lessons Learned
1. Persist Before You Ship
Adding persistence after launch is 10× harder than building it in from day one. Users, database schema, and frontend expectations are already locked in.
2. Metadata > Full Data
You don't need to store every token the LLM produces. Store execution metadata (what happened, when, why) and regenerate details on demand.
3. Async Writes Are Free
Database writes after user actions (approve step, reject step) don't block the UX. The perceived latency is zero.
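One way to get that for free is to push the write onto a small worker pool so the handler returns immediately. A minimal sketch — `save_fn` stands in for the actual ORM call (e.g. a `.objects.create`), and the single-worker pool is an assumption chosen to keep writes ordered:

```python
from concurrent.futures import ThreadPoolExecutor

# One shared pool for audit writes; a single worker keeps writes ordered.
_audit_pool = ThreadPoolExecutor(max_workers=1)

def persist_async(save_fn, record):
    """Fire-and-forget persistence: submit the write and return at once.

    save_fn is a stand-in for the real ORM call. The returned future
    can be inspected for errors, but the caller never blocks on it.
    """
    return _audit_pool.submit(save_fn, record)
```

The trade-off is that a crash between the user action and the worker flushing loses that one write; for audit logs that is usually acceptable, for billing it is not.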
4. Indexes Matter
Index on session_id, started_at, and status. Your audit queries will thank you.
5. Retention Policy from Day One
Decide upfront: keep conversations for 30 days? 1 year? Forever? GDPR requires you delete data on request — design for that.
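The retention policy itself is then a one-line scheduled job; the selection logic is trivial. A sketch with plain datetimes — in Django this would be the equivalent `filter(started_at__lt=cutoff).delete()`:

```python
from datetime import datetime, timedelta

def expired_sessions(sessions, retention_days, now=None):
    """Return ids of sessions whose started_at is past the retention window.

    sessions is an iterable of (session_id, started_at) pairs; a real
    job would run the equivalent DELETE directly in the database.
    """
    now = now or datetime.utcnow()
    cutoff = now - timedelta(days=retention_days)
    return [sid for sid, started_at in sessions if started_at < cutoff]
```

Keeping the cutoff computation in one function also gives you a single place to implement GDPR deletion requests against.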
Implementation Checklist
If you're building a conversational agent, ask yourself:
- ☐ Does this agent access sensitive data or execute actions?
- ☐ Do I need an audit trail for compliance or security?
- ☐ Should users be able to resume conversations?
- ☐ Would abuse or probing go undetected without persistence?
- ☐ What's the data retention policy?
If you answered "yes" to any of the first four, build persistence from the start.
Key Takeaway
Ephemeral WebSockets are fine for demos. For production chatbots — especially those with side effects — always persist critical events. The 5ms of overhead is worth the security, debuggability, and peace of mind.
Start with a simple audit log (session + steps). Don't over-engineer full conversation replay until you need it. And never, ever deploy a chatbot that executes SQL queries without logging what it does.