When building chatbots, one critical decision is whether and how to persist conversation history. I learned this the hard way after deploying three different chatbot implementations — each with a different persistence strategy. Here's what works, what doesn't, and why it matters more than you think.

The Problem: Stateless WebSockets Are a Trap

WebSockets feel magical during development. You open a connection, stream responses in real time, and the user experience is silky smooth. But the moment that connection drops — network blip, server restart, browser refresh — everything is gone.

For a simple "Ask Me Anything" chatbot about my background, losing conversation history is annoying but tolerable. For a data analytics agent executing SQL queries on behalf of users, it's a security and compliance nightmare.

Three Approaches I've Used

1. Ephemeral (No Persistence) — The Data Agent

My SQL analytics agent currently stores all state in-memory on the WebSocket consumer instance:

from channels.generic.websocket import WebsocketConsumer

class DataAgentConsumer(WebsocketConsumer):
    def _reset(self):
        self.question = None
        self.steps = []  # Lost on disconnect
        self.pending_step = None

    def disconnect(self, close_code):
        pass  # Everything is gone

What's lost: Every SQL query proposed and executed, user approval decisions, execution results, error messages.

When it's acceptable: Prototypes, demos, internal tools with no audit requirements, stateless Q&A bots.

When it's dangerous: Any system accessing sensitive data, multi-step workflows, regulated industries (healthcare, finance), production systems where users expect history.

2. Audit Trail (Fire-and-Forget Logging) — WebSocket Chat

My WebSocket-based chatbot writes to PostgreSQL after the streaming response completes. The database is completely bypassed during real-time interaction:

# Stream first, persist later
for token in llm_stream:
    ws.send(token)  # Real-time to user

# Only after streaming completes:
WsChatMessage.objects.create(
    message=user_message,
    response=full_response,
    session_id=session_id
)

Benefits:

  • Zero perceived latency — database doesn't block the UX
  • Audit trail for debugging and analytics
  • Can reload conversation history later

Limitations:

  • Can't resume mid-conversation after disconnect
  • No intermediate state tracking (for multi-step agents)

Best for: Real-time streaming chatbots where the conversation is linear and resumption isn't critical.
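Reloading history later is a single ordered query. Here's a sketch using SQLite as a stand-in for PostgreSQL, with a table mirroring the WsChatMessage model above (the table and column names are assumptions):

```python
import sqlite3

def load_history(conn, session_id):
    """Return (message, response) pairs for a session, oldest first."""
    return conn.execute(
        "SELECT message, response FROM ws_chat_message "
        "WHERE session_id = ? ORDER BY id",
        (session_id,),
    ).fetchall()

# Demo with an in-memory database standing in for PostgreSQL
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ws_chat_message ("
             "id INTEGER PRIMARY KEY, session_id TEXT, "
             "message TEXT, response TEXT)")
conn.execute("INSERT INTO ws_chat_message (session_id, message, response) "
             "VALUES ('s1', 'Hi', 'Hello!')")
print(load_history(conn, "s1"))  # [('Hi', 'Hello!')]
```

In Django this collapses to a filtered, ordered queryset on WsChatMessage; the point is that the audit log doubles as a history store for free.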

3. State Buffer (Polling-Based Persistence) — Kafka Chat

My Kafka-based chatbot uses PostgreSQL as an intermediate state buffer between asynchronous message processing and synchronous HTTP polling:

from django.utils import timezone

# Write placeholder immediately
ChatMessage.objects.create(
    message_id=message_id,
    message=user_message,
    response="Processing..."  # Placeholder
)

# Kafka consumer updates when ready
msg = ChatMessage.objects.get(message_id=message_id)
msg.response = llm_answer
msg.answered_at = timezone.now()
msg.save()

# Frontend polls until answered_at is set
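The polling side can be sketched framework-free. `poll_until_answered` and `fake_fetch` are illustrative names; `fetch` stands in for the HTTP GET the frontend issues against the message endpoint:

```python
import time

def poll_until_answered(fetch, interval=1.0, timeout=30.0):
    """Call fetch() until the returned row has answered_at set.

    Returns the row dict once answered, or None on timeout.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        row = fetch()
        if row.get("answered_at") is not None:
            return row
        time.sleep(interval)
    return None

# Simulate a Kafka consumer that fills in the answer on the third poll
state = {"calls": 0}

def fake_fetch():
    state["calls"] += 1
    done = state["calls"] >= 3
    return {"response": "42" if done else "Processing...",
            "answered_at": "2024-01-01T00:00:00Z" if done else None}

assert poll_until_answered(fake_fetch, interval=0.01)["response"] == "42"
```

The interval/timeout pair is where the 1-3 second latency comes from: shorter intervals cut latency but multiply read load on the database.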

Benefits:

  • Full conversation history from the start
  • Can track processing state (pending → completed)
  • Survives server restarts (Kafka + DB provide durability)

Limitations:

  • Database becomes a bottleneck under heavy polling
  • 1-3 second latency from polling intervals

Best for: Event-driven architectures where decoupling is more important than latency, systems requiring guaranteed delivery.

What I Should Have Done: Hybrid Approach

After running all three in production, here's the pattern I recommend for multi-step conversational agents:

Log Critical Events, Not Everything

Event Type              | Persist?          | Why
------------------------|-------------------|----------------------------------------------
User question           | ✅ Yes            | Audit, analytics
SQL query proposed      | ✅ Yes            | Security audit
User approval/rejection | ✅ Yes            | Human-in-loop compliance
SQL execution results   | ⚠️ Metadata only  | Row count, columns, exec time — not full rows
Errors                  | ✅ Yes            | Debugging, pattern detection
LLM thinking messages   | ❌ No             | Ephemeral UX, bloats DB
Chart data              | ❌ No             | Regenerate on demand from SQL
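The table above can be encoded as a small policy map so every event goes through one gate. Event names and the `persistable` helper are illustrative, not from any particular framework:

```python
# Persistence policy mirroring the table above; "metadata" means strip
# the payload and keep only summary fields before writing.
PERSISTENCE_POLICY = {
    "user_question": "full",
    "sql_proposed": "full",
    "user_approval": "full",
    "sql_result": "metadata",   # row count, columns, timing only
    "error": "full",
    "llm_thinking": "skip",
    "chart_data": "skip",
}

def persistable(event_type, payload, metadata=None):
    """Return what should be written for this event, or None to skip it."""
    policy = PERSISTENCE_POLICY.get(event_type, "full")  # fail safe: persist
    if policy == "skip":
        return None
    if policy == "metadata":
        return metadata or {}
    return payload

assert persistable("llm_thinking", "hmm...") is None
assert persistable("sql_result", [("row",)] * 1000,
                   {"row_count": 1000}) == {"row_count": 1000}
```

Defaulting unknown event types to "full" is deliberate: it's safer to over-log a new event than to silently drop it from the audit trail.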

Schema Design

Two tables are enough for most use cases:

-- Session-level tracking
DataAgentSession
  - session_id (PK)
  - question (original user question)
  - started_at, ended_at
  - user_ip (for abuse detection)
  - total_steps

-- Step-level tracking
DataAgentStep
  - session_id (FK)
  - step_number
  - proposed_sql
  - status (proposed | approved | rejected | error)
  - execution_time_ms
  - row_count
  - error_message

This gives you full auditability without storing megabytes of result data.
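The outline above can be made concrete as runnable DDL. This sketch uses SQLite for a self-contained demo; PostgreSQL types (TIMESTAMPTZ, SERIAL/IDENTITY, INET for the IP) would differ slightly:

```python
import sqlite3

DDL = """
CREATE TABLE data_agent_session (
    session_id   TEXT PRIMARY KEY,
    question     TEXT NOT NULL,       -- original user question
    started_at   TEXT NOT NULL,
    ended_at     TEXT,
    user_ip      TEXT,                -- for abuse detection
    total_steps  INTEGER DEFAULT 0
);

CREATE TABLE data_agent_step (
    id                 INTEGER PRIMARY KEY,
    session_id         TEXT NOT NULL REFERENCES data_agent_session(session_id),
    step_number        INTEGER NOT NULL,
    proposed_sql       TEXT,
    status             TEXT CHECK (status IN
                           ('proposed', 'approved', 'rejected', 'error')),
    execution_time_ms  INTEGER,
    row_count          INTEGER,
    error_message      TEXT
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
```

Note what is absent: no column for result rows. Row count and timing live on the step; the data itself can always be regenerated by re-running the approved SQL.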

When to Persist, When to Skip

Always Persist If:

  • You're executing database queries or API calls on behalf of users
  • The system handles sensitive data (PII, healthcare, financial)
  • You need to detect abuse or prompt injection attempts
  • Compliance requires an audit trail (SOC2, HIPAA, GDPR)
  • Multi-step workflows where users expect to resume

Ephemeral is OK If:

  • Simple Q&A with no side effects
  • Internal demo/prototype
  • No sensitive data access
  • Users don't expect conversation history

Performance Impact

I measured the overhead of adding persistence to the Data Agent:

Metric                | No Persistence | With Audit Log
----------------------|----------------|-------------------------------
DB writes per session | 0              | ~5 (1 session + 4 steps avg)
Latency added         | 0 ms           | ~5 ms per step (non-blocking)
Storage per session   | 0 KB           | ~2 KB (metadata only)

The overhead is negligible compared to LLM inference time (1-3 seconds) and SQL execution (100-500ms).

Security Benefits

Persistence isn't just about user experience — it's critical for security:

Prompt Injection Detection

-- Count prompt-injection attempts per source IP
SELECT user_ip, COUNT(*) AS attempts
FROM data_agent_session
WHERE question ILIKE '%ignore previous%'
   OR question ILIKE '%you are now%'
   OR question ILIKE '%system prompt%'
GROUP BY user_ip
ORDER BY attempts DESC;

SQL Injection Attempts

-- Detect malicious SQL patterns
SELECT session_id, proposed_sql
FROM data_agent_step
WHERE proposed_sql ILIKE '%DROP TABLE%'
   OR proposed_sql ILIKE '%DELETE FROM%'
   OR proposed_sql ILIKE '%auth_user%';

Abuse Detection

-- Find users running excessive queries
SELECT user_ip, COUNT(*) as sessions, SUM(total_steps) as total_steps
FROM data_agent_session
WHERE started_at > NOW() - INTERVAL '1 hour'
GROUP BY user_ip
HAVING COUNT(*) > 50;

The Mistake I Made

I launched the Data Agent without persistence because:

  1. It felt like over-engineering for a portfolio project
  2. The WebSocket implementation was "simple"
  3. I prioritized shipping over security

Then I realized: this agent executes user-provided SQL queries on a production database. Even with guardrails (whitelist tables, block DELETE/DROP, timeout), I have zero visibility into:

  • What queries are being attempted
  • Whether guardrails are working
  • If someone is probing for vulnerabilities

That's unacceptable for any production system.

Lessons Learned

1. Persist Before You Ship

Adding persistence after launch is 10× harder than building it in from day one. Users, database schema, and frontend expectations are already locked in.

2. Metadata > Full Data

You don't need to store every token the LLM produces. Store execution metadata (what happened, when, why) and regenerate details on demand.

3. Async Writes Are Free

Database writes after user actions (approve step, reject step) don't block the UX. The perceived latency is zero.

4. Indexes Matter

Index on session_id, started_at, and status. Your audit queries will thank you.
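Sketched as DDL (SQLite here for a runnable demo; the CREATE INDEX syntax is the same in PostgreSQL). The index names and the minimal throwaway tables are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data_agent_step (session_id TEXT, status TEXT)")
conn.execute("CREATE TABLE data_agent_session (session_id TEXT, started_at TEXT)")

# The three indexes the audit queries lean on
conn.executescript("""
CREATE INDEX idx_step_session    ON data_agent_step (session_id);
CREATE INDEX idx_step_status     ON data_agent_step (status);
CREATE INDEX idx_session_started ON data_agent_session (started_at);
""")

names = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='index'")]
print(names)
```

The `started_at` index in particular is what keeps the hourly abuse-detection query fast once the session table grows.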

5. Retention Policy from Day One

Decide upfront: keep conversations for 30 days? 1 year? Forever? GDPR requires you delete data on request — design for that.
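A minimal purge sketch, assuming ISO-8601 timestamp strings and a hypothetical RETENTION_DAYS setting (SQLite stands in for PostgreSQL; in production this would run as a scheduled job):

```python
import sqlite3
from datetime import datetime, timedelta, timezone

RETENTION_DAYS = 30  # assumption: set per your compliance requirements

def purge_expired(conn, now=None):
    """Delete sessions (and their steps) older than the retention window."""
    now = now or datetime.now(timezone.utc)
    cutoff = (now - timedelta(days=RETENTION_DAYS)).isoformat()
    conn.execute(
        "DELETE FROM data_agent_step WHERE session_id IN "
        "(SELECT session_id FROM data_agent_session WHERE started_at < ?)",
        (cutoff,))
    cur = conn.execute(
        "DELETE FROM data_agent_session WHERE started_at < ?", (cutoff,))
    return cur.rowcount

# Demo: one expired session, one current one
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data_agent_session (session_id TEXT, started_at TEXT)")
conn.execute("CREATE TABLE data_agent_step (session_id TEXT)")
conn.execute("INSERT INTO data_agent_session VALUES ('old', '2000-01-01T00:00:00+00:00')")
conn.execute("INSERT INTO data_agent_session VALUES ('new', '2999-01-01T00:00:00+00:00')")
print(purge_expired(conn))  # removes the expired session
```

A GDPR delete-on-request handler is the same shape with a `WHERE user_ip = ?` (or user-ID) filter instead of the timestamp cutoff, which is exactly why designing for deletion up front is cheap.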

Implementation Checklist

If you're building a conversational agent, ask yourself:

  • ☐ Does this agent access sensitive data or execute actions?
  • ☐ Do I need an audit trail for compliance or security?
  • ☐ Should users be able to resume conversations?
  • ☐ Can I detect abuse without persistence?
  • ☐ What's the data retention policy?

If you answered "yes" to any of the first four, build persistence from the start.

Key Takeaway

Ephemeral WebSockets are fine for demos. For production chatbots — especially those with side effects — always persist critical events. The 5ms of overhead is worth the security, debuggability, and peace of mind.

Start with a simple audit log (session + steps). Don't over-engineer full conversation replay until you need it. And never, ever deploy a chatbot that executes SQL queries without logging what it does.
