I built three different chatbots on this site, each using a different architecture: Kafka + HTTP polling, WebSocket streaming, and WebSocket with direct SQL execution. Same Django backend, same LLMs, completely different data flows. Here's how they work and when to use each pattern.

The Three Architectures

Pattern 1: MarcelChatBot — Kafka + HTTP Polling

Location: Left panel on /about page

Message Flow:

1. Browser → HTTP POST /api/chat/
2. Django → PostgreSQL (write placeholder: "Processing...")
3. Django → Kafka topic: chatbot-questions
4. Kafka Consumer picks up message
5. Consumer → LLM (Gemini/Ollama)
6. Consumer → Kafka topic: chatbot-answers
7. Consumer → PostgreSQL (update response)
8. Browser polls /api/check-answer/ every 500ms-3s
9. Browser reads completed answer from PostgreSQL
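Step 8's "every 500ms-3s" is a backoff schedule: poll fast at first, then settle at a slower ceiling. A minimal sketch of that schedule (the function name and parameters are mine, not the site's actual code):

```python
from itertools import islice

def poll_intervals(start=0.5, cap=3.0, factor=2.0):
    """Yield polling delays in seconds: 0.5, 1.0, 2.0, 3.0, 3.0, ..."""
    delay = start
    while True:
        yield min(delay, cap)   # never wait longer than the cap
        delay *= factor         # exponential backoff between polls

print(list(islice(poll_intervals(), 5)))  # [0.5, 1.0, 2.0, 3.0, 3.0]
```

The early fast polls keep perceived latency low for quick answers, while the 3-second cap bounds the read pressure a slow answer puts on the database.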

Kafka Usage:

  • Two topics: chatbot-questions and chatbot-answers
  • Decouples web server from LLM processing
  • Enables multiple consumers (scaling), categorization pipeline, analytics
  • Messages persist on topics (can replay, audit)

PostgreSQL Usage:

Operation           | Table          | Purpose
--------------------|----------------|-----------------------------
Write on POST       | kf_chatmessage | Create placeholder response
Update from Kafka   | kf_chatmessage | Set answer, timestamp
Poll every 500ms-3s | kf_chatmessage | Check if answered_at is set

PostgreSQL Role: State buffer between async Kafka pipeline and sync HTTP client. Database acts as the "answer ready?" flag.

Characteristics:

  • ✅ Decoupled: Web server doesn't wait for LLM
  • ✅ Scalable: Add more Kafka consumers
  • ✅ Reliable: Messages persist, can survive restarts
  • ❌ Latency: 1-3 seconds from polling intervals
  • ❌ DB Load: Constant polling creates read pressure

Pattern 2: MarcelChatBot — WebSocket Streaming

Location: Right panel on /about page

Message Flow:

1. Browser opens WebSocket /ws/chat/
2. User sends message over WebSocket
3. Django Consumer receives message
4. Consumer → LLM (Gemini/Ollama)
5. LLM streams tokens back
6. Consumer sends each token over WebSocket
7. Browser renders tokens in real-time
8. After stream completes → PostgreSQL (audit log)
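Steps 4-7 boil down to forwarding each token the moment it arrives instead of buffering the full reply. A stripped-down sketch with a fake LLM stream and a `send` callback standing in for the WebSocket (the real code uses Django Channels; all names here are illustrative):

```python
import asyncio

async def fake_llm_stream(prompt):
    # Stand-in for the Gemini/Ollama streaming API
    for token in ["Hello", ", ", "world", "!"]:
        await asyncio.sleep(0)  # yield to the event loop, as a real network read would
        yield token

async def relay(prompt, send):
    """Forward each token to the client as it arrives, then return the full text."""
    parts = []
    async for token in fake_llm_stream(prompt):
        await send(token)       # step 6: push the token over the WebSocket
        parts.append(token)
    return "".join(parts)       # step 8: complete text for the audit-log write

async def main():
    sent = []
    async def send(tok):
        sent.append(tok)
    full = await relay("hi", send)
    return sent, full

sent, full = asyncio.run(main())
print(full)  # Hello, world!
```

The consumer never holds the whole answer before the user sees anything; the only place the full string exists is the post-stream audit write.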

Kafka Usage:

  • None. Direct WebSocket connection, no message queue.

PostgreSQL Usage:

Operation              | Table           | Purpose
-----------------------|-----------------|--------------------------------
No writes during chat  | —               | Database bypassed for real-time
Write after completion | wbs_chatmessage | Audit trail (fire-and-forget)

PostgreSQL Role: Audit log only. Not involved in real-time message delivery. Zero blocking.

Characteristics:

  • ✅ Low latency: <100ms to first token
  • ✅ Simple: ~80 lines of code, no infrastructure
  • ✅ Real-time streaming: User sees tokens as they arrive
  • ❌ Coupled: The connection handler is tied up for the duration of the LLM call
  • ❌ Stateful: Lost on disconnect (unless you add session recovery)
  • ❌ No fan-out: Can't send to multiple consumers

Pattern 3: Data Agent — WebSocket + Direct SQL

Location: /data-agent/ page

Message Flow:

1. Browser opens WebSocket /ws/data-agent/
2. User asks data question
3. Consumer → Gemini LLM (propose SQL)
4. Consumer sends SQL proposal to browser
5. User approves/rejects/skips
6. If approved → Consumer executes SQL on PostgreSQL
7. Consumer sends result rows to browser
8. Repeat steps 3-7 until final answer
9. No persistence (ephemeral session)
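The approve/reject loop in steps 3-8 can be sketched as one plain function. Here `propose`, `ask_user`, and `run_sql` are placeholders for the LLM call, the WebSocket round-trip to the user, and the guarded query execution; none of these names come from the actual codebase:

```python
def agent_loop(question, propose, ask_user, run_sql, max_steps=5):
    """Iterate propose -> approve -> execute until the LLM returns a final answer."""
    context = []                               # all state is in-memory (ephemeral)
    for _ in range(max_steps):
        step = propose(question, context)      # LLM suggests SQL or a final answer
        if step.get("final"):
            return step["final"]
        decision = ask_user(step["sql"])       # "approve" | "reject" | "skip"
        if decision == "approve":
            context.append((step["sql"], run_sql(step["sql"])))
        else:
            context.append((step["sql"], f"user chose to {decision}"))
    return "Step limit reached without a final answer."
```

Rejected queries still go into the context so the LLM can propose something different on the next step instead of repeating itself.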

Kafka Usage:

  • None. Direct WebSocket, all state in-memory.

PostgreSQL Usage:

Operation            | Tables                                                        | Purpose
---------------------|---------------------------------------------------------------|---------------------------------------
Direct SQL execution | analytics_users, analytics_purchases, analytics_product_specs | LLM-generated queries (user-approved)
No persistence       | —                                                             | Conversation not saved (ephemeral)

PostgreSQL Role: Live query engine. Each step executes a SELECT query generated by the LLM, results streamed to browser.

Safety Guardrails:

  • Only SELECT allowed (regex blocks DROP, DELETE, UPDATE)
  • Whitelist of 3 tables enforced
  • 10-second statement timeout
  • Auto-append LIMIT 100 if not specified
  • User approval required before execution
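The guardrails above can be sketched as a single validation function. This is a simplified reconstruction mirroring the bullets, not the site's actual implementation; the 10-second timeout lives on the database session, not in this function:

```python
import re

ALLOWED_TABLES = {"analytics_users", "analytics_purchases", "analytics_product_specs"}
FORBIDDEN = re.compile(r"\b(drop|delete|update|insert|alter|truncate|grant)\b", re.I)

def guard_sql(sql):
    """Return a safe-to-run query, or raise ValueError. Crude but illustrative:
    a keyword in a string literal would also be rejected (failing closed)."""
    stripped = sql.strip().rstrip(";")
    if not re.match(r"(?is)^\s*select\b", stripped):
        raise ValueError("only SELECT is allowed")
    if FORBIDDEN.search(stripped):
        raise ValueError("forbidden keyword")
    tables = {t.lower() for t in
              re.findall(r"(?i)\b(?:from|join)\s+([a-zA-Z_][a-zA-Z0-9_]*)", stripped)}
    if not tables <= ALLOWED_TABLES:
        raise ValueError(f"table not whitelisted: {tables - ALLOWED_TABLES}")
    if not re.search(r"(?i)\blimit\s+\d+", stripped):
        stripped += " LIMIT 100"   # auto-append a row cap
    # The 10s cap is enforced DB-side, e.g. SET statement_timeout = '10s'
    return stripped
```

Each rule fails closed: anything that is not provably a whitelisted SELECT is rejected before the user ever sees an approval prompt.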

Characteristics:

  • ✅ Transparent: User sees every SQL query
  • ✅ Human-in-loop: Approve before execution
  • ✅ Interactive: Multi-step reasoning
  • ❌ DB load: Complex queries can slow/lock tables
  • ❌ No audit trail: Queries not logged (security risk)
  • ❌ Ephemeral: Lost on disconnect

Side-by-Side Comparison

Aspect            | Kafka Chat           | WebSocket Chat           | Data Agent
------------------|----------------------|--------------------------|---------------------
Transport         | HTTP POST + polling  | WebSocket                | WebSocket
Kafka             | ✅ Yes (2 topics)    | ❌ No                    | ❌ No
PostgreSQL role   | State buffer         | Audit log                | Query engine
DB writes per Q&A | 2 (create + update)  | 2 (after stream)         | 0 (read-only)
DB reads per Q&A  | 5-20 (polling)       | 0 (real-time)            | 1-7 (SQL steps)
Latency           | 1-3s (polling)       | <100ms                   | <200ms + query time
Persistence       | Full (Kafka + DB)    | Audit only               | None (ephemeral)
Scalability       | High (add consumers) | Medium (sticky sessions) | Low (DB bottleneck)
Code complexity   | ~330 lines           | ~110 lines               | ~700 lines

The PostgreSQL Pattern Difference

The most interesting insight: the same database plays three completely different roles:

1. State Buffer (Kafka Chat)

-- Placeholder while Kafka processes
INSERT INTO kf_chatmessage (message, response)
VALUES ('user question', 'Processing...');

-- Updated by Kafka consumer
UPDATE kf_chatmessage
SET response = 'LLM answer', answered_at = NOW()
WHERE message_id = '...';

-- Frontend polls until answered_at is set
SELECT response FROM kf_chatmessage
WHERE message_id = '...' AND answered_at IS NOT NULL;

Database is the synchronization point between async Kafka flow and sync HTTP polling. High read load from constant polling.

2. Audit Log (WebSocket Chat)

-- Only written AFTER stream completes
INSERT INTO wbs_chatmessage (message, response, session_id)
VALUES ('user question', 'full LLM answer', '...');

Database is completely bypassed during real-time interaction. Fire-and-forget logging. Zero blocking, minimal load.
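Fire-and-forget can be as simple as handing the write to a background worker so the WebSocket handler never waits on the database. A sketch using a queue and a daemon thread, with an in-memory list standing in for the `wbs_chatmessage` INSERT:

```python
import queue
import threading

log_queue = queue.Queue()
saved = []  # stand-in for the wbs_chatmessage table

def save_to_db(record):
    saved.append(record)  # real code: ORM insert into wbs_chatmessage

def db_writer():
    # Drains completed conversations into the audit table, off the hot path
    while True:
        record = log_queue.get()
        if record is None:
            break
        save_to_db(record)
        log_queue.task_done()

threading.Thread(target=db_writer, daemon=True).start()

# After the stream completes, enqueue and move on — no waiting on the DB:
log_queue.put({"message": "user question", "response": "full LLM answer"})
log_queue.join()  # only the demo waits; the handler itself would not
```

The trade-off is honest fire-and-forget: if the process dies before the queue drains, those audit rows are lost, which is acceptable for a log that is never on the read path.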

3. Query Engine (Data Agent)

-- LLM-generated query, user-approved, executed live
SELECT product_name, COUNT(*) as purchases
FROM analytics_purchases
GROUP BY product_name
ORDER BY purchases DESC
LIMIT 5;

Database is the data source. Direct execution of LLM-generated SQL. High load from complex analytics queries. No conversation persistence.

When to Use Each Pattern

Use Kafka + HTTP Polling When:

  • You need to decouple web servers from LLM processing
  • Multiple downstream consumers need the same events (analytics, categorization, audit)
  • Message delivery guarantees are critical
  • Can tolerate 1-3 second latency
  • Want to replay or reprocess messages

Use WebSocket Streaming When:

  • Low latency (<500ms) is critical
  • Simple request-response pattern
  • No need for message fan-out
  • Moderate traffic (not 1000s concurrent)
  • Conversation history is optional

Use WebSocket + Direct SQL When:

  • Users need to see actual data rows, not summaries
  • Multi-step interactive exploration required
  • Transparency and user control are critical
  • You have read-only analytics tables (separate from production)
  • Can enforce strong SQL safety guardrails

The Mistake I Made

I deployed the Data Agent without persistence. It executes SQL queries on a production database, yet I have zero audit trail of:

  • What queries were attempted
  • Which were approved vs rejected
  • Whether safety guardrails are working
  • If someone is probing for vulnerabilities

For the WebSocket chat, ephemeral state is acceptable — it's just Q&A about my background. For SQL execution, it's a compliance and security risk.

Lesson: If your chatbot has side effects (database queries, API calls, file writes), always log what it does. The PostgreSQL overhead is negligible compared to the security benefits.
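The fix is small: log before executing, not after. A sketch of what that wrapper could look like, using Python's logging module as a stand-in for an audit-table insert (the `agent_audit_log` destination and all names here are hypothetical):

```python
import logging

audit = logging.getLogger("data_agent.audit")  # stand-in for agent_audit_log table

def execute_with_audit(sql, decision, run_sql, session_id):
    """Record every proposed query and the user's decision before touching the DB."""
    audit.info("session=%s decision=%s sql=%s", session_id, decision, sql)
    if decision != "approve":
        return None            # rejected/skipped queries are still on record
    try:
        rows = run_sql(sql)
        audit.info("session=%s rows=%d", session_id, len(rows))
        return rows
    except Exception as exc:
        audit.error("session=%s error=%s", session_id, exc)
        raise                  # failures are logged, then surfaced to the user
```

Because the log write happens before execution, even a query that crashes the connection leaves a trace — exactly the evidence that was missing above.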

Key Takeaways

  1. Flow matters more than tech: Kafka isn't slow because it's Kafka — it's slow because polling introduces latency. WebSocket isn't fast because it's WebSocket — it's fast because there's no polling.
  2. PostgreSQL is flexible: Same database, three roles: state buffer, audit log, query engine. Design for the access pattern, not the database.
  3. Match architecture to constraints: If latency > reliability, use WebSocket. If reliability > latency, use Kafka. If transparency > everything, use Data Agent pattern.
  4. Persistence is a security feature: Ephemeral is fine for demos. For production systems with side effects, always log critical events.

Start simple (WebSocket), add complexity when constraints demand it (Kafka for fan-out, direct SQL for analytics), and always — always — log what your chatbot does.

← Back to Articles