Most chatbots today are black boxes: you ask a question, wait a few seconds, and get an answer. You have no idea how it got there. Was it a lucky guess? Did it hallucinate? Did it even look at the right data? I built two versions of the same data chatbot — one black-box, one deterministic — and the difference in trust, debugging, and usefulness was dramatic.
What Do We Mean by "Deterministic"?
Deterministic here doesn't mean the LLM's output is predictable token by token. It means the process is visible and verifiable. The user can see every step:
- What the agent is thinking (its reasoning)
- What SQL query it plans to run
- The raw results from the database
- How it interprets those results into an answer
At every step, the user can approve, revise, skip, or stop. The LLM is a reasoning engine, but the execution is auditable.
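The four visible pieces above can be captured in a single auditable record. Here's a minimal sketch in Python; the class and field names are illustrative, not from any real framework:

```python
from dataclasses import dataclass, field

@dataclass
class Step:
    """One auditable step: reasoning, query, raw results, interpretation."""
    thinking: str                # what the agent is thinking
    sql: str                     # the query it plans to run
    results: list = field(default_factory=list)  # raw rows from the database
    interpretation: str = ""     # how it reads those results into an answer

    def audit_trail(self) -> dict:
        # Everything the user needs in order to approve, revise, or reject.
        return {"thinking": self.thinking, "sql": self.sql,
                "results": self.results, "interpretation": self.interpretation}
```

Because every step is a plain record, logging it, rendering it in a UI, and replaying it later are all trivial.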
The Black-Box Approach
In a typical chatbot, the flow looks like this:
User: "What are the top 5 products by purchase count?"
Bot: [thinking for 8 seconds...]
Bot: "The top 5 products are Widget A (245), Gadget B (198)..."
The user gets an answer, but has no way to verify it. Did the bot query the right table? Did it count correctly? Did it filter out test data? If the answer looks wrong, the only recourse is to ask again and hope for better luck.
This is fine for casual conversation. It's dangerous for data analysis, where a wrong number can drive a wrong business decision.
The Deterministic Approach
The same question in a step-by-step agent looks like this:
User: "What are the top 5 products by purchase count?"
Agent thinking: "I need to count purchases grouped by product name."
Step 1: SELECT product_name, COUNT(*) AS total
FROM analytics_purchases
GROUP BY product_name
ORDER BY total DESC
LIMIT 5
[User clicks Approve]
[Query executes, results shown in a table]
Agent thinking: "I have the data. Summarising."
Final answer: "The top 5 products are..."
[+ offers bar chart visualisation]
The user saw the query before it ran. They verified the table name, the aggregation logic, and the sort order. When results appeared, they could cross-check them. The answer is no longer a matter of faith — it's a matter of evidence.
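The exchange above amounts to a loop with a human gate at every step. A rough sketch, assuming hypothetical decide and execute callbacks supplied by the UI and the database layer:

```python
ACTIONS = {"approve", "revise", "skip", "exit"}

def run_steps(steps, decide, execute):
    """Walk proposed steps; `decide(step)` returns the user's action and
    `execute(step)` runs an approved query. Returns an audit log."""
    log = []
    for step in steps:
        action = decide(step)            # blocks until the user chooses
        if action not in ACTIONS:
            raise ValueError(f"unknown action: {action}")
        log.append((step["sql"], action))
        if action == "exit":
            break                        # stop entirely
        if action == "approve":
            step["results"] = execute(step)   # run and record raw rows
        # "revise" would loop back to the LLM with the user's feedback
        # (omitted in this sketch); "skip" simply moves on.
    return log
```

The log is the receipt: every proposed query paired with what the user decided about it.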
Why This Matters More Than You Think
1. Trust
When a stakeholder asks "how did you get that number?", the deterministic chatbot has a receipt. Every step is logged: the SQL, the raw data, the reasoning. The black-box chatbot has nothing but "the AI said so."
In regulated industries — finance, healthcare, compliance — "the AI said so" is not an acceptable answer. Auditability isn't a nice-to-have; it's a requirement.
2. Debugging
When a black-box chatbot gives a wrong answer, you're blind. You don't know if the problem was:
- The LLM hallucinated a table name
- The query logic was wrong (counted rows instead of distinct values)
- The data itself was bad (nulls, duplicates, orphan keys)
- The LLM misinterpreted the results
With a deterministic agent, the bug is immediately visible. If the SQL references analytics_products_purchased instead of analytics_purchases, you see it in Step 1 before it even runs. If the count is wrong, the raw results table tells you why.
3. User Control
Not every query the LLM proposes is useful. Sometimes it goes down a rabbit hole — joining tables unnecessarily, adding filters the user didn't ask for, or running 7 queries when 2 would suffice.
The Skip button lets users say "that's not useful, move on." The Revise button lets them redirect. The Exit button lets them stop entirely. None of these are possible with a black box that runs everything internally.
4. Education
An underrated benefit: non-technical stakeholders learn SQL by watching the agent work. They start recognising patterns — GROUP BY for aggregation, JOIN for combining tables, COUNT(DISTINCT ...) for unique values. The chatbot becomes a teaching tool, not just an answering machine.
The Cost of Transparency
The deterministic approach isn't free. It's slower — the user must approve each step instead of getting an instant answer. It requires more engineering: step management, approval UI, error recovery, state tracking between steps.
And it exposes the LLM's mistakes. When the agent proposes a bad query, the user sees it. With a black box, the LLM might silently retry and the user never knows. Transparency means vulnerability.
But this is a feature, not a bug. If your chatbot can't survive its mistakes being visible, the problem isn't transparency — it's the chatbot.
When to Use Which
| Scenario | Approach | Why |
|---|---|---|
| Customer support FAQ | Black box | Speed matters, stakes are low, answers are retrievable |
| Data analysis / BI | Deterministic | Numbers must be verifiable, wrong answers have consequences |
| Code generation | Deterministic | User must review before execution (Claude Code, Copilot) |
| Creative writing | Black box | Surprise is the point, not a risk |
| Compliance / audit | Deterministic | Every action must be logged and reviewable |
| Internal tooling | Hybrid | Auto-run safe queries, require approval for mutations |
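The hybrid row is straightforward to mechanise: classify each statement, auto-run reads, and gate everything else. A sketch using a keyword check (a real system would use a proper SQL parser):

```python
import re

# Statements that mutate data and should always wait for a human.
MUTATING = re.compile(r"(?is)^\s*(insert|update|delete|drop|alter|create|truncate)\b")

def needs_approval(sql: str) -> bool:
    """True if the statement mutates data; False for plain reads."""
    return bool(MUTATING.match(sql))
```

Reads flow through unattended; anything destructive stops at the approval gate.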
Implementation Patterns
If you're building a deterministic chatbot, here are the patterns that worked for me:
- Structured JSON output — force the LLM to return JSON with explicit fields (thinking, sql, final_answer) instead of free text. This makes every decision parseable and loggable.
- Approval gates — never execute a query without user consent. Show the SQL, explain what it does, and wait for Approve/Revise/Skip/Exit.
- Error recovery — when a query fails, feed the error back to the LLM and ask it to self-correct. Show both the error and the corrected query so the user sees the full story.
- Step history in context — pass previous steps and results to the LLM so it builds on verified data, not assumptions.
- Guardrails in code, not just prompts — validate SQL server-side (SELECT-only, table allowlist, auto LIMIT). The prompt is advice; the code is the law.
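The last pattern, guardrails in code, can be as small as one validating function. A sketch assuming a hand-maintained table allowlist; a production system would parse the SQL properly rather than rely on regular expressions:

```python
import re

ALLOWED_TABLES = {"analytics_purchases", "analytics_products"}  # example allowlist
DEFAULT_LIMIT = 1000

def guard_sql(sql: str) -> str:
    """Enforce SELECT-only, an allowlist of tables, and an automatic LIMIT."""
    stmt = sql.strip().rstrip(";")
    if not re.match(r"(?is)^select\b", stmt):
        raise ValueError("only SELECT statements are allowed")
    # Naive table extraction: every identifier after FROM or JOIN.
    tables = set(re.findall(r"(?is)\b(?:from|join)\s+([a-z_][a-z0-9_]*)", stmt))
    unknown = tables - ALLOWED_TABLES
    if unknown:
        raise ValueError(f"table(s) not on allowlist: {sorted(unknown)}")
    if not re.search(r"(?is)\blimit\s+\d+\b", stmt):
        stmt += f" LIMIT {DEFAULT_LIMIT}"   # cap result size by default
    return stmt
```

The prompt can ask the LLM nicely to stay SELECT-only; this function makes it non-negotiable.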
The Bottom Line
A black-box chatbot is a colleague who says "trust me." A deterministic chatbot is a colleague who shows their work on a whiteboard. Both can be right. But when they're wrong, only one lets you figure out why.
For anything involving data, decisions, or actions — show the work.