After seeing how Claude Code's multi-step agent works — proposing actions, waiting for approval, executing, and iterating — I built a similar pattern into my own data analytics chatbot. Here are the design principles I extracted and how they translate to any chatbot project.
1. Step-by-Step Execution with Human Approval
Claude Code doesn't execute everything at once. It proposes a plan, shows you each step, and waits for you to approve before running it. This pattern is powerful for any chatbot that performs actions with side effects.
In my Data Agent, the LLM proposes a SQL query, the user sees it, and clicks Approve, Revise, Skip, or Exit. The key insight: never let the LLM execute without the user's consent. This builds trust and prevents costly mistakes.
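The loop behind this pattern is small. A minimal sketch — `propose_step`, `execute_sql`, and `ask_user` are hypothetical stand-ins for the LLM call, the database layer, and the UI:

```python
def run_agent(question, propose_step, execute_sql, ask_user):
    """Propose -> approve -> execute loop; the LLM never runs SQL unapproved."""
    results = []
    while True:
        step = propose_step(question, results)   # LLM proposes the next step
        if step is None:                         # LLM signals it is done
            return results
        action, revised_sql = ask_user(step)     # "approve" | "revise" | "skip" | "exit"
        if action == "exit":
            return results
        if action == "skip":
            continue
        sql = revised_sql if action == "revise" else step["sql"]
        results.append(execute_sql(sql))
```

The important property is structural: there is no code path from "LLM proposed" to "SQL executed" that does not pass through `ask_user`.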
2. Structured JSON Responses, Not Free Text
Instead of asking the LLM to respond in natural language and then parsing it, I require strict JSON output:
```json
{
  "thinking": "why this step is needed",
  "next_step": {
    "title": "short title",
    "description": "what this does",
    "sql": "SELECT ..."
  }
}
```
This makes the agent deterministic. The frontend knows exactly where to find the SQL, the thinking, and the final answer. Parsing free text with regex is fragile; structured output is not.
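Strictness only pays off if the backend actually rejects malformed replies so the caller can retry. A minimal validator for the schema above (`parse_agent_reply` is an illustrative name):

```python
import json

REQUIRED_STEP_KEYS = {"title", "description", "sql"}

def parse_agent_reply(raw: str) -> dict:
    """Parse the LLM reply as strict JSON and validate its shape.
    Raises ValueError so the caller can retry instead of guessing."""
    try:
        reply = json.loads(raw)
    except json.JSONDecodeError as e:
        raise ValueError(f"not valid JSON: {e}") from e
    if "thinking" not in reply or "next_step" not in reply:
        raise ValueError("missing 'thinking' or 'next_step'")
    missing = REQUIRED_STEP_KEYS - reply["next_step"].keys()
    if missing:
        raise ValueError(f"next_step missing keys: {missing}")
    return reply
```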
3. Error Recovery Instead of Error Walls
When a SQL query fails (wrong table name, syntax error), the naive approach is to show the error and stop. The Claude Code approach: feed the error back to the LLM and ask it to correct itself.
"The previous SQL query failed with: relation 'analytics_products_purchased' does not exist.
Propose a corrected query. Remember the exact table names from the schema."
The LLM almost always self-corrects on the first retry. This one change eliminated most user-facing errors in my agent.
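The retry loop can be sketched like this — `run_query` and `correct_sql` are hypothetical callbacks for the database and the LLM:

```python
def execute_with_retry(sql, run_query, correct_sql, max_retries=2):
    """Run a query; on failure, feed the error back to the LLM for a fix."""
    for attempt in range(max_retries + 1):
        try:
            return run_query(sql)
        except Exception as err:
            if attempt == max_retries:
                raise  # give up: surface the last error to the user
            # Same idea as the prompt in the text: error + instruction to fix.
            sql = correct_sql(
                f"The previous SQL query failed with: {err}. "
                "Propose a corrected query. Remember the exact table names "
                "from the schema."
            )
```

Capping retries matters: without `max_retries` a persistently confused model loops forever and burns tokens.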
4. Defence-in-Depth Guardrails
Relying solely on the system prompt for safety is insufficient. I implemented guardrails at multiple layers:
- Server-side input validation — Block prompt injection patterns ("ignore previous instructions", "you are now") and enforce a 500-character input limit before the question ever reaches the LLM.
- Prompt-level rules — Instruct the model to refuse off-topic questions, block PII exposure, and stay within the analytics table scope.
- SQL allowlist — Only `SELECT` on 3 specific tables. Enforced in code, not just in the prompt.
- Output constraints — Auto-append `LIMIT 20`, cap results at 100 rows, 10-second query timeout.
The principle: the prompt is a suggestion, the code is the law. Any guardrail that only lives in the prompt can be bypassed with a clever injection.
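The SELECT-only allowlist and the auto-LIMIT fit in a few lines. A sketch with illustrative table names — a production version would use a real SQL parser rather than regex:

```python
import re

# Hypothetical allowlist; substitute your actual analytics tables.
ALLOWED_TABLES = {"analytics_events", "analytics_users", "analytics_orders"}

def enforce_sql_guardrails(sql: str, default_limit: int = 20) -> str:
    """Code-level guardrails: SELECT-only, known tables, auto-LIMIT."""
    stripped = sql.strip().rstrip(";")
    if not stripped.lower().startswith("select"):
        raise PermissionError("only SELECT statements are allowed")
    # Crude table extraction from FROM/JOIN clauses.
    tables = set(re.findall(r"\b(?:from|join)\s+([a-z_][a-z0-9_]*)", stripped, re.I))
    unknown = tables - ALLOWED_TABLES
    if unknown:
        raise PermissionError(f"tables not on the allowlist: {unknown}")
    if not re.search(r"\blimit\s+\d+\b", stripped, re.I):
        stripped += f" LIMIT {default_limit}"
    return stripped
```

Because this runs after the LLM responds, no prompt injection can reach past it.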
5. Context Window Management
As the agent runs multiple steps, the conversation history grows. I pass previous steps and their results to the LLM so it can build on what it learned. But you must be selective — dump 100 rows of SQL results into the context and you'll blow through the token limit fast.
My approach: include step titles, SQL queries, and summarised results (first 5 rows + row count) in the context. The LLM gets enough to reason without drowning in data.
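The summarisation itself is trivial. A sketch of what each completed step contributes to the next LLM call, assuming rows arrive as dicts:

```python
def summarise_step(step, rows, max_rows=5):
    """Compact record of a completed step for the context window:
    title, SQL, the first few rows, and the total row count."""
    return {
        "title": step["title"],
        "sql": step["sql"],
        "sample_rows": rows[:max_rows],   # enough to see the shape of the data
        "row_count": len(rows),           # enough to reason about scale
    }
```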
6. Offer Visualisations Proactively
When the LLM sees that the final result is a table with a categorical column and a numeric column, it proposes chart types (bar, line, pie, doughnut). The user clicks a button and Chart.js renders it instantly.
The key: the LLM doesn't generate chart code. It returns structured metadata (x_column, y_column, type) and the frontend handles rendering. This keeps the LLM output clean and the charts consistent.
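A sketch of the proposal logic, where type-sniffing on a sample row stands in for whatever schema information the real agent has:

```python
def propose_charts(columns, sample_row):
    """If the result has a categorical and a numeric column, return chart
    metadata for the frontend; no chart code is ever generated."""
    categorical = [c for c in columns if isinstance(sample_row[c], str)]
    numeric = [c for c in columns if isinstance(sample_row[c], (int, float))]
    if not categorical or not numeric:
        return []  # result shape doesn't suit these chart types
    return [
        {"type": t, "x_column": categorical[0], "y_column": numeric[0]}
        for t in ("bar", "line", "pie", "doughnut")
    ]
```

The frontend only ever receives `{type, x_column, y_column}`, so a Chart.js renderer written once works for every query.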
7. Give the User an Escape Hatch
Not every step is useful. Sometimes the LLM goes down a wrong path. Providing Skip (move to the next step without executing) and Exit (stop the entire flow) gives the user control without forcing them to sit through irrelevant queries.
8. Test with Dirty Data
I intentionally injected 40 corrupted rows into the database: NULLs, duplicates, garbage strings, orphan foreign keys, and invalid timestamps. This exercises the agent's ability to surface data quality issues and handle unexpected values without crashing.
If your chatbot interacts with real-world data, test it with the messiest data you can find. A demo-quality clean dataset won't reveal the edge cases your users will hit on day one.
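Generating the corrupted rows is easy to script. A sketch with illustrative column names, covering the same corruption categories:

```python
import random

def corrupt_rows(n=40, seed=1):
    """Generate intentionally broken rows: NULLs, duplicates, garbage
    strings, orphan foreign keys, invalid timestamps."""
    random.seed(seed)  # reproducible test fixture
    base = {"user_id": 1, "product": "widget", "ts": "2024-01-01T00:00:00"}
    corruptions = [
        lambda r: {**r, "product": None},             # NULL value
        lambda r: dict(r),                            # exact duplicate
        lambda r: {**r, "product": "\x00\ufffd???"},  # garbage string
        lambda r: {**r, "user_id": 999999},           # orphan foreign key
        lambda r: {**r, "ts": "not-a-timestamp"},     # invalid timestamp
    ]
    return [random.choice(corruptions)(base) for _ in range(n)]
```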
Summary
The patterns from Claude Code that made the biggest difference:
- Step-by-step execution with human approval
- Structured JSON responses
- Auto error recovery (feed errors back to the LLM)
- Guardrails in code, not just in prompts
- Selective context window management
- Proactive visualisation proposals
- Always offer an escape hatch
- Test with intentionally corrupted data
These aren't specific to SQL agents — they apply to any chatbot that does more than just answer questions.