After seeing how Claude Code's multi-step agent works — proposing actions, waiting for approval, executing, and iterating — I built a similar pattern into my own data analytics chatbot. Here are the design principles I extracted and how they translate to any chatbot project.
1. Step-by-Step Execution with Human Approval
Claude Code doesn't execute everything at once. It proposes a plan, shows you each step, and waits for you to approve before running it. This pattern is powerful for any chatbot that performs actions with side effects.
In my Data Agent, the LLM proposes a SQL query, the user sees it, and clicks Approve, Revise, Skip, or Exit. The key insight: never let the LLM execute without the user's consent. This builds trust and prevents costly mistakes.
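The loop behind this pattern is small. A minimal sketch — `propose_step`, `execute_sql`, and `ask_user` are hypothetical stand-ins for the LLM call, the database layer, and the UI:

```python
def run_agent(question, propose_step, execute_sql, ask_user):
    """Propose -> approve -> execute loop; the LLM never runs SQL unapproved."""
    results = []
    while True:
        step = propose_step(question, results)   # LLM proposes the next step
        if step is None:                         # LLM signals it is done
            return results
        action, revised_sql = ask_user(step)     # "approve" | "revise" | "skip" | "exit"
        if action == "exit":
            return results
        if action == "skip":
            continue
        sql = revised_sql if action == "revise" else step["sql"]
        results.append(execute_sql(sql))
```

The important property is structural: there is no code path from "LLM proposed" to "SQL executed" that does not pass through `ask_user`.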
2. Structured JSON Responses, Not Free Text
Instead of asking the LLM to respond in natural language and then parsing it, I require strict JSON output:
```json
{
  "thinking": "why this step is needed",
  "next_step": {
    "title": "short title",
    "description": "what this does",
    "sql": "SELECT ..."
  }
}
```
This makes the agent deterministic. The frontend knows exactly where to find the SQL, the thinking, and the final answer. Parsing free text with regex is fragile; structured output is not.
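Strictness only pays off if the backend actually rejects malformed replies so the caller can retry. A minimal validator for the schema above (`parse_agent_reply` is an illustrative name):

```python
import json

REQUIRED_STEP_KEYS = {"title", "description", "sql"}

def parse_agent_reply(raw: str) -> dict:
    """Parse the LLM reply as strict JSON and validate its shape.
    Raises ValueError so the caller can retry instead of guessing."""
    try:
        reply = json.loads(raw)
    except json.JSONDecodeError as e:
        raise ValueError(f"not valid JSON: {e}") from e
    if "thinking" not in reply or "next_step" not in reply:
        raise ValueError("missing 'thinking' or 'next_step'")
    missing = REQUIRED_STEP_KEYS - reply["next_step"].keys()
    if missing:
        raise ValueError(f"next_step missing keys: {missing}")
    return reply
```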
3. Error Recovery Instead of Error Walls
When a SQL query fails (wrong table name, syntax error), the naive approach is to show the error and stop. The Claude Code approach: feed the error back to the LLM and ask it to correct itself.
"The previous SQL query failed with: relation 'analytics_products_purchased' does not exist.
Propose a corrected query. Remember the exact table names from the schema."
The LLM almost always self-corrects on the first retry. This one change eliminated most user-facing errors in my agent.
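The retry loop can be sketched like this — `run_query` and `correct_sql` are hypothetical callbacks for the database and the LLM:

```python
def execute_with_retry(sql, run_query, correct_sql, max_retries=2):
    """Run a query; on failure, feed the error back to the LLM for a fix."""
    for attempt in range(max_retries + 1):
        try:
            return run_query(sql)
        except Exception as err:
            if attempt == max_retries:
                raise  # give up: surface the last error to the user
            # Same idea as the prompt in the text: error + instruction to fix.
            sql = correct_sql(
                f"The previous SQL query failed with: {err}. "
                "Propose a corrected query. Remember the exact table names "
                "from the schema."
            )
```

Capping retries matters: without `max_retries` a persistently confused model loops forever and burns tokens.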
4. Defence-in-Depth Guardrails
Relying solely on the system prompt for safety is insufficient. I implemented guardrails at multiple layers:
- Server-side input validation — Block prompt injection patterns ("ignore previous instructions", "you are now") and enforce a 500-character input limit before the question ever reaches the LLM.
- Prompt-level rules — Instruct the model to refuse off-topic questions, block PII exposure, and stay within the analytics table scope.
- SQL allowlist — Only `SELECT` on 3 specific tables. Enforced in code, not just in the prompt.
- Output constraints — Auto-append `LIMIT 20`, cap results at 100 rows, 10-second query timeout.
The principle: the prompt is a suggestion, the code is the law. Any guardrail that only lives in the prompt can be bypassed with a clever injection.
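The SELECT-only allowlist and the auto-LIMIT fit in a few lines. A sketch with illustrative table names — a production version would use a real SQL parser rather than regex:

```python
import re

# Hypothetical allowlist; substitute your actual analytics tables.
ALLOWED_TABLES = {"analytics_events", "analytics_users", "analytics_orders"}

def enforce_sql_guardrails(sql: str, default_limit: int = 20) -> str:
    """Code-level guardrails: SELECT-only, known tables, auto-LIMIT."""
    stripped = sql.strip().rstrip(";")
    if not stripped.lower().startswith("select"):
        raise PermissionError("only SELECT statements are allowed")
    # Crude table extraction from FROM/JOIN clauses.
    tables = set(re.findall(r"\b(?:from|join)\s+([a-z_][a-z0-9_]*)", stripped, re.I))
    unknown = tables - ALLOWED_TABLES
    if unknown:
        raise PermissionError(f"tables not on the allowlist: {unknown}")
    if not re.search(r"\blimit\s+\d+\b", stripped, re.I):
        stripped += f" LIMIT {default_limit}"
    return stripped
```

Because this runs after the LLM responds, no prompt injection can reach past it.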
5. Context Window Management
As the agent runs multiple steps, the conversation history grows. I pass previous steps and their results to the LLM so it can build on what it learned. But you must be selective — dump 100 rows of SQL results into the context and you'll blow through the token limit fast.
My approach: include step titles, SQL queries, and summarised results (first 5 rows + row count) in the context. The LLM gets enough to reason without drowning in data.
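The summarisation itself is trivial. A sketch of what each completed step contributes to the next LLM call, assuming rows arrive as dicts:

```python
def summarise_step(step, rows, max_rows=5):
    """Compact record of a completed step for the context window:
    title, SQL, the first few rows, and the total row count."""
    return {
        "title": step["title"],
        "sql": step["sql"],
        "sample_rows": rows[:max_rows],   # enough to see the shape of the data
        "row_count": len(rows),           # enough to reason about scale
    }
```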
6. Offer Visualisations Proactively
When the LLM sees that the final result is a table with a categorical column and a numeric column, it proposes chart types (bar, line, pie, doughnut). The user clicks a button and Chart.js renders it instantly.
The key: the LLM doesn't generate chart code. It returns structured metadata (x_column, y_column, type) and the frontend handles rendering. This keeps the LLM output clean and the charts consistent.
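A sketch of the proposal logic, where type-sniffing on a sample row stands in for whatever schema information the real agent has:

```python
def propose_charts(columns, sample_row):
    """If the result has a categorical and a numeric column, return chart
    metadata for the frontend; no chart code is ever generated."""
    categorical = [c for c in columns if isinstance(sample_row[c], str)]
    numeric = [c for c in columns if isinstance(sample_row[c], (int, float))]
    if not categorical or not numeric:
        return []  # result shape doesn't suit these chart types
    return [
        {"type": t, "x_column": categorical[0], "y_column": numeric[0]}
        for t in ("bar", "line", "pie", "doughnut")
    ]
```

The frontend only ever receives `{type, x_column, y_column}`, so a Chart.js renderer written once works for every query.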
7. Give the User an Escape Hatch
Not every step is useful. Sometimes the LLM goes down a wrong path. Providing Skip (move to the next step without executing) and Exit (stop the entire flow) gives the user control without forcing them to sit through irrelevant queries.
8. Test with Dirty Data
I intentionally injected 40 corrupted rows into the database: NULLs, duplicates, garbage strings, orphan foreign keys, and invalid timestamps. This exercises the agent's ability to surface data quality issues and handle unexpected values without crashing.
If your chatbot interacts with real-world data, test it with the messiest data you can find. A demo-quality clean dataset won't reveal the edge cases your users will hit on day one.
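Generating the corrupted rows is easy to script. A sketch with illustrative column names, covering the same corruption categories:

```python
import random

def corrupt_rows(n=40, seed=1):
    """Generate intentionally broken rows: NULLs, duplicates, garbage
    strings, orphan foreign keys, invalid timestamps."""
    random.seed(seed)  # reproducible test fixture
    base = {"user_id": 1, "product": "widget", "ts": "2024-01-01T00:00:00"}
    corruptions = [
        lambda r: {**r, "product": None},             # NULL value
        lambda r: dict(r),                            # exact duplicate
        lambda r: {**r, "product": "\x00\ufffd???"},  # garbage string
        lambda r: {**r, "user_id": 999999},           # orphan foreign key
        lambda r: {**r, "ts": "not-a-timestamp"},     # invalid timestamp
    ]
    return [random.choice(corruptions)(base) for _ in range(n)]
```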
Summary
The patterns from Claude Code that made the biggest difference:
- Step-by-step execution with human approval
- Structured JSON responses
- Auto error recovery (feed errors back to the LLM)
- Guardrails in code, not just in prompts
- Selective context window management
- Proactive visualisation proposals
- Always offer an escape hatch
- Test with intentionally corrupted data
These aren't specific to SQL agents — they apply to any chatbot that does more than just answer questions.