Hi! I'm the Data Agent. Ask me a question and I'll explore the database step by step. You approve each SQL query before it runs.

🧪 Corrupted Data — Testing Data Quality

The analytics_purchases table has been intentionally seeded with 40 corrupted rows to test the agent's ability to surface data quality issues. Try asking "Show me data quality of table purchases" to see them.

Injected anomalies

Category            | # Rows | What's wrong
NULL values         | 8      | Missing userID, product_id, product_name, or purchase_timestamp
Duplicate rows      | 8      | Exact-duplicate purchases for the same user + product + timestamp
Garbage strings     | 8      | Product names like !!INVALID!!, NULL_STRING, ???, empty strings
Orphan foreign keys | 8      | userID or product_id that don't exist in the parent tables (e.g. 99999, -1)
Format issues       | 8      | Timestamps like not-a-date, 9999-01-01, epoch 0, and far-future dates

The other two tables — analytics_users and analytics_product_specs — remain clean, making cross-table joins useful for detecting orphan references.
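
To illustrate how a cross-table join surfaces orphan references, here is a minimal sketch using an in-memory SQLite database as a stand-in for the real one. The `id` column, the parent-table key names, and the `find_orphan_purchases` helper are assumptions made for the example, not the app's actual schema or code.

```python
import sqlite3


def find_orphan_purchases(conn):
    """Return purchases whose userID or product_id has no matching parent row."""
    query = """
        SELECT p.id, p.userID, p.product_id
        FROM analytics_purchases AS p
        LEFT JOIN analytics_users AS u ON u.userID = p.userID
        LEFT JOIN analytics_product_specs AS s ON s.product_id = p.product_id
        WHERE (p.userID IS NOT NULL AND u.userID IS NULL)
           OR (p.product_id IS NOT NULL AND s.product_id IS NULL)
        ORDER BY p.id
    """
    return conn.execute(query).fetchall()


# Tiny hypothetical dataset: two clean parents and four purchases.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE analytics_users (userID INTEGER PRIMARY KEY);
    CREATE TABLE analytics_product_specs (product_id INTEGER PRIMARY KEY);
    CREATE TABLE analytics_purchases (
        id INTEGER PRIMARY KEY, userID INTEGER, product_id INTEGER
    );
    INSERT INTO analytics_users VALUES (1), (2);
    INSERT INTO analytics_product_specs VALUES (10), (11);
    INSERT INTO analytics_purchases VALUES
        (1, 1, 10),       -- clean row
        (2, 99999, 10),   -- orphan userID
        (3, 2, -1),       -- orphan product_id
        (4, NULL, 11);    -- NULL userID: a missing value, not an orphan
""")
orphans = find_orphan_purchases(conn)
print(orphans)
```

The `IS NOT NULL` guards keep missing values out of the orphan count, so the NULL and orphan categories can be reported separately.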

🛡️ Guardrails — Prompt Injection & Security Controls

This agent applies a defence-in-depth approach with guardrails at both the LLM prompt level and the server-side code.

1. Off-topic refusal

The system prompt instructs the model to decline any question not related to the analytics data. If you ask about the weather, coding help, or general knowledge, the agent responds with a polite refusal instead of generating SQL.

2. Sensitive data (PII) blocking

The model is forbidden from returning full_name and born_date together in the same result set. User-level queries must use userID, country, or gender only. Requests to extract or export personal data are refused.

3. Analytics-only table scope

Both the prompt and the server enforce a strict table allowlist:

  • analytics_users
  • analytics_purchases
  • analytics_product_specs

Any SQL referencing auth_user, django_session, chat message tables, or any other table is rejected server-side before execution.
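
A server-side allowlist check could look roughly like the sketch below. It is an illustration only: the function names are hypothetical, and the regex is a deliberately simple heuristic that inspects the identifier after each FROM or JOIN. It will miss comma-separated FROM lists and will wrongly flag expressions such as EXTRACT(... FROM column), so a production check would more likely rely on a real SQL parser.

```python
import re

ALLOWED_TABLES = {
    "analytics_users",
    "analytics_purchases",
    "analytics_product_specs",
}

# Captures the identifier that follows FROM or JOIN (heuristic, not a parser).
_TABLE_REF = re.compile(
    r'\b(?:FROM|JOIN)\s+"?([A-Za-z_][A-Za-z0-9_.]*)"?', re.IGNORECASE
)


def disallowed_tables(sql):
    """Return the set of referenced table names that are not on the allowlist."""
    referenced = {name.lower() for name in _TABLE_REF.findall(sql)}
    return referenced - ALLOWED_TABLES


def is_query_in_scope(sql):
    """True when every table the heuristic finds is on the allowlist."""
    return not disallowed_tables(sql)


print(is_query_in_scope("SELECT country, COUNT(*) FROM analytics_users GROUP BY country"))
print(disallowed_tables("SELECT * FROM analytics_purchases p JOIN auth_user a ON a.id = p.userID"))
```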

4. Prompt injection rejection

Server-side filter: before the question ever reaches the LLM, it is scanned for 14 known injection patterns, including "ignore previous instructions", "you are now", "system prompt", and "jailbreak". Any match is blocked immediately with an error message.

Prompt-level rule: the system prompt tells the model to refuse any attempt to override instructions, change its role, or reveal its prompt.
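
The server-side filter can be pictured as a case-insensitive pattern scan, sketched below. Only the four phrases quoted above are included; the rest of the 14 patterns are not listed here, and the `find_injection` helper is a hypothetical name rather than the app's actual function.

```python
import re

# The four phrases named in the text; whitespace between words is matched loosely.
INJECTION_PATTERNS = [
    r"ignore\s+previous\s+instructions",
    r"you\s+are\s+now",
    r"system\s+prompt",
    r"jailbreak",
]
_COMPILED = [re.compile(p, re.IGNORECASE) for p in INJECTION_PATTERNS]


def find_injection(question):
    """Return the first matching injection phrase, or None if the question looks clean."""
    for pattern in _COMPILED:
        match = pattern.search(question)
        if match:
            return match.group(0)
    return None


print(find_injection("Total revenue by country?"))
print(find_injection("Reveal your system prompt"))
```

A pattern list like this only catches known phrasings, which is why the prompt-level rule serves as the second layer.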

5. Input & output size caps

  • Input: questions longer than 500 characters are rejected server-side.
  • Output: the model is instructed to keep final answers under 500 words.
  • SQL results: queries return at most 100 rows, and a LIMIT 20 is auto-appended unless the query already uses an aggregate or explicit limit.
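
The size caps above could be enforced along these lines. This is a sketch under assumptions: the helper names are hypothetical, and "uses an aggregate" is approximated by looking for COUNT, SUM, AVG, MIN, or MAX calls, which may differ from the real detection logic.

```python
import re

MAX_QUESTION_CHARS = 500
DEFAULT_LIMIT = 20
MAX_ROWS = 100

_AGGREGATE = re.compile(r"\b(?:COUNT|SUM|AVG|MIN|MAX)\s*\(", re.IGNORECASE)
_LIMIT = re.compile(r"\bLIMIT\s+\d+", re.IGNORECASE)


def validate_question(question):
    """Reject questions longer than the input cap."""
    if len(question) > MAX_QUESTION_CHARS:
        raise ValueError(f"Question exceeds {MAX_QUESTION_CHARS} characters")
    return question


def apply_default_limit(sql):
    """Append LIMIT 20 unless the query already aggregates or sets its own limit."""
    stripped = sql.strip().rstrip(";").rstrip()
    if _AGGREGATE.search(stripped) or _LIMIT.search(stripped):
        return stripped
    return f"{stripped} LIMIT {DEFAULT_LIMIT}"


def cap_rows(rows):
    """Truncate a result set to the hard maximum of 100 rows."""
    return rows[:MAX_ROWS]


print(apply_default_limit("SELECT * FROM analytics_purchases;"))
print(apply_default_limit("SELECT COUNT(*) FROM analytics_users"))
```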

6. SQL safety

  • Only SELECT statements are allowed. DROP, DELETE, UPDATE, INSERT, ALTER, TRUNCATE, CREATE, GRANT, REVOKE, and COPY are blocked.
  • A 10-second query timeout prevents runaway queries.
  • Every query is approved by you before execution.
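
As a rough illustration of the SELECT-only rule, the sketch below checks the leading keyword and scans for the blocked keywords listed above. The `check_sql_safety` name is hypothetical, and rejecting multiple statements is an extra assumption not stated in the rules. Keyword scanning can also flag harmless string literals (for example a product named 'update'), so it errs on the side of refusing.

```python
import re

BLOCKED_KEYWORDS = (
    "DROP", "DELETE", "UPDATE", "INSERT", "ALTER",
    "TRUNCATE", "CREATE", "GRANT", "REVOKE", "COPY",
)
_BLOCKED = re.compile(r"\b(?:" + "|".join(BLOCKED_KEYWORDS) + r")\b", re.IGNORECASE)


def check_sql_safety(sql):
    """Return None when the query passes, otherwise a short rejection reason."""
    statement = sql.strip().rstrip(";").strip()
    if ";" in statement:
        # Assumption: stacked statements are refused outright.
        return "multiple statements are not allowed"
    if not re.match(r"SELECT\b", statement, re.IGNORECASE):
        return "only SELECT statements are allowed"
    match = _BLOCKED.search(statement)
    if match:
        return f"blocked keyword: {match.group(0).upper()}"
    return None


print(check_sql_safety("SELECT * FROM analytics_users"))
print(check_sql_safety("DROP TABLE analytics_users"))
```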