Data Analytics Agent
Multi-step SQL agent — shows reasoning, proposes queries, you approve each step.
🧪 Corrupted Data — Testing Data Quality
The analytics_purchases table has been intentionally seeded with 40 corrupted rows to test the agent's ability to surface data quality issues. Try asking "Show me data quality of table purchases" to see them.
Injected anomalies
| Category | # Rows | What's wrong |
|---|---|---|
| NULL values | 8 | Missing userID, product_id, product_name, or purchase_timestamp |
| Duplicate rows | 8 | Exact-duplicate purchases for the same user + product + timestamp |
| Garbage strings | 8 | Product names like !!INVALID!!, NULL_STRING, ???, empty strings |
| Orphan foreign keys | 8 | userID or product_id that don't exist in the parent tables (e.g. 99999, -1) |
| Format issues | 8 | Timestamps like not-a-date, 9999-01-01, epoch 0, and far-future dates |
The other two tables — analytics_users and analytics_product_specs — remain clean, making cross-table joins useful for detecting orphan references.
🛡️ Guardrails — Prompt Injection & Security Controls
This agent applies a defence-in-depth approach with guardrails at both the LLM prompt level and the server-side code.
1. Off-topic refusal
The system prompt instructs the model to decline any question not related to the analytics data. If you ask about the weather, coding help, or general knowledge the agent responds with a polite refusal instead of generating SQL.
2. Sensitive data (PII) blocking
The model is forbidden from returning full_name and born_date together in the same result set. User-level queries must use userID, country, or gender only. Requests to extract or export personal data are refused.
3. Analytics-only table scope
Both the prompt and the server enforce a strict table allowlist:
analytics_usersanalytics_purchasesanalytics_product_specs
Any SQL referencing auth_user, django_session, chat message tables, or any other table is rejected server-side before execution.
4. Prompt injection rejection
Server-side filter — before the question ever reaches the LLM, it is scanned for 14 known injection patterns such as "ignore previous instructions", "you are now", "system prompt", "jailbreak", etc. Matches are blocked instantly with an error message.
Prompt-level rule — the system prompt tells the model to refuse any attempt to override instructions, change its role, or reveal its prompt.
5. Input & output size caps
- Input: questions longer than 500 characters are rejected server-side.
- Output: the model is instructed to keep final answers under 500 words.
- SQL results: queries return at most 100 rows, and a
LIMIT 20is auto-appended unless the query already uses an aggregate or explicit limit.
6. SQL safety
- Only
SELECTstatements are allowed.DROP,DELETE,UPDATE,INSERT,ALTER,TRUNCATE,CREATE,GRANT,REVOKE, andCOPYare blocked. - A 10-second query timeout prevents runaway queries.
- Every query is approved by you before execution.