I built my Data Agent from scratch — no LangChain, no LlamaIndex, just direct API calls and custom state management. It works, but it made me wonder: what am I missing? What do frameworks actually provide beyond abstraction overhead? Here's what I learned comparing my custom implementation to LangChain, LlamaIndex, CrewAI, and AutoGen.
The Custom Approach: State Management by Hand
My Data Agent is a WebSocket-based SQL assistant that generates queries step-by-step. The user approves each step before execution — with manual state management, string concatenation for context, and raw JSON parsing.
This works, but notice what I'm doing manually:
- String concatenation for conversation context
- JSON parsing (fails if LLM returns markdown)
- State stored in instance variables (lost on disconnect)
- No token counting or context window management
- No automatic retry logic or error recovery
- No caching or memory persistence
Every one of these is a source of bugs and maintenance burden. Let's see what frameworks solve.
Framework 1: LangChain
Best for: General-purpose agents with tool calling
Key Features
- Structured Memory: Automatically formats conversation as
[HumanMessage(...), AIMessage(...), ToolMessage(...)]instead of string concatenation - Built-in Tools: SQL database, search, calculators, custom functions — all with type validation via Pydantic
- Token Counting: Knows when context is too long, can auto-trim old messages
- Streaming Support: Works with WebSocket streaming out of the box
- Huge Ecosystem: 100+ integrations (Redis for persistence, Pinecone for vectors, etc.)
The Tradeoffs
| Pros | Cons |
|---|---|
| Structured memory (no string concatenation) | Heavy abstractions (harder to debug) |
| Pydantic validation for tool calls | Frequent breaking changes (v0.1 → v0.2) |
| Auto context window management | Opinionated (hard to customize agent loop) |
| Massive ecosystem | Overhead for simple tasks |
When to Use LangChain
LangChain shines when you need:
- Multiple tools (search + SQL + custom functions)
- Different LLM providers with swappable backends
- Complex chains (retrieve docs → summarize → answer)
- Pre-built integrations (Redis memory, Pinecone vectors)
Skip it if you have a simple, single-purpose agent where the abstraction cost exceeds the benefit.
Framework 2: LlamaIndex
Best for: RAG (retrieval-augmented generation) + structured data queries
What Makes It Different
LlamaIndex is built specifically for data retrieval and reasoning. While LangChain is a Swiss Army knife, LlamaIndex is a scalpel for data apps:
- SQL-native: Automatic schema introspection, query optimization, error correction
- Smart context management: Auto-prunes old messages to stay under token limits
- Multi-step reasoning: Built-in support for iterative queries (like my agent's step-by-step flow)
- RAG optimization: If you mix SQL with document search, LlamaIndex handles both better than LangChain
The Tradeoffs
| Pros | Cons |
|---|---|
| Optimized for SQL and data queries | Less flexible than LangChain |
| Automatic schema introspection | Smaller ecosystem |
| Better context pruning | Harder to add custom tools |
| Built-in multi-step reasoning | Less mature than LangChain |
When to Use LlamaIndex
LlamaIndex is the best choice if your agent's primary job is data retrieval:
- SQL database queries (like my Data Agent)
- Document search + question answering
- Hybrid workflows (SQL + vector search)
- Multi-hop reasoning ("find X, then use X to find Y")
If I were rebuilding my Data Agent with a framework, this would be my first choice.
Framework 3: CrewAI
Best for: Multi-agent collaboration
The Multi-Agent Model
CrewAI introduces a fundamentally different paradigm: instead of one agent with multiple tools, you have multiple specialized agents that collaborate.
Imagine my Data Agent split into three specialists:
- SQL Analyst: Generates and executes queries
- Data Interpreter: Analyzes results and finds insights
- Visualization Expert: Decides which chart type fits the data
Each agent has its own memory, its own tools, and its own reasoning loop. They delegate tasks to each other and share findings.
The Tradeoffs
| Pros | Cons |
|---|---|
| Role-based specialization | Overkill for single-agent tasks |
| Shared memory across agents | Young framework (less mature) |
| Automatic task delegation | Complex debugging |
| Built-in collaboration patterns | Higher token costs (multiple LLM calls) |
When to Use CrewAI
CrewAI makes sense when the task genuinely benefits from specialization:
- Researcher + writer + editor workflow
- Code generator + tester + reviewer
- Data analyst + domain expert + auditor
Don't use it just because it sounds cool. Multi-agent systems add latency, cost, and complexity. Use them when a single agent can't cover the breadth of expertise required.
Framework 4: AutoGen (Microsoft)
Best for: Conversational multi-agent systems with human-in-the-loop
The Conversational Model
AutoGen's killer feature is human-in-the-loop as a first-class citizen. My Data Agent requires user approval for each step — AutoGen makes this pattern native.
The UserProxyAgent can be configured to:
ALWAYSask for human input (like my approval buttons)TERMINATEwhen certain conditions are metNEVERask (fully autonomous)
The conversation state, message history, and approval flow are all automatic. No manual self.pending_step tracking.
The Tradeoffs
| Pros | Cons |
|---|---|
| Human-in-the-loop built-in | Verbose for simple tasks |
| Auto conversation management | OpenAI-centric (harder with other LLMs) |
| Code execution built-in | Heavy dependencies |
| Great for iterative tasks | Steeper learning curve |
When to Use AutoGen
AutoGen is ideal when:
- User approval is required at each step (like my Data Agent)
- You need code generation + execution + human review
- The task requires back-and-forth negotiation between agents
- You want to experiment with different approval strategies
The Comparison Table
| Feature | Custom | LangChain | LlamaIndex | CrewAI | AutoGen |
|---|---|---|---|---|---|
| State Management | Manual instance vars | ConversationBufferMemory | ChatMemoryBuffer | Agent memory | Auto-conversation |
| Context Building | String concatenation | Auto message formatting | Auto context pruning | Shared memory | Auto dialogue |
| Tool Calling | Manual JSON parsing | Pydantic validators | Built-in tools | Tool delegation | Code execution |
| Token Management | None | Auto-counting | Auto-trimming | Auto | Auto |
| Persistence | None | Pluggable (Redis, etc.) | Built-in | Built-in | Built-in |
| Human-in-loop | Manual | Custom callback | Custom | Custom | Built-in |
| SQL Support | Manual | SQLDatabaseToolkit | NLSQLTableQueryEngine | SQLDatabaseTool | Manual |
| Learning Curve | Low | Medium | Medium | High | Medium |
| Debugging | Easy | Hard | Medium | Very Hard | Medium |
| Dependencies | Minimal | Heavy | Medium | Medium | Heavy |
What Frameworks Actually Solve
After this deep dive, here's what I've learned about where frameworks add genuine value:
1. Context Window Management (Critical)
My custom implementation doesn't track tokens. If conversation history grows beyond the LLM's context window, the agent breaks. Frameworks handle this automatically — pruning old messages, summarizing context, or switching to a larger model.
2. Persistent Memory (Production Requirement)
My state lives in self.steps — it disappears on WebSocket disconnect. Frameworks provide Redis, SQLite, or PostgreSQL backends to persist conversation state across sessions.
3. Type-Safe Tool Calling (Reliability)
My json.loads(raw) fails when the LLM returns markdown-wrapped JSON. Frameworks use Pydantic to validate tool calls, retry on parse errors, and provide clear error messages.
4. Token Counting and Cost Tracking (Observability)
I manually track tokens with OpenTelemetry. Frameworks do this automatically and can enforce budget limits ("stop if cost exceeds $5").
5. Retry and Error Recovery (Resilience)
My agent fails on LLM errors. Frameworks retry with exponential backoff, switch to fallback models, or degrade gracefully.
When Custom is Still Better
Despite all these benefits, my custom approach excels in specific scenarios:
1. Learning
Building from scratch taught me exactly how agents work. I understand the prompt engineering, the state transitions, the error cases. Frameworks abstract this away — you learn how to use the framework, not how agents actually function.
2. Full Control
My WebSocket streaming, approval buttons, and step-by-step UX are exactly what I envisioned. Frameworks impose structure — which is a feature until it's a limitation.
3. Minimal Dependencies
My requirements.txt has 3 LLM-related packages. LangChain pulls in 50+. For a portfolio project, dependency bloat matters.
4. Custom Approval Workflows
My agent has Approve, Revise, Skip, and Exit buttons. Frameworks make multi-option approval harder — they're designed for binary yes/no or fully autonomous agents.
The Verdict: Which Should You Use?
| Scenario | Recommendation | Why |
|---|---|---|
| Learning how agents work | Custom | Understand the internals before using abstractions |
| Portfolio project | Custom | Shows you can build without leaning on frameworks |
| Production SQL agent | LlamaIndex | Best SQL support, context management, multi-step reasoning |
| General-purpose assistant | LangChain | Widest tool support, most integrations |
| Multi-agent collaboration | CrewAI or AutoGen | Built for agent teams, not single agents |
| Human-in-the-loop approval | AutoGen | Native support for iterative approval workflows |
| Startup MVP | LangChain | Move fast, use pre-built integrations, refactor later |
| Regulated industry (finance, healthcare) | Custom or LlamaIndex | Full audit trail, no black-box framework magic |
What I'd Do Differently
If I were rebuilding my Data Agent today with production requirements:
- Start custom to learn the patterns and identify the pain points
- Add LlamaIndex for state management and SQL optimization
- Keep custom prompt engineering and guardrails (frameworks don't understand your domain)
- Use framework memory for persistence (no need to reinvent Redis integration)
- Keep custom UX (my approval buttons, step-by-step UI)
In other words: use frameworks for infrastructure (memory, tool validation, token management), but keep control over the agent loop and user experience.
Conclusion
Frameworks aren't magic. They're opinionated solutions to common problems:
- LangChain: Swiss Army knife for general-purpose agents
- LlamaIndex: Scalpel for data retrieval and SQL agents
- CrewAI: Multi-agent collaboration platform
- AutoGen: Human-in-the-loop conversational agents
My custom Data Agent taught me when frameworks help and when they hinder. For learning and portfolio work, custom wins. For production at scale, frameworks handle the infrastructure so you can focus on domain logic.
The best approach? Build it custom first, identify the pain points, then reach for a framework that solves those specific problems. Don't start with LangChain because everyone else uses it — start with understanding what your agent actually needs.