I built my Data Agent from scratch — no LangChain, no LlamaIndex, just direct API calls and custom state management. It works, but it made me wonder: what am I missing? What do frameworks actually provide beyond abstraction overhead? Here's what I learned comparing my custom implementation to LangChain, LlamaIndex, CrewAI, and AutoGen.

The Custom Approach: State Management by Hand

My Data Agent is a WebSocket-based SQL assistant that generates queries step by step. The user approves each step before it executes, and everything underneath is hand-rolled: manual state management, string concatenation for context, and raw JSON parsing.
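To make that concrete, here is a minimal sketch of what such a hand-rolled session can look like. The class name DataAgentSession and the llm callable are hypothetical stand-ins. Only self.steps, self.pending_step, and the json.loads call mirror details mentioned later in this post.

```python
import json


class DataAgentSession:
    """Hypothetical sketch of a hand-rolled agent session.

    All state lives in instance variables, so it vanishes when the
    WebSocket connection (and this object) goes away.
    """

    def __init__(self, llm):
        self.llm = llm            # hypothetical callable: prompt str -> reply str
        self.steps = []           # steps the user has approved
        self.pending_step = None  # step waiting for approval
        self.context = ""         # whole conversation as one growing string

    def propose_step(self, user_message: str) -> dict:
        # Context by string concatenation; its length is never checked
        # against the model's context window.
        self.context += f"User: {user_message}\n"
        raw = self.llm(self.context + 'Reply only with JSON like {"sql": "..."}')
        self.context += f"Assistant: {raw}\n"
        # Raw parsing: json.loads raises json.JSONDecodeError if the model
        # wraps its answer in markdown code fences.
        self.pending_step = json.loads(raw)
        return self.pending_step

    def approve(self) -> None:
        # Record the pending step; execution would happen here.
        self.steps.append(self.pending_step)
        self.pending_step = None
```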

This works, but notice what I'm doing manually:

  • String concatenation for conversation context
  • JSON parsing (fails if the LLM wraps its output in markdown code fences)
  • State stored in instance variables (lost on disconnect)
  • No token counting or context window management
  • No automatic retry logic or error recovery
  • No caching or memory persistence

Every one of these is a source of bugs and maintenance burden. Let's see what frameworks solve.

Framework 1: LangChain

Best for: General-purpose agents with tool calling

Key Features

  • Structured Memory: Automatically formats conversation as [HumanMessage(...), AIMessage(...), ToolMessage(...)] instead of string concatenation
  • Built-in Tools: SQL database, search, calculators, custom functions — all with type validation via Pydantic
  • Token Counting: Utilities such as trim_messages count tokens and drop old messages to fit the context window
  • Streaming Support: Chat models expose stream() and astream(), whose output is straightforward to forward over a WebSocket
  • Huge Ecosystem: 100+ integrations (Redis for persistence, Pinecone for vectors, etc.)
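The structured-memory idea is easy to see without the framework. The sketch below is plain Python, not LangChain's API: it keeps a typed list of messages and renders them into the role/content dictionaries that chat-completion APIs accept, which is roughly the job HumanMessage and AIMessage do for the user and assistant roles. Message and ChatHistory are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    role: str      # "system", "user", or "assistant"
    content: str


class ChatHistory:
    """Hypothetical stdlib sketch of structured conversation memory."""

    VALID_ROLES = ("system", "user", "assistant")

    def __init__(self) -> None:
        self.messages: list[Message] = []

    def add(self, role: str, content: str) -> None:
        # Reject typos like "human" instead of silently building a bad prompt.
        if role not in self.VALID_ROLES:
            raise ValueError(f"unknown role: {role!r}")
        self.messages.append(Message(role, content))

    def to_api_format(self) -> list[dict]:
        # The role/content dicts that chat-completion style APIs accept.
        return [{"role": m.role, "content": m.content} for m in self.messages]
```

Because each turn stays a separate object, trimming, persisting, or re-rendering the history never requires re-parsing a concatenated string.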

The Tradeoffs

Pros | Cons
--- | ---
Structured memory (no string concatenation) | Heavy abstractions (harder to debug)
Pydantic validation for tool calls | Frequent breaking changes (v0.1 → v0.2)
Auto context window management | Opinionated (hard to customize agent loop)
Massive ecosystem | Overhead for simple tasks

When to Use LangChain

LangChain shines when you need:

  • Multiple tools (search + SQL + custom functions)
  • Different LLM providers with swappable backends
  • Complex chains (retrieve docs → summarize → answer)
  • Pre-built integrations (Redis memory, Pinecone vectors)

Skip it if you have a simple, single-purpose agent where the abstraction cost exceeds the benefit.

Framework 2: LlamaIndex

Best for: RAG (retrieval-augmented generation) + structured data queries

What Makes It Different

LlamaIndex is built specifically for data retrieval and reasoning. While LangChain is a Swiss Army knife, LlamaIndex is a scalpel for data apps:

  • SQL-native: Automatic schema introspection, query optimization, error correction
  • Smart context management: Auto-prunes old messages to stay under token limits
  • Multi-step reasoning: Built-in support for iterative queries (like my agent's step-by-step flow)
  • RAG optimization: If you mix SQL with document search, LlamaIndex handles both better than LangChain
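Schema introspection is the simplest of these to illustrate. As a rough picture of what a SQL query engine automates, this stdlib-only sketch reads table and column definitions from SQLite's catalog and formats them as prompt text. It is not LlamaIndex code; describe_schema is a hypothetical helper, and other databases expose the same information through information_schema instead.

```python
import sqlite3


def describe_schema(conn: sqlite3.Connection) -> str:
    """Render every user table and its columns as prompt-ready text."""
    lines = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' "
        "AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
        cols = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        col_desc = ", ".join(f"{name} {ctype}" for _, name, ctype, *_ in cols)
        lines.append(f"{table}({col_desc})")
    return "\n".join(lines)
```

The resulting text goes into the prompt so the model writes SQL against real column names instead of guessing them.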

The Tradeoffs

Pros | Cons
--- | ---
Optimized for SQL and data queries | Less flexible than LangChain
Automatic schema introspection | Smaller ecosystem
Better context pruning | Harder to add custom tools
Built-in multi-step reasoning | Less mature than LangChain

When to Use LlamaIndex

LlamaIndex is the best choice if your agent's primary job is data retrieval:

  • SQL database queries (like my Data Agent)
  • Document search + question answering
  • Hybrid workflows (SQL + vector search)
  • Multi-hop reasoning ("find X, then use X to find Y")

If I were rebuilding my Data Agent with a framework, this would be my first choice.

Framework 3: CrewAI

Best for: Multi-agent collaboration

The Multi-Agent Model

CrewAI introduces a fundamentally different paradigm: instead of one agent with multiple tools, you have multiple specialized agents that collaborate.

Imagine my Data Agent split into three specialists:

  • SQL Analyst: Generates and executes queries
  • Data Interpreter: Analyzes results and finds insights
  • Visualization Expert: Decides which chart type fits the data

Each agent has its own memory, its own tools, and its own reasoning loop. They delegate tasks to each other and share findings.
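A framework-free sketch of that split shows where the extra cost comes from. This is not CrewAI's API: RoleAgent and run_crew are hypothetical names, each agent runs once in a fixed order rather than delegating dynamically, and the LLM is any prompt-to-reply callable. Even this simplest hand-off makes one model call per agent.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class RoleAgent:
    role: str
    goal: str
    llm: Callable[[str], str]                      # prompt -> reply
    memory: list[str] = field(default_factory=list)

    def run(self, task: str, shared_notes: list[str]) -> str:
        prompt = (f"You are the {self.role}. Goal: {self.goal}\n"
                  f"Shared findings so far: {shared_notes}\nTask: {task}")
        reply = self.llm(prompt)
        self.memory.append(reply)                      # private memory
        shared_notes.append(f"{self.role}: {reply}")   # visible to later agents
        return reply


def run_crew(agents: list[RoleAgent], task: str) -> list[str]:
    """Run agents sequentially, each seeing the findings of those before it."""
    shared_notes: list[str] = []
    for agent in agents:
        agent.run(task, shared_notes)
    return shared_notes
```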

The Tradeoffs

Pros | Cons
--- | ---
Role-based specialization | Overkill for single-agent tasks
Shared memory across agents | Young framework (less mature)
Automatic task delegation | Complex debugging
Built-in collaboration patterns | Higher token costs (multiple LLM calls)

When to Use CrewAI

CrewAI makes sense when the task genuinely benefits from specialization:

  • Researcher + writer + editor workflow
  • Code generator + tester + reviewer
  • Data analyst + domain expert + auditor

Don't use it just because it sounds cool. Multi-agent systems add latency, cost, and complexity. Use them when a single agent can't cover the breadth of expertise required.

Framework 4: AutoGen (Microsoft)

Best for: Conversational multi-agent systems with human-in-the-loop

The Conversational Model

AutoGen's killer feature is human-in-the-loop as a first-class citizen. My Data Agent requires user approval for each step — AutoGen makes this pattern native.

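The UserProxyAgent's human_input_mode setting can be:

  • ALWAYS: ask for human input on every turn (like my approval buttons)
  • TERMINATE: ask only when a termination condition is met, such as a termination message or the auto-reply limit
  • NEVER: never ask (fully autonomous)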

The conversation state, message history, and approval flow are all automatic. No manual self.pending_step tracking.
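To see what the three modes of AutoGen's human_input_mode parameter mean in practice, here is a stdlib model of the decision. It follows the documented behavior as I understand it, but it is not AutoGen code: the HumanInputMode enum and should_ask_human helper are hypothetical, and treating a reply that ends in "TERMINATE" as the termination signal is an assumption borrowed from AutoGen's examples.

```python
from enum import Enum


class HumanInputMode(Enum):
    ALWAYS = "ALWAYS"
    TERMINATE = "TERMINATE"
    NEVER = "NEVER"


def should_ask_human(mode: HumanInputMode, message: str,
                     auto_replies: int, max_auto_replies: int) -> bool:
    """Decide whether to pause for a person before replying (hypothetical)."""
    if mode is HumanInputMode.ALWAYS:
        return True
    if mode is HumanInputMode.NEVER:
        return False
    # TERMINATE: pause only when the conversation would otherwise end,
    # either on a termination message or when auto-replies run out.
    is_termination = message.rstrip().endswith("TERMINATE")
    return is_termination or auto_replies >= max_auto_replies
```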

The Tradeoffs

Pros | Cons
--- | ---
Human-in-the-loop built-in | Verbose for simple tasks
Auto conversation management | OpenAI-centric (harder with other LLMs)
Code execution built-in | Heavy dependencies
Great for iterative tasks | Steeper learning curve

When to Use AutoGen

AutoGen is ideal when:

  • User approval is required at each step (like my Data Agent)
  • You need code generation + execution + human review
  • The task requires back-and-forth negotiation between agents
  • You want to experiment with different approval strategies

The Comparison Table

Feature | Custom | LangChain | LlamaIndex | CrewAI | AutoGen
--- | --- | --- | --- | --- | ---
State Management | Manual instance vars | ConversationBufferMemory | ChatMemoryBuffer | Agent memory | Auto-conversation
Context Building | String concatenation | Auto message formatting | Auto context pruning | Shared memory | Auto dialogue
Tool Calling | Manual JSON parsing | Pydantic validators | Built-in tools | Tool delegation | Code execution
Token Management | None | Auto-counting | Auto-trimming | Auto | Auto
Persistence | None | Pluggable (Redis, etc.) | Built-in | Built-in | Built-in
Human-in-loop | Manual | Custom callback | Custom | Custom | Built-in
SQL Support | Manual | SQLDatabaseToolkit | NLSQLTableQueryEngine | SQLDatabaseTool | Manual
Learning Curve | Low | Medium | Medium | High | Medium
Debugging | Easy | Hard | Medium | Very Hard | Medium
Dependencies | Minimal | Heavy | Medium | Medium | Heavy

What Frameworks Actually Solve

After this deep dive, here's what I've learned about where frameworks add genuine value:

1. Context Window Management (Critical)

My custom implementation doesn't count tokens against the context window. If the conversation history grows beyond the LLM's context window, the request fails and the agent breaks. Frameworks handle this for you: pruning old messages, summarizing earlier context, or switching to a model with a larger window.
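A minimal version of the pruning strategy looks like this. It is a sketch, not any framework's implementation: token counts use a rough four-characters-per-token heuristic (an assumption; real code should use the model's own tokenizer), and estimate_tokens and trim_to_budget are hypothetical names. It always keeps the system prompt and fills the remaining budget with the most recent messages.

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic (~4 characters per token for English text).
    return max(1, len(text) // 4)


def trim_to_budget(messages: list[dict], max_tokens: int) -> list[dict]:
    """Keep system messages, then the newest messages that fit the budget."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    budget = max_tokens - sum(estimate_tokens(m["content"]) for m in system)
    kept = []
    for msg in reversed(rest):              # walk from newest to oldest
        cost = estimate_tokens(msg["content"])
        if cost > budget:
            break                           # this and all older messages are dropped
        kept.append(msg)
        budget -= cost
    return system + list(reversed(kept))    # restore chronological order
```

Dropping whole messages from the oldest end keeps turns intact; summarizing the dropped portion into one message is the natural refinement.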

2. Persistent Memory (Production Requirement)

My state lives in self.steps — it disappears on WebSocket disconnect. Frameworks provide Redis, SQLite, or PostgreSQL backends to persist conversation state across sessions.
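Even without a framework, the fix is to key state by a session ID and write it somewhere that outlives the socket. This sketch uses the standard library's sqlite3 module; SessionStore and its methods are hypothetical names, and the session ID is assumed to be something the client resends when it reconnects.

```python
import json
import sqlite3


class SessionStore:
    """Persist agent steps per session so a reconnect can resume.

    SQLite stands in here for a Redis or PostgreSQL backend.
    """

    def __init__(self, path: str = ":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS steps ("
            " session_id TEXT, idx INTEGER, step TEXT,"
            " PRIMARY KEY (session_id, idx))"
        )

    def save_steps(self, session_id: str, steps: list[dict]) -> None:
        with self.conn:  # one transaction: replace this session's steps
            self.conn.execute("DELETE FROM steps WHERE session_id = ?", (session_id,))
            self.conn.executemany(
                "INSERT INTO steps VALUES (?, ?, ?)",
                [(session_id, i, json.dumps(s)) for i, s in enumerate(steps)],
            )

    def load_steps(self, session_id: str) -> list[dict]:
        rows = self.conn.execute(
            "SELECT step FROM steps WHERE session_id = ? ORDER BY idx", (session_id,)
        ).fetchall()
        return [json.loads(row[0]) for row in rows]
```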

3. Type-Safe Tool Calling (Reliability)

My json.loads(raw) fails when the LLM returns markdown-wrapped JSON. Frameworks use Pydantic to validate tool calls, retry on parse errors, and provide clear error messages.
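The core of that fix fits in plain Python. In this sketch a dataclass plus explicit field checks stands in for Pydantic, and SqlStep, parse_step, and parse_with_retry are hypothetical names. Feeding the error back to the model on retry is one common strategy, not what any particular framework is guaranteed to do.

```python
import json
import re
from dataclasses import dataclass

FENCE = "`" * 3
# Finds the payload inside an optional markdown fence (with or without "json").
FENCED = re.compile(FENCE + r"(?:json)?\s*(.*?)\s*" + FENCE, re.DOTALL)


@dataclass
class SqlStep:
    sql: str
    explanation: str


def parse_step(raw: str) -> SqlStep:
    match = FENCED.search(raw)
    payload = match.group(1) if match else raw.strip()
    data = json.loads(payload)  # raises json.JSONDecodeError on bad JSON
    # Minimal stand-in for Pydantic validation: required string fields.
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for name in ("sql", "explanation"):
        if not isinstance(data.get(name), str):
            raise ValueError(f"field {name!r} missing or not a string")
    return SqlStep(sql=data["sql"], explanation=data["explanation"])


def parse_with_retry(ask_llm, prompt: str, attempts: int = 3) -> SqlStep:
    """Re-prompt with the error message when parsing or validation fails."""
    last_error = None
    for _ in range(attempts):
        raw = ask_llm(prompt if last_error is None
                      else f"{prompt}\nYour last reply was invalid: {last_error}")
        try:
            return parse_step(raw)
        except ValueError as err:  # json.JSONDecodeError subclasses ValueError
            last_error = err
    raise ValueError(f"no valid reply after {attempts} attempts: {last_error}")
```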

4. Token Counting and Cost Tracking (Observability)

I manually track tokens with OpenTelemetry. Frameworks do this automatically and can enforce budget limits ("stop if cost exceeds $5").
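Enforcing a limit like that only takes a running total. The prices below are placeholders, not real rates, and CostTracker and BudgetExceeded are hypothetical names; the token counts would come from the usage figures that most LLM APIs return with each response.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    pass


@dataclass
class CostTracker:
    # Placeholder per-1K-token prices; check your provider's pricing.
    input_price_per_1k: float
    output_price_per_1k: float
    budget_usd: float
    spent_usd: float = 0.0

    def record(self, input_tokens: int, output_tokens: int) -> float:
        """Add one call's cost; raise once the running total passes the budget."""
        cost = (input_tokens / 1000 * self.input_price_per_1k
                + output_tokens / 1000 * self.output_price_per_1k)
        self.spent_usd += cost
        if self.spent_usd > self.budget_usd:
            raise BudgetExceeded(
                f"spent ${self.spent_usd:.2f}, budget ${self.budget_usd:.2f}")
        return cost
```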

5. Retry and Error Recovery (Resilience)

My agent fails on LLM errors. Frameworks retry with exponential backoff, switch to fallback models, or degrade gracefully.
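The pattern itself is short. This is a generic sketch rather than any framework's retry API: call_with_fallback is a hypothetical helper, each model is any prompt-to-reply callable, and catching every Exception is a simplification (in practice you would retry only timeouts, rate limits, and server errors). The sleep function is injectable so the backoff can be tested without waiting.

```python
import random
import time
from typing import Callable, Sequence


def call_with_fallback(models: Sequence[Callable[[str], str]], prompt: str,
                       retries: int = 3, base_delay: float = 0.5,
                       sleep: Callable[[float], None] = time.sleep) -> str:
    """Try each model in order, retrying failures with exponential backoff."""
    errors = []
    for model in models:
        for attempt in range(retries):
            try:
                return model(prompt)
            except Exception as err:  # simplification: real code filters error types
                errors.append(err)
                if attempt < retries - 1:
                    # base_delay * 2**attempt, scaled by a random factor in [1, 2) as jitter
                    sleep(base_delay * 2 ** attempt * (1 + random.random()))
    raise RuntimeError(f"all models failed: {errors}")
```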

When Custom is Still Better

Despite all these benefits, my custom approach excels in specific scenarios:

1. Learning

Building from scratch taught me exactly how agents work. I understand the prompt engineering, the state transitions, the error cases. Frameworks abstract this away — you learn how to use the framework, not how agents actually function.

2. Full Control

My WebSocket streaming, approval buttons, and step-by-step UX are exactly what I envisioned. Frameworks impose structure — which is a feature until it's a limitation.

3. Minimal Dependencies

My requirements.txt has 3 LLM-related packages. LangChain pulls in 50+. For a portfolio project, dependency bloat matters.

4. Custom Approval Workflows

My agent has Approve, Revise, Skip, and Exit buttons. Frameworks make multi-option approval harder — they're designed for binary yes/no or fully autonomous agents.
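In custom code, four buttons reduce to a small dispatch over an enum. This sketch assumes session state is a plain dict; Action, handle_action, and the phase strings it returns are hypothetical names for illustration.

```python
from enum import Enum


class Action(Enum):
    APPROVE = "approve"
    REVISE = "revise"
    SKIP = "skip"
    EXIT = "exit"


def handle_action(state: dict, action: Action, feedback: str = "") -> str:
    """Apply one button press to the session state and return the next phase."""
    if action is Action.APPROVE:
        state["steps"].append(state["pending_step"])
        state["pending_step"] = None
        return "execute"
    if action is Action.REVISE:
        # Keep the step pending; the feedback goes back to the LLM.
        state["revision_notes"].append(feedback)
        return "regenerate"
    if action is Action.SKIP:
        state["pending_step"] = None
        return "next_step"
    return "done"  # Action.EXIT
```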

The Verdict: Which Should You Use?

Scenario | Recommendation | Why
--- | --- | ---
Learning how agents work | Custom | Understand the internals before using abstractions
Portfolio project | Custom | Shows you can build without leaning on frameworks
Production SQL agent | LlamaIndex | Best SQL support, context management, multi-step reasoning
General-purpose assistant | LangChain | Widest tool support, most integrations
Multi-agent collaboration | CrewAI or AutoGen | Built for agent teams, not single agents
Human-in-the-loop approval | AutoGen | Native support for iterative approval workflows
Startup MVP | LangChain | Move fast, use pre-built integrations, refactor later
Regulated industry (finance, healthcare) | Custom or LlamaIndex | Full audit trail, no black-box framework magic

What I'd Do Differently

If I were rebuilding my Data Agent today with production requirements:

  1. Start custom to learn the patterns and identify the pain points
  2. Add LlamaIndex for state management and SQL optimization
  3. Keep custom prompt engineering and guardrails (frameworks don't understand your domain)
  4. Use framework memory for persistence (no need to reinvent Redis integration)
  5. Keep custom UX (my approval buttons, step-by-step UI)

In other words: use frameworks for infrastructure (memory, tool validation, token management), but keep control over the agent loop and user experience.

Conclusion

Frameworks aren't magic. They're opinionated solutions to common problems:

  • LangChain: Swiss Army knife for general-purpose agents
  • LlamaIndex: Scalpel for data retrieval and SQL agents
  • CrewAI: Multi-agent collaboration platform
  • AutoGen: Human-in-the-loop conversational agents

My custom Data Agent taught me when frameworks help and when they hinder. For learning and portfolio work, custom wins. For production at scale, frameworks handle the infrastructure so you can focus on domain logic.

The best approach? Build it custom first, identify the pain points, then reach for a framework that solves those specific problems. Don't start with LangChain because everyone else uses it — start with understanding what your agent actually needs.