Deep Dive: AI Agent Frameworks - Marcel Pellicero Esteban

I built my Data Agent from scratch — no LangChain, no LlamaIndex, just direct API calls and custom state management. It works, but it made me wonder: what am I missing? What do frameworks actually provide beyond abstraction overhead? Here's what I learned comparing my custom implementation to LangChain, LlamaIndex, CrewAI, and AutoGen.

The Custom Approach: State Management by Hand

My Data Agent is a WebSocket-based SQL assistant that generates queries step by step, and the user approves each step before it executes. Underneath, everything is hand-rolled: manual state management, string concatenation for context, and raw JSON parsing.

This works, but notice what I'm doing manually:

String concatenation for conversation context
JSON parsing (fails if the LLM wraps its JSON in a markdown code fence)
State stored in instance variables (lost on disconnect)
No token counting or context window management
No automatic retry logic or error recovery
No caching or memory persistence
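For reference, the manual pattern behind that list looks roughly like the sketch below. It is a hypothetical reconstruction, not the actual Data Agent code: `ManualAgentState`, `build_context`, and `parse_step` are illustrative names, the `sql` key is an assumed step shape, and only `steps` and `pending_step` echo attributes mentioned in this post.

```python
import json


class ManualAgentState:
    """Hypothetical stand-in for hand-rolled agent state."""

    def __init__(self):
        self.steps = []           # approved steps; gone when the socket closes
        self.pending_step = None  # step waiting for user approval

    def build_context(self, question):
        # Context is built by string concatenation, with no token counting.
        context = f"Question: {question}\n"
        for i, step in enumerate(self.steps, start=1):
            context += f"Step {i}: {step['sql']}\n"
        return context

    def parse_step(self, raw):
        # Raw JSON parsing: raises json.JSONDecodeError when the model
        # wraps its answer in a markdown code fence.
        self.pending_step = json.loads(raw)
        return self.pending_step
```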

Every one of these is a source of bugs and maintenance burden. Let's see what frameworks solve.

Framework 1: LangChain

Best for: General-purpose agents with tool calling

Key Features

Structured Memory: Automatically formats conversation as [HumanMessage(...), AIMessage(...), ToolMessage(...)] instead of string concatenation
Built-in Tools: SQL database, search, calculators, custom functions — all with type validation via Pydantic
Token Counting: Can count tokens per message and trim old messages once the context grows too long
Streaming Support: Streams tokens as they are generated, which maps cleanly onto a WebSocket connection
Huge Ecosystem: 100+ integrations (Redis for persistence, Pinecone for vectors, etc.)
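To make the contrast with string concatenation concrete, here is a plain-Python sketch of structured history. It illustrates the idea only and is not LangChain's API: `Message`, `Role`, and `to_api_payload` are hypothetical names, and the role mapping assumes an OpenAI-style chat payload.

```python
from dataclasses import dataclass
from typing import Literal

Role = Literal["human", "ai", "tool"]


@dataclass(frozen=True)
class Message:
    role: Role
    content: str


def to_api_payload(history: list[Message]) -> list[dict[str, str]]:
    """Convert typed messages into role/content dicts for a chat API."""
    role_map = {"human": "user", "ai": "assistant", "tool": "tool"}
    return [{"role": role_map[m.role], "content": m.content} for m in history]


history = [
    Message("human", "How many orders shipped last week?"),
    Message("ai", "Counting rows in the orders table."),
    Message("tool", "412"),
]
payload = to_api_payload(history)
```

Because each message keeps its role, trimming, filtering, or re-serializing the history becomes a list operation instead of string surgery.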

The Tradeoffs

Pros	Cons
Structured memory (no string concatenation)	Heavy abstractions (harder to debug)
Pydantic validation for tool calls	Frequent breaking changes (v0.1 → v0.2)
Auto context window management	Opinionated (hard to customize agent loop)
Massive ecosystem	Overhead for simple tasks

When to Use LangChain

LangChain shines when you need:

Multiple tools (search + SQL + custom functions)
Different LLM providers with swappable backends
Complex chains (retrieve docs → summarize → answer)
Pre-built integrations (Redis memory, Pinecone vectors)

Skip it if you have a simple, single-purpose agent where the abstraction cost exceeds the benefit.

Framework 2: LlamaIndex

Best for: RAG (retrieval-augmented generation) + structured data queries

What Makes It Different

LlamaIndex is built specifically for data retrieval and reasoning. While LangChain is a Swiss Army knife, LlamaIndex is a scalpel for data apps:

SQL-native: Automatic schema introspection, query optimization, error correction
Smart context management: Auto-prunes old messages to stay under token limits
Multi-step reasoning: Built-in support for iterative queries (like my agent's step-by-step flow)
RAG optimization: If you mix SQL with document search, LlamaIndex handles both better than LangChain
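Schema introspection is the easiest of these features to picture. The sketch below shows the bare idea with the standard-library `sqlite3` module: read the table and column definitions and turn them into text a prompt can include. It is an assumption-laden illustration, not how LlamaIndex implements it; `describe_schema` and the two example tables are made up.

```python
import sqlite3


def describe_schema(conn: sqlite3.Connection) -> str:
    """Return a compact 'table(col TYPE, ...)' summary for prompt context."""
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    lines = []
    for (table,) in tables:
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        cols = ", ".join(f"{col[1]} {col[2]}" for col in columns)
        lines.append(f"{table}({cols})")
    return "\n".join(lines)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
schema_text = describe_schema(conn)
```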

The Tradeoffs

Pros	Cons
Optimized for SQL and data queries	Less flexible than LangChain
Automatic schema introspection	Smaller ecosystem
Better context pruning	Harder to add custom tools
Built-in multi-step reasoning	Less mature than LangChain

When to Use LlamaIndex

LlamaIndex is the best choice if your agent's primary job is data retrieval:

SQL database queries (like my Data Agent)
Document search + question answering
Hybrid workflows (SQL + vector search)
Multi-hop reasoning ("find X, then use X to find Y")

If I were rebuilding my Data Agent with a framework, this would be my first choice.

Framework 3: CrewAI

Best for: Multi-agent collaboration

The Multi-Agent Model

CrewAI introduces a fundamentally different paradigm: instead of one agent with multiple tools, you have multiple specialized agents that collaborate.

Imagine my Data Agent split into three specialists:

SQL Analyst: Generates and executes queries
Data Interpreter: Analyzes results and finds insights
Visualization Expert: Decides which chart type fits the data

Each agent has its own memory, its own tools, and its own reasoning loop. They delegate tasks to each other and share findings.
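The shape of that collaboration can be sketched without any framework. In the hypothetical code below, each `Specialist` keeps private memory and hands its output to the next one; the lambdas stand in for LLM calls. This shows a simple sequential hand-off only and is not CrewAI's API.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Specialist:
    role: str
    handle: Callable[[str], str]  # stand-in for an LLM call
    memory: list[str] = field(default_factory=list)

    def run(self, task: str) -> str:
        result = self.handle(task)
        self.memory.append(f"{task} -> {result}")  # private per-agent memory
        return result


def run_crew(crew: list[Specialist], task: str) -> str:
    """Sequential hand-off: each specialist's output feeds the next one."""
    for agent in crew:
        task = agent.run(task)
    return task


crew = [
    Specialist("SQL Analyst", lambda q: f"rows for [{q}]"),
    Specialist("Data Interpreter", lambda r: f"insight from {r}"),
    Specialist("Visualization Expert", lambda i: f"bar chart of {i}"),
]
final = run_crew(crew, "monthly revenue")
```

Every `run` call would be a separate LLM request in a real system, which is where the extra latency and token cost of multi-agent designs comes from.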

The Tradeoffs

Pros	Cons
Role-based specialization	Overkill for single-agent tasks
Shared memory across agents	Young framework (less mature)
Automatic task delegation	Complex debugging
Built-in collaboration patterns	Higher token costs (multiple LLM calls)

When to Use CrewAI

CrewAI makes sense when the task genuinely benefits from specialization:

Researcher + writer + editor workflow
Code generator + tester + reviewer
Data analyst + domain expert + auditor

Don't use it just because it sounds cool. Multi-agent systems add latency, cost, and complexity. Use them when a single agent can't cover the breadth of expertise required.

Framework 4: AutoGen (Microsoft)

Best for: Conversational multi-agent systems with human-in-the-loop

The Conversational Model

AutoGen's killer feature is human-in-the-loop as a first-class citizen. My Data Agent requires user approval for each step — AutoGen makes this pattern native.

The UserProxyAgent's human_input_mode setting takes one of three values:

ALWAYS: ask for human input on every turn (like my approval buttons)
TERMINATE: ask for human input only when a termination condition is met
NEVER: never ask (fully autonomous)

The conversation state, message history, and approval flow are all automatic. No manual self.pending_step tracking.
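The decision those modes encode is small enough to write down. This is a plain-Python paraphrase of my understanding of the three modes, not AutoGen's code; `HumanInputMode` and `should_ask_human` are hypothetical names.

```python
from enum import Enum


class HumanInputMode(Enum):
    ALWAYS = "ALWAYS"        # ask before every reply
    TERMINATE = "TERMINATE"  # ask only when the conversation is about to end
    NEVER = "NEVER"          # never ask; run autonomously


def should_ask_human(mode: HumanInputMode, is_terminating: bool) -> bool:
    """Decide whether to pause for human input on this turn."""
    if mode is HumanInputMode.ALWAYS:
        return True
    if mode is HumanInputMode.TERMINATE:
        return is_terminating
    return False
```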

The Tradeoffs

Pros	Cons
Human-in-the-loop built-in	Verbose for simple tasks
Auto conversation management	OpenAI-centric (harder with other LLMs)
Code execution built-in	Heavy dependencies
Great for iterative tasks	Steeper learning curve

When to Use AutoGen

AutoGen is ideal when:

User approval is required at each step (like my Data Agent)
You need code generation + execution + human review
The task requires back-and-forth negotiation between agents
You want to experiment with different approval strategies

The Comparison Table

Feature	Custom	LangChain	LlamaIndex	CrewAI	AutoGen
State Management	Manual instance vars	ConversationBufferMemory	ChatMemoryBuffer	Agent memory	Auto-conversation
Context Building	String concatenation	Auto message formatting	Auto context pruning	Shared memory	Auto dialogue
Tool Calling	Manual JSON parsing	Pydantic validators	Built-in tools	Tool delegation	Code execution
Token Management	None	Auto-counting	Auto-trimming	Auto	Auto
Persistence	None	Pluggable (Redis, etc.)	Built-in	Built-in	Built-in
Human-in-loop	Manual	Custom callback	Custom	Custom	Built-in
SQL Support	Manual	SQLDatabaseToolkit	NLSQLTableQueryEngine	SQLDatabaseTool	Manual
Learning Curve	Low	Medium	Medium	High	Medium
Debugging	Easy	Hard	Medium	Very Hard	Medium
Dependencies	Minimal	Heavy	Medium	Medium	Heavy

What Frameworks Actually Solve

After this deep dive, here's what I've learned about where frameworks add genuine value:

1. Context Window Management (Critical)

My custom implementation never checks how large the prompt has grown. Once the conversation history exceeds the LLM's context window, the request fails and the agent breaks. Frameworks ship helpers for this, such as pruning old messages, summarizing earlier context, or, in some setups, falling back to a model with a larger window.
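The simplest version of this is a sliding window over the history. The sketch below keeps the most recent messages that fit a token budget. `estimate_tokens` uses a crude four-characters-per-token assumption, so a real tokenizer should replace it; both function names are illustrative.

```python
def estimate_tokens(text: str) -> int:
    # Crude assumption: about 4 characters per token. Swap in a real tokenizer.
    return max(1, len(text) // 4)


def trim_history(messages: list[str], budget: int) -> list[str]:
    """Keep the most recent messages whose estimated tokens fit the budget."""
    kept: list[str] = []
    used = 0
    for message in reversed(messages):
        cost = estimate_tokens(message)
        if used + cost > budget:
            break  # everything older than this is dropped
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept
```

Dropping from the oldest end is a deliberate simplification. A production version would usually pin the system prompt and summarize what it drops rather than discard it.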

2. Persistent Memory (Production Requirement)

My state lives in self.steps — it disappears on WebSocket disconnect. Frameworks provide Redis, SQLite, or PostgreSQL backends to persist conversation state across sessions.
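Persistence does not have to mean Redis on day one. Here is a minimal sketch, assuming steps are JSON-serializable dicts keyed by a session id, that stores them in SQLite so a reconnecting client can resume. `StepStore` and its table layout are hypothetical.

```python
import json
import sqlite3


class StepStore:
    """Persist approved steps per session so a reconnect can resume."""

    def __init__(self, path: str = ":memory:"):
        # Pass a file path in practice; ":memory:" is only for demonstration.
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS steps ("
            "session_id TEXT, position INTEGER, payload TEXT)"
        )

    def append(self, session_id: str, step: dict) -> None:
        position = len(self.load(session_id))
        self.conn.execute(
            "INSERT INTO steps VALUES (?, ?, ?)",
            (session_id, position, json.dumps(step)),
        )
        self.conn.commit()

    def load(self, session_id: str) -> list[dict]:
        rows = self.conn.execute(
            "SELECT payload FROM steps WHERE session_id = ? ORDER BY position",
            (session_id,),
        ).fetchall()
        return [json.loads(payload) for (payload,) in rows]
```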

3. Type-Safe Tool Calling (Reliability)

My json.loads(raw) fails when the LLM returns markdown-wrapped JSON. Frameworks use Pydantic to validate tool calls, retry on parse errors, and provide clear error messages.
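Two small steps cover most of that failure mode: strip an optional markdown fence, then validate the fields before trusting them. The sketch below uses a dataclass check as a stand-in for Pydantic, and `SqlStep` with its `sql` and `explanation` fields is an assumed schema, not the Data Agent's real one.

```python
import json
import re
from dataclasses import dataclass

TICKS = "`" * 3  # a markdown code-fence marker
FENCE = re.compile(TICKS + r"(?:json)?\s*(.*?)\s*" + TICKS, re.DOTALL)


@dataclass
class SqlStep:
    sql: str
    explanation: str


def parse_step(raw: str) -> SqlStep:
    """Extract JSON from an optional markdown fence and validate its fields."""
    match = FENCE.search(raw)
    text = match.group(1) if match else raw.strip()
    data = json.loads(text)  # still raises on genuinely malformed JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [
        key for key in ("sql", "explanation")
        if not isinstance(data.get(key), str)
    ]
    if missing:
        raise ValueError(f"missing or non-string fields: {missing}")
    return SqlStep(sql=data["sql"], explanation=data["explanation"])
```

A `ValueError` from `parse_step` is also a natural trigger for re-prompting the model with the error message, which is roughly what framework output parsers automate.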

4. Token Counting and Cost Tracking (Observability)

The only token tracking I have is manual OpenTelemetry instrumentation, and nothing acts on those numbers. Frameworks report usage automatically, which makes budget limits ("stop if cost exceeds $5") much easier to enforce.
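A budget guard is a few lines once token counts are available per call. In this sketch `CostTracker` and `BudgetExceeded` are hypothetical names, and the flat per-1k-token rate is a placeholder, since real pricing differs by model and by prompt versus completion tokens.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised when cumulative spend passes the configured budget."""


@dataclass
class CostTracker:
    budget_usd: float
    usd_per_1k_tokens: float  # placeholder rate; real pricing varies by model
    spent_usd: float = 0.0

    def record(self, prompt_tokens: int, completion_tokens: int) -> float:
        """Add one call's cost and raise once the budget is exceeded."""
        total = prompt_tokens + completion_tokens
        self.spent_usd += total / 1000 * self.usd_per_1k_tokens
        if self.spent_usd > self.budget_usd:
            raise BudgetExceeded(
                f"spent ${self.spent_usd:.2f} of ${self.budget_usd:.2f}"
            )
        return self.spent_usd
```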

5. Retry and Error Recovery (Resilience)

My agent fails on LLM errors. Frameworks retry with exponential backoff, switch to fallback models, or degrade gracefully.
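Exponential backoff is the piece most worth borrowing even in a custom agent. A minimal sketch follows, assuming the LLM call is wrapped in a zero-argument function; `call_with_retry` is an illustrative name, and catching every `Exception` is a simplification, since real code should retry only transient errors such as timeouts or rate limits.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    fn: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on any exception with exponential backoff and jitter."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            delay = base_delay * (2 ** attempt)  # 0.5s, 1s, 2s, ...
            sleep(delay + random.uniform(0, delay / 2))  # jitter spreads retries
    raise ValueError("attempts must be at least 1")
```

The `sleep` parameter exists so tests can inject a fake and avoid real waiting.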

When Custom is Still Better

Despite all these benefits, my custom approach excels in specific scenarios:

1. Learning

Building from scratch taught me exactly how agents work. I understand the prompt engineering, the state transitions, the error cases. Frameworks abstract this away — you learn how to use the framework, not how agents actually function.

2. Full Control

My WebSocket streaming, approval buttons, and step-by-step UX are exactly what I envisioned. Frameworks impose structure — which is a feature until it's a limitation.

3. Minimal Dependencies

My requirements.txt lists 3 LLM-related packages; installing LangChain pulls in 50+. For a portfolio project, dependency bloat matters.

4. Custom Approval Workflows

My agent has Approve, Revise, Skip, and Exit buttons. Frameworks make multi-option approval harder — they're designed for binary yes/no or fully autonomous agents.
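A four-way approval flow is easy to express as a small pure function over the agent's state. The sketch below is hypothetical: the button names come from this post, but `Decision`, `apply_decision`, and the exact meaning of each choice (for example, Revise swapping in an edited step that still awaits approval) are assumptions.

```python
from enum import Enum
from typing import Optional


class Decision(Enum):
    APPROVE = "approve"
    REVISE = "revise"
    SKIP = "skip"
    EXIT = "exit"


def apply_decision(
    steps: list[str], pending: str, decision: Decision, revision: str = ""
) -> tuple[list[str], Optional[str], bool]:
    """Return (steps, next_pending, finished) after the user's choice."""
    if decision is Decision.APPROVE:
        return steps + [pending], None, False
    if decision is Decision.REVISE:
        # The edited step takes the pending slot and awaits approval again.
        return steps, revision or pending, False
    if decision is Decision.SKIP:
        return steps, None, False
    return steps, None, True  # EXIT ends the session
```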

The Verdict: Which Should You Use?

Scenario	Recommendation	Why
Learning how agents work	Custom	Understand the internals before using abstractions
Portfolio project	Custom	Shows you can build without leaning on frameworks
Production SQL agent	LlamaIndex	Best SQL support, context management, multi-step reasoning
General-purpose assistant	LangChain	Widest tool support, most integrations
Multi-agent collaboration	CrewAI or AutoGen	Built for agent teams, not single agents
Human-in-the-loop approval	AutoGen	Native support for iterative approval workflows
Startup MVP	LangChain	Move fast, use pre-built integrations, refactor later
Regulated industry (finance, healthcare)	Custom or LlamaIndex	Full audit trail, no black-box framework magic

What I'd Do Differently

If I were rebuilding my Data Agent today with production requirements:

Start custom to learn the patterns and identify the pain points
Add LlamaIndex for state management and SQL optimization
Keep custom prompt engineering and guardrails (frameworks don't understand your domain)
Use framework memory for persistence (no need to reinvent Redis integration)
Keep custom UX (my approval buttons, step-by-step UI)

In other words: use frameworks for infrastructure (memory, tool validation, token management), but keep control over the agent loop and user experience.

Conclusion

Frameworks aren't magic. They're opinionated solutions to common problems:

LangChain: Swiss Army knife for general-purpose agents
LlamaIndex: Scalpel for data retrieval and SQL agents
CrewAI: Multi-agent collaboration platform
AutoGen: Human-in-the-loop conversational agents

My custom Data Agent taught me when frameworks help and when they hinder. For learning and portfolio work, custom wins. For production at scale, frameworks handle the infrastructure so you can focus on domain logic.

The best approach? Build it custom first, identify the pain points, then reach for a framework that solves those specific problems. Don't start with LangChain because everyone else uses it — start with understanding what your agent actually needs.