Blog · Jun 16, 2026

Claude Agent SDK vs LangGraph: Which Agent Stack to Pick (and When You Need MCP)

A use-case-by-use-case verdict on Claude Agent SDK versus LangGraph, plus the honest answer on when MCP and multi-agent actually earn their keep.

By Craig Mason 8 min read

#ai-agents #architecture #langgraph #mcp

Claude Agent SDK vs LangGraph: Which Agent Stack to Pick (and When You Need MCP)

Claude Agent SDK or LangGraph? Pick LangGraph when you need a control plane you own — approvals, multi-model routing, durable replay, vendor hedging — and pick the Claude Agent SDK when one team needs to ship a Claude-based product fast with the agent loop already built. That one line settles most of the argument. The rest of this piece is the unpacking, because the line hides a real architectural choice and a few traps people fall into.

The two tools are not competitors in the way the framing suggests. One is a control plane. The other is an execution engine. They sit at different layers, and most teams that pit them against each other are really asking a different question: how much of my agent’s machinery do I want to own?

What is the Claude Agent SDK, actually?

The Claude Agent SDK (its hosted form is what Anthropic now ships as Managed Agents) is an execution engine. Anthropic runs the agent loop on its orchestration layer, and each session provisions a container where the agent’s tools run. You hand it a config — model, system prompt, tools — and it does the looping, the tool dispatch, and the session bookkeeping for you.

The built-in toolset is the selling point. Out of the box you get bash, read, write, edit, glob, grep, web_fetch, and web_search, all executing in the sandboxed container. You don’t write a tool runner. You don’t manage the loop. Subagents are a first-class field on the agent (multiagent: {type: "coordinator", agents: [...]}), permission policies gate tools behind always_ask confirmation when you want a human in the loop, and sessions are stateful and versioned so you can pin a session to a known-good config.

The tight Claude integration is where it pulls ahead and where it locks you in. Prompt caching, extended thinking, MCP support, and the per-session container all come wired up. The default model is claude-opus-4-8 at $5 input / $25 output per million tokens, with a 1M-token context window. It is single-vendor by design. That is the whole pitch and the whole catch in one sentence.

What does LangGraph do that the SDK doesn’t?

LangGraph is a control plane. You model your agent as a graph of nodes and edges, and the framework gives you durable execution: checkpointing, human-in-the-loop pauses, time-travel and replay, and — the big one — model-agnostic routing. The agent loop is yours. You decide what runs where.

That ownership is the point. LangGraph doesn’t ship a Claude-tuned execution engine because it isn’t trying to. It’s trying to be the substrate your orchestration logic lives on, no matter which model answers a given node. Klarna, Uber, and LinkedIn run it at production scale, which matters less as a logo parade and more as evidence that the durability and replay machinery holds up under real load.

If your agent needs to pause for a human approval, route a cheap classification step to a small model and the hard reasoning step to a frontier one, or survive a process restart mid-run without losing state, LangGraph already has the primitives. The Claude Agent SDK gives you some of these (permission gates, session state), but bolted to one vendor’s loop.

Claude Agent SDK vs LangGraph: the comparison that matters

Here is the split that should drive your decision.

Dimension	Claude Agent SDK	LangGraph
Layer	Execution engine	Control plane
Who runs the loop	Anthropic’s orchestration layer	You
Model coverage	Single-vendor (Claude)	Model-agnostic
Built-in tools	bash, file/code, web search, fetch	None — you wire tools
Durability / replay	Session state, versioning	Checkpointing, time-travel, replay
Human-in-the-loop	Permission policies (`always_ask`)	First-class pause/resume nodes
Production track record	Newer, Anthropic-hosted	Klarna, Uber, LinkedIn at scale
Speed to first agent	Fast — loop is built	Slower — you build the graph
Vendor lock-in	High by design	Low

Read that table as a question about your constraints, not a scorecard. If “single-vendor” is a non-issue for you — you’ve standardized on Claude and you’re happy there — the lock-in row is a feature, not a cost. If a procurement team is going to ask “what happens when this model gets expensive or deprecated,” the model-agnostic row is the entire conversation.

When should you reach for which?

A decision list, because the table only gets you halfway.

Solo dev or small team shipping a Claude product fast? Claude Agent SDK. The built-in loop and toolset are worth weeks. You’ll have a working agent before you’d have finished modeling the graph.
Dev tooling or autonomous workers that live inside Claude’s strengths? Claude Agent SDK. Computer use, extended thinking, and the file/shell toolset are right there.
Need approval gates, pauses, or anything that must survive a restart? LangGraph. Durable checkpointing and human-in-the-loop are its home turf.
Routing across multiple models — cheap model for triage, frontier model for the hard step? LangGraph. Multi-model routing is the thing it does that the SDK structurally cannot.
Hedging vendor risk because a procurement or compliance team requires it? LangGraph. Model-agnostic by construction.
Want to port to other models later? LangGraph. On the SDK, porting means maintaining parallel code paths for each non-Claude model — it’s doable, but you’re fighting the design.

Notice these don’t overlap much. The choice is usually obvious once you name your real constraint. The mistake is choosing on vibes (“LangGraph is more serious”) instead of constraints (“I need approval gates”).

Do you actually need MCP, or just an API?

This is the question that trips people up, so be precise. MCP — the Model Context Protocol — is for an LLM to dynamically discover and invoke tools at runtime, across many systems, with standardized auth (OAuth 2.1). Its real value is turning an N×M integration problem into N+M: instead of wiring every model to every tool, each side speaks one protocol.

A plain API is better when you want direct, deterministic control of a single integration from your own application code. If your agent calls one well-known endpoint in a fixed way, MCP is overhead. You know the call, you know the shape, you write the request. Done.

Here’s the part most explanations skip: MCP wraps APIs, it doesn’t replace them. An MCP server is a standardized front door to capabilities that are usually backed by ordinary APIs underneath. So the question is never “MCP or API” in the abstract. It’s “do I have a discovery-and-many-systems problem, or a call-one-thing problem?”

Situation	Use
One integration, fixed calls, full control wanted	Plain API in app code
Many systems, runtime tool discovery, standardized auth	MCP
Letting the model choose among a large, changing tool set	MCP
A deterministic step you don’t want the model improvising	Plain API

Reach for MCP when the integration surface is broad and dynamic. Reach for a plain API when it’s narrow and fixed. If you’re adding MCP to call a single endpoint you already control, stop — you’ve added a protocol to solve a problem you don’t have.

When do you need multi-agent at all?

The honest position, and the one that’ll save you the most grief: most teams reach for multi-agent too early. Single-agent-with-tools is enough far longer than the conference talks suggest. A capable model with a good tool surface and a clear system prompt handles an enormous range of work without any orchestration of separate agents.

You graduate to multi-agent when you hit a real reason, not an aesthetic one. The real reasons are specific:

Distinct, long-running roles — a researcher that runs for minutes while a writer waits on its output, each with its own context and tools.
Human approval gates between stages — where one agent’s work must be reviewed before the next stage proceeds.
Genuine multi-model needs — a step that truly belongs on a different model than the rest of the run.

Absent one of those, multi-agent adds coordination overhead, more failure modes, and a debugging surface that grows fast — the same way agents keep failing in production for reasons that have nothing to do with the model and everything to do with the machinery around it. The Claude Agent SDK supports subagents (max 25 concurrent threads, one level of delegation), and LangGraph models multi-agent as a graph natively. Both can do it. The question is whether you should, and the default answer is “not yet.”

A useful test: if you can’t name which of the three reasons above applies to you, you don’t need multi-agent. Build the single agent first. Add a second one when a concrete bottleneck forces your hand — and when you do, know what it actually costs to run, because every extra agent is extra tokens, extra latency, and extra surface area.

So which one wins?

Neither, because they’re answering different questions. If you’ve standardized on Claude and you want an agent running this week, the Claude Agent SDK is the faster, cleaner path, and the single-vendor lock-in is a trade you’re making on purpose. If you need to own your orchestration — approvals, multi-model routing, durable replay, or a credible answer to vendor risk — LangGraph is the substrate built for exactly that, with the production miles to back it.

Pick the layer that matches your real constraint. Add MCP only when you have a many-systems discovery problem, not a call-one-thing problem. And don’t reach for multi-agent until a concrete bottleneck makes you. Most of the regret in agent architecture comes from buying machinery before you have the problem it solves.

FAQ

Is the Claude Agent SDK a replacement for LangGraph?

No. They operate at different layers. The Claude Agent SDK is an execution engine that runs the agent loop and ships a built-in toolset for Claude. LangGraph is a control plane that gives you durable, model-agnostic orchestration where you own the loop. You could even run a Claude-based execution step inside a LangGraph graph. Choose the SDK for speed on Claude; choose LangGraph when you need to own the orchestration or route across models.

Do I need MCP if I’m only calling one API?

No. MCP solves runtime tool discovery across many systems with standardized auth, turning an N×M integration problem into N+M. If you call a single, well-known endpoint in a fixed way, a plain API in your application code is simpler and more deterministic. MCP wraps APIs rather than replacing them, so adding it for one fixed call is overhead with no payoff.

When is single-agent-with-tools better than multi-agent?

Almost always, until you hit a specific reason to split. Single-agent-with-tools handles most workloads with less coordination overhead, fewer failure modes, and a smaller debugging surface. Move to multi-agent only when you have distinct long-running roles, human approval gates between stages, or a genuine multi-model requirement. If you can’t name which of those applies, you don’t need it yet.

Found this useful? Read more from the blog →