Skip to content

Agent Runtime

The layer between LLM output and real-world action. The runtime provides the tool loop, memory, planning, and protocol plumbing that turns a model into an agent. The decisions here determine reliability more than model choice does.

The Production Decision

Framework and runtime decisions are reliability and maintainability tradeoffs:

  • Do you need a framework, or is a raw loop sufficient?
  • What happens when a tool fails mid-task — retry, skip, or abort?
  • How much state needs to persist across turns, and where does it live?
  • Are you building a single agent or orchestrating multiple agents?

The Tool Loop

Every agent runtime is a variation on the same loop:

User Goal

LLM call — generates thought + tool call (or final answer)

[if tool call] Runtime executes the tool

Tool result appended to context

LLM call again (repeat until final answer or max steps)

The critical decisions are in the edges: what happens when the tool errors, when the model hallucinates a tool name, when max steps is reached, or when two tool calls conflict.

Tool Use & Function Calling → | Orchestration →

Tool Design

Tool reliability is the single biggest source of agent failures in production:

  • One action per tool. Tools that do multiple things produce unpredictable results when partially successful.
  • Idempotent where possible. The model may call the same tool twice. Write tools that survive duplicate calls.
  • Return structured data. The model must parse the result. JSON with a consistent schema is more reliable than free-text responses.
  • Fail loudly. An error message with context ("file not found: /tmp/foo.txt") is more useful to the model than a generic exception or silent empty response.
  • Keep the tool list short. Models perform worse with more than ~20 tools in context. Scope toolsets to the task.

MCP (Model Context Protocol)

A protocol — not a framework — for standardized tool and resource discovery. Anthropic-developed and now supported across Claude, many agent frameworks, and third-party tool providers.

MCP defines:

  • Tools — functions an agent can call
  • Resources — data sources an agent can read
  • Prompts — reusable prompt templates

What it solves: Without MCP, every integration between an agent and an external system is custom plumbing. With MCP, any MCP-compatible agent can use any MCP-compatible tool server without bespoke glue code.

What it does not solve: MCP is not a security boundary. A tool server an agent can call can do anything the server is authorized to do. Prompt injection via tool results is a real attack surface. Prompt Injection & Security →

Adoption reality (2025): MCP is widely adopted in Claude tooling and gaining traction in the broader ecosystem (LangChain, Cursor, many IDE tools). It is not yet universal. Self-hosted MCP servers require ops investment. Plan for both MCP and non-MCP tool integrations in any real system.

Memory

TypeWhere it livesScopeAppropriate for
In-context (conversation history)Model context windowSingle sessionShort tasks, low turn count
Summarized historyLLM-compressed, stored externallySingle sessionLong conversations, token budget management
Vector DB (semantic retrieval)External store (Pinecone, Qdrant, pgvector)Cross-sessionKnowledge retrieval, long-term user context
Key-value storeRedis, SQLiteCross-sessionStructured facts, user preferences
Episodic memoryLLM-processed event logCross-sessionAgents that need to recall what they did

Conversation history is the easiest to implement and the most common source of context overflow. Use summarization or truncation before the window fills, not after. Context Management →

Frameworks

FrameworkModelBest forWatch out for
LangGraphStateful graph executionComplex multi-step agents, branchingSteep learning curve, verbose graph definition
CrewAIRole-based multi-agentTeams of specialized agentsRole definitions need tuning, coordination overhead
AutoGenConversational multi-agentCollaborative problem-solvingNon-deterministic by design, hard to test
LlamaIndex WorkflowsEvent-driven stepsData pipelines, RAG agentsPrimarily document-oriented
Claude Code SDKSubprocess-based codingCoding agents, file manipulationClaude-only, not general-purpose
Raw loop (no framework)Direct API callsSimple agents, full controlYou own retry logic, state, error handling

Framework vs raw loop: A framework adds abstractions — state management, graph execution, role coordination. These abstractions also hide failures. A tool error that would surface immediately in a raw loop may be silently swallowed by a framework's retry handler. Start with the simplest thing that works and add framework complexity only when you have a concrete problem it solves.

Agent Frameworks →

Planning Patterns

PatternHow it worksTradeoffs
ReActInterleave reasoning and action in a single prompt loopSimple, works well for linear tasks
Plan-then-executeGenerate full plan first, then execute stepsMore structured, but plan becomes stale if early steps fail
Tree of ThoughtExplore multiple reasoning branches, select bestHigh token cost, useful for complex single decisions
ReflectionAgent critiques its own output before finalizingCatches some errors, adds latency and tokens
Multi-agent consensusMultiple agents produce answers, vote or reconcileHigh cost, rarely better than one strong model

For most production agents, ReAct is the right starting pattern. Add planning structure only when the task genuinely requires it — complexity that is not earned will hurt reliability.

Production Reality

Framework abstractions hide failure modes. When a tool fails inside a framework's retry handler, you may get a degraded answer rather than an error. Instrument tool calls directly, not just the outer agent call.

Planning loops are non-deterministic by design. The same input can produce different tool call sequences across runs. This makes testing hard. Build evals that measure outcome quality, not execution path. Evaluations →

MCP servers are trusted by default. An agent that connects to an MCP server will execute whatever tools that server exposes. Vet MCP server implementations before connecting agents to them in production. Tool results can contain prompt injection payloads.

Max steps is not a safety net. An agent that hits max steps has likely already taken irreversible actions. Design tasks so each step is independently auditable and reversible where possible. Add step-level logging before you need to debug a runaway agent in production.

Tool hallucination is a real failure mode. Models will occasionally call tools with fabricated arguments or call tools that do not exist in the current tool list. Validate all tool call arguments at the runtime layer before execution.

Released under the MIT License.