Agentic AI is moving fast, but the underlying design patterns are stabilizing. Microsoft's AutoGen team, AWS's multi-agent blueprints, Anthropic's agent research, and LangGraph's graph-based orchestration have converged on a small set of composable patterns.
This guide distills those patterns, explains when each applies, and shows working Python implementations — not pseudocode.
A system is agentic when the model controls the sequence of steps rather than a hardcoded pipeline. The model reads a goal, decides which tools to call, observes the results, and loops until the goal is satisfied or a stopping condition is hit.
That's the minimum bar. Beyond it, systems add memory, parallelism, specialized sub-agents, and self-correction. Each addition introduces a design choice — and a corresponding pattern.
| Pattern | What it solves | When to use |
|---|
| ReAct | Interleaving reasoning with tool use | Single-task agents with tool access |
| Reflection | Self-evaluation and revision | Output quality matters more than speed |
| Tool Use | Extending model capability | Any task beyond text generation |
| Planning | Breaking goals into sub-steps | Long-horizon tasks |
| Multi-Agent | Parallelism + specialization | Tasks too large for one context window |
| Memory | Persistence across turns | Stateful, long-running sessions |
| Routing | Dynamic task dispatch | Mixed workloads with different specialists |
ReAct is the foundational pattern. The model alternates between Thought (reasoning about the current state), Action (calling a tool), and Observation (reading the result). This loop continues until the model emits a final answer.
It was formalized in the 2022 paper "ReAct: Synergizing Reasoning and Acting in Language Models" and is the basis for most agent frameworks today.
References:
import anthropic
import json
client = anthropic.Anthropic()
tools = [
{
"name": "search_web",
"description": "Search the web and return a summary of top results.",
"input_schema": {
"type": "object",
"properties": {
"query": {"type": "string", "description": "The search query"}
},
"required": ["query"],
},
},
{
"name": "calculate",
"description": "Evaluate a mathematical expression and return the result.",
"input_schema": {
"type": "object",
"properties": {
"expression": {"type": "string", "description": "Math expression, e.g. '42 * 1.08'"}
},
"required": ["expression"],
},
},
]
def handle_tool_call(tool_name: str, tool_input: dict) -> str:
if tool_name == "search_web":
# Replace with a real search API call
return f"[Search result for '{tool_input['query']}': Sample result returned.]"
if tool_name == "calculate":
try:
result = eval(tool_input["expression"], {"__builtins__": {}})
return str(result)
except Exception as e:
return f"Error: {e}"
return "Unknown tool"
def react_agent(goal: str, max_iterations: int = 10) -> str:
messages = [{"role": "user", "content": goal}]
for _ in range(max_iterations):
response = client.messages.create(
model="claude-sonnet-4-6",
max_tokens=4096,
tools=tools,
messages=messages,
)
# Append assistant turn
messages.append({"role": "assistant", "content": response.content})
if response.stop_reason == "end_turn":
# Extract final text answer
for block in response.content:
if hasattr(block, "text"):
return block.text
return ""
if response.stop_reason == "tool_use":
tool_results = []
for block in response.content:
if block.type == "tool_use":
result = handle_tool_call(block.name, block.input)
tool_results.append({
"type": "tool_result",
"tool_use_id": block.id,
"content": result,
})
messages.append({"role": "user", "content": tool_results})
return "Max iterations reached."
if __name__ == "__main__":
answer = react_agent("What is 15% of the population of Tokyo as of 2024?")
print(answer)
The loop is the key: the model keeps calling tools until it has enough context to emit a final answer. stop_reason == "end_turn" is the exit signal.
The model generates an output, then evaluates it against a rubric — either in the same model call or a separate one — and revises if the evaluation fails. This is sometimes called "self-refinement" or "critic-actor."
Reflection trades latency for quality. It's expensive in tokens but produces significantly better outputs for writing, code, and structured data extraction.
References:
import anthropic
client = anthropic.Anthropic()
def generate(task: str) -> str:
response = client.messages.create(
model="claude-sonnet-4-6",
max_tokens=2048,
messages=[{"role": "user", "content": task}],
)
return response.content[0].text
def critique(draft: str, rubric: str) -> dict:
prompt = f"""You are a strict evaluator. Given the following draft and rubric,
return a JSON object with two keys:
- "pass": true if the draft meets all rubric criteria, false otherwise
- "feedback": specific actionable feedback if it fails, empty string if it passes
Rubric:
{rubric}
Draft:
{draft}
Return only the JSON object."""
response = client.messages.create(
model="claude-haiku-4-5-20251001",
max_tokens=512,
messages=[{"role": "user", "content": prompt}],
)
import json
return json.loads(response.content[0].text)
def reflection_agent(task: str, rubric: str, max_rounds: int = 3) -> str:
draft = generate(task)
for round_num in range(max_rounds):
evaluation = critique(draft, rubric)
print(f"Round {round_num + 1}: {'PASS' if evaluation['pass'] else 'FAIL'}")
if evaluation["pass"]:
return draft
# Revise the draft using the critic's feedback
revision_prompt = f"""Your previous draft failed the following review:
{evaluation['feedback']}
Original task: {task}
Previous draft:
{draft}
Please rewrite it addressing all the feedback."""
draft = generate(revision_prompt)
return draft # Return best effort after max rounds
if __name__ == "__main__":
rubric = """
- Fewer than 100 words
- Must include a concrete metric or statistic
- No passive voice
- Must end with a call to action
"""
result = reflection_agent(
task="Write a product announcement for an AI web scraping tool.",
rubric=rubric,
)
print(result)
Using a faster, cheaper model (Haiku) as the critic and a more capable model (Sonnet) as the author is an efficient split — it keeps costs reasonable while maintaining output quality.
This is the mechanism underlying ReAct, but it's worth treating as a standalone pattern because the tool design itself is where most agents fail. Poorly named tools, ambiguous descriptions, or overlapping capabilities cause the model to hallucinate tool calls or pick the wrong one.
References:
from dataclasses import dataclass, field
from typing import Callable, Any
import anthropic
@dataclass
class Tool:
name: str
description: str
parameters: dict
handler: Callable[..., Any]
def to_anthropic_schema(self) -> dict:
return {
"name": self.name,
"description": self.description,
"input_schema": self.parameters,
}
class ToolRegistry:
def __init__(self):
self._tools: dict[str, Tool] = {}
def register(self, tool: Tool):
self._tools[tool.name] = tool
def get_schemas(self) -> list[dict]:
return [t.to_anthropic_schema() for t in self._tools.values()]
def call(self, name: str, inputs: dict) -> str:
if name not in self._tools:
return f"Error: unknown tool '{name}'"
try:
result = self._tools[name].handler(**inputs)
return str(result)
except Exception as e:
return f"Error calling {name}: {e}"
# Register domain-specific tools
registry = ToolRegistry()
registry.register(Tool(
name="get_stock_price",
description="Return the current price of a publicly traded stock by ticker symbol.",
parameters={
"type": "object",
"properties": {
"ticker": {"type": "string", "description": "Stock ticker, e.g. AAPL"}
},
"required": ["ticker"],
},
handler=lambda ticker: f"${142.30}", # Replace with real API call
))
registry.register(Tool(
name="get_company_info",
description="Return basic company information including sector, employees, and HQ location.",
parameters={
"type": "object",
"properties": {
"ticker": {"type": "string"}
},
"required": ["ticker"],
},
handler=lambda ticker: f"Apple Inc. — Sector: Technology, Employees: 164,000, HQ: Cupertino CA",
))
def run_agent_with_registry(query: str) -> str:
client = anthropic.Anthropic()
messages = [{"role": "user", "content": query}]
while True:
response = client.messages.create(
model="claude-sonnet-4-6",
max_tokens=2048,
tools=registry.get_schemas(),
messages=messages,
)
messages.append({"role": "assistant", "content": response.content})
if response.stop_reason == "end_turn":
return next((b.text for b in response.content if hasattr(b, "text")), "")
tool_results = [
{
"type": "tool_result",
"tool_use_id": b.id,
"content": registry.call(b.name, b.input),
}
for b in response.content if b.type == "tool_use"
]
messages.append({"role": "user", "content": tool_results})
if __name__ == "__main__":
print(run_agent_with_registry("Give me the current price and sector for AAPL."))
Instead of deciding the next action one step at a time, the model first produces an explicit multi-step plan, then executes each step. A separate "executor" module handles step execution and feeds results back into a memory buffer.
This pattern is important when a task has too many steps to fit in a single model turn, or when you want human approval before execution begins.
References:
import anthropic
import json
from dataclasses import dataclass
client = anthropic.Anthropic()
@dataclass
class Step:
index: int
description: str
result: str | None = None
def create_plan(goal: str) -> list[Step]:
response = client.messages.create(
model="claude-sonnet-4-6",
max_tokens=1024,
messages=[{
"role": "user",
"content": f"""Break this goal into 3-6 concrete, sequential steps.
Return a JSON array of strings, where each string is one step.
Goal: {goal}
Return only the JSON array."""
}],
)
raw = response.content[0].text.strip()
steps_raw = json.loads(raw)
return [Step(index=i, description=s) for i, s in enumerate(steps_raw, 1)]
def execute_step(step: Step, completed_steps: list[Step]) -> str:
context = "\n".join(
f"Step {s.index}: {s.description}\nResult: {s.result}"
for s in completed_steps
if s.result is not None
)
prompt = f"""You are executing step {step.index} of a multi-step plan.
Previous steps and results:
{context or 'None yet.'}
Current step: {step.description}
Produce the output for this step. Be concise."""
response = client.messages.create(
model="claude-sonnet-4-6",
max_tokens=1024,
messages=[{"role": "user", "content": prompt}],
)
return response.content[0].text
def synthesize(goal: str, steps: list[Step]) -> str:
steps_summary = "\n".join(
f"Step {s.index} ({s.description}):\n{s.result}" for s in steps
)
response = client.messages.create(
model="claude-sonnet-4-6",
max_tokens=2048,
messages=[{
"role": "user",
"content": f"""Given the following goal and completed steps, write a final answer.
Goal: {goal}
Completed steps:
{steps_summary}
Final answer:"""
}],
)
return response.content[0].text
def plan_and_execute(goal: str) -> str:
print(f"Planning: {goal}")
steps = create_plan(goal)
print(f"Generated {len(steps)} steps")
for step in steps:
print(f" Executing step {step.index}: {step.description}")
step.result = execute_step(step, steps[:step.index - 1])
return synthesize(goal, steps)
if __name__ == "__main__":
result = plan_and_execute(
"Write a competitive analysis comparing Firecrawl and Apify for enterprise web data extraction."
)
print(result)
When a task exceeds the context window or benefits from parallel execution, split it across specialized sub-agents. An orchestrator decomposes the goal, fans out work to worker agents, collects results, and synthesizes a final output.
This is what AWS calls the "supervisor" pattern and what Microsoft AutoGen calls "GroupChat" or "nested chat." Anthropic recommends it explicitly for tasks that can be broken into parallel, independent subtasks.
References:
import anthropic
from concurrent.futures import ThreadPoolExecutor, as_completed
client = anthropic.Anthropic()
def specialist_agent(role: str, task: str) -> str:
"""A single specialist agent that handles one focused task."""
system = f"You are a {role}. Answer concisely and precisely. Focus only on your area."
response = client.messages.create(
model="claude-haiku-4-5-20251001",
max_tokens=1024,
system=system,
messages=[{"role": "user", "content": task}],
)
return response.content[0].text
def orchestrator_agent(goal: str) -> str:
"""
Decomposes a goal into parallel sub-tasks, fans them out to specialist
agents, then synthesizes all results into a final answer.
"""
# Step 1: Decompose
decompose_prompt = f"""Break this goal into 3-5 parallel sub-tasks that can be executed independently.
Each sub-task should have:
- role: the type of expert best suited for it
- task: the specific question or output needed
Return a JSON array of objects with "role" and "task" keys.
Goal: {goal}"""
response = client.messages.create(
model="claude-sonnet-4-6",
max_tokens=1024,
messages=[{"role": "user", "content": decompose_prompt}],
)
import json
sub_tasks = json.loads(response.content[0].text)
# Step 2: Fan out in parallel
results = {}
with ThreadPoolExecutor(max_workers=len(sub_tasks)) as executor:
futures = {
executor.submit(specialist_agent, st["role"], st["task"]): st
for st in sub_tasks
}
for future in as_completed(futures):
st = futures[future]
results[st["role"]] = future.result()
# Step 3: Synthesize
synthesis_parts = "\n\n".join(
f"### {role}\n{result}" for role, result in results.items()
)
synthesis_prompt = f"""You are synthesizing the outputs of multiple specialist agents into a final report.
Goal: {goal}
Specialist outputs:
{synthesis_parts}
Write a cohesive final answer that integrates all perspectives."""
synthesis = client.messages.create(
model="claude-sonnet-4-6",
max_tokens=2048,
messages=[{"role": "user", "content": synthesis_prompt}],
)
return synthesis.content[0].text
if __name__ == "__main__":
result = orchestrator_agent(
"Evaluate whether an early-stage SaaS company should prioritize SEO or paid ads for growth."
)
print(result)
Agents are stateless by default — each API call starts from scratch. Memory patterns add persistence. There are three layers:
| Layer | What it stores | Implementation |
|---|
| In-context | Current conversation history | The messages list |
| External short-term | Recent sessions, working memory | Redis, database |
| External long-term | User preferences, knowledge | Vector DB (embeddings) |
References:
import anthropic
from dataclasses import dataclass, field
client = anthropic.Anthropic()
@dataclass
class AgentMemory:
summary: str = ""
recent_messages: list[dict] = field(default_factory=list)
max_recent: int = 10
def add_turn(self, role: str, content: str):
self.recent_messages.append({"role": role, "content": content})
if len(self.recent_messages) > self.max_recent:
self._compress()
def _compress(self):
to_compress = self.recent_messages[:-4] # Keep last 4 messages fresh
compress_prompt = f"""Summarize this conversation history. Preserve key facts,
decisions made, and any information the user mentioned about themselves or their goals.
Existing summary: {self.summary or 'None'}
New messages to incorporate:
{chr(10).join(f"{m['role']}: {m['content']}" for m in to_compress)}
Updated summary:"""
response = client.messages.create(
model="claude-haiku-4-5-20251001",
max_tokens=512,
messages=[{"role": "user", "content": compress_prompt}],
)
self.summary = response.content[0].text
self.recent_messages = self.recent_messages[-4:]
def build_messages(self, new_user_message: str) -> list[dict]:
messages = []
if self.summary:
messages.append({
"role": "user",
"content": f"[Conversation summary so far: {self.summary}]"
})
messages.append({
"role": "assistant",
"content": "Understood. I'll continue with that context in mind."
})
messages.extend(self.recent_messages)
messages.append({"role": "user", "content": new_user_message})
return messages
def stateful_agent(memory: AgentMemory, user_message: str) -> str:
messages = memory.build_messages(user_message)
response = client.messages.create(
model="claude-sonnet-4-6",
max_tokens=1024,
system="You are a helpful assistant with memory of previous conversations.",
messages=messages,
)
reply = response.content[0].text
memory.add_turn("user", user_message)
memory.add_turn("assistant", reply)
return reply
if __name__ == "__main__":
memory = AgentMemory()
turns = [
"My name is Priya and I'm building a B2B SaaS for logistics.",
"What pricing models work well for logistics software?",
"We're targeting mid-market companies with 100-500 employees.",
"Given what you know about me, what's the right pricing tier structure?",
]
for turn in turns:
print(f"User: {turn}")
reply = stateful_agent(memory, turn)
print(f"Agent: {reply}\n")
When you have multiple specialized agents or pipelines, a router classifies the incoming request and dispatches it to the right handler. This avoids giving every specialist all tools (which dilutes their focus) and keeps system prompts short and targeted.
References:
import anthropic
import json
from typing import Callable
client = anthropic.Anthropic()
# Define specialist handlers
def handle_code(query: str) -> str:
response = client.messages.create(
model="claude-sonnet-4-6",
max_tokens=2048,
system="You are an expert software engineer. Provide working code with brief explanations.",
messages=[{"role": "user", "content": query}],
)
return response.content[0].text
def handle_data_analysis(query: str) -> str:
response = client.messages.create(
model="claude-sonnet-4-6",
max_tokens=2048,
system="You are a data analyst. Provide structured analysis with numbers, trends, and actionable conclusions.",
messages=[{"role": "user", "content": query}],
)
return response.content[0].text
def handle_writing(query: str) -> str:
response = client.messages.create(
model="claude-sonnet-4-6",
max_tokens=2048,
system="You are a professional writer and editor. Produce clear, engaging prose.",
messages=[{"role": "user", "content": query}],
)
return response.content[0].text
def handle_general(query: str) -> str:
response = client.messages.create(
model="claude-haiku-4-5-20251001",
max_tokens=1024,
messages=[{"role": "user", "content": query}],
)
return response.content[0].text
ROUTES: dict[str, Callable[[str], str]] = {
"code": handle_code,
"data_analysis": handle_data_analysis,
"writing": handle_writing,
"general": handle_general,
}
def route(query: str) -> str:
router_prompt = f"""Classify the following user query into exactly one of these categories:
- code: programming, debugging, software architecture
- data_analysis: statistics, trends, metrics, business intelligence
- writing: content creation, editing, summarization
- general: anything else
Return a JSON object with a single key "category".
Query: {query}"""
response = client.messages.create(
model="claude-haiku-4-5-20251001",
max_tokens=64,
messages=[{"role": "user", "content": router_prompt}],
)
result = json.loads(response.content[0].text)
category = result.get("category", "general")
if category not in ROUTES:
category = "general"
print(f"Routing to: {category}")
return ROUTES[category](query)
if __name__ == "__main__":
queries = [
"Write a Python function to find all prime numbers up to n.",
"Analyze the year-over-year growth trend from these numbers: 120, 145, 190, 240, 310.",
"Rewrite this sentence to be more concise: 'Due to the fact that we were unable to...'",
"What is the capital of Peru?",
]
for q in queries:
print(f"\nQuery: {q}")
print(f"Answer: {route(q)}")
Real systems combine these patterns. A common architecture for a production research agent:
User Request
│
▼
[Router] ──► Simple query? ──► [ReAct Agent with tools]
│
└──► Complex research? ──► [Planner]
│
▼
[Orchestrator]
/ | \
[Web Agent] [DB Agent] [Analysis Agent]
\ | /
▼ ▼ ▼
[Reflection Layer]
│
▼
[Memory Store]
│
▼
Final Response
Each pattern solves a specific problem. Add them only when you hit the wall their pattern addresses — adding all of them up front creates unnecessary complexity.
Agentic systems fail in ways that pure LLM calls don't. Common failure modes and how to handle them:
Infinite loops: Set a hard max_iterations limit. Log each iteration to detect cycles.
Tool hallucinations: Return structured errors from tool handlers. Validate tool input against the schema before executing.
Context window overflow: Use the Memory pattern's summary compression. For long pipelines, prune intermediate results aggressively.
Cascading failures in multi-agent: Use timeouts on each sub-agent call. Return partial results rather than failing the entire pipeline.
import signal
from contextlib import contextmanager
@contextmanager
def time_limit(seconds: int):
def handler(signum, frame):
raise TimeoutError(f"Agent timed out after {seconds}s")
signal.signal(signal.SIGALRM, handler)
signal.alarm(seconds)
try:
yield
finally:
signal.alarm(0)
# Usage
try:
with time_limit(30):
result = react_agent("Complex research task...")
except TimeoutError:
result = "Agent timed out. Partial results available."
Microsoft:
AWS:
Anthropic:
Research:
| Pattern | Token cost | Latency | Use when |
|---|
| ReAct | Low–Medium | Low | Default starting point |
| Reflection | High | High | Quality-critical outputs |
| Tool Use | Depends on tools | Low + tool latency | Any external data need |
| Planning | Medium | Medium | Long-horizon tasks |
| Multi-Agent | High | Low (parallel) | Large tasks, specialization |
| Memory | Low (compression) | Negligible | Stateful sessions |
| Routing | Very low | Negligible | Mixed workloads |
Start with ReAct + Tool Use. Add Reflection when output quality is the bottleneck. Add Planning when tasks exceed 5-6 tool calls. Add Multi-Agent when tasks are naturally parallelizable. Add Memory when sessions span multiple turns. Add Routing when you have clearly distinct workloads.
Every pattern you add is a complexity tax. Pay it only when you've hit the specific wall it solves.