Agentic Design Patterns: Building Reliable AI Systems with Python

A comprehensive guide to the core design patterns behind agentic AI systems — from simple ReAct loops to multi-agent orchestration — with working Python examples and references from Microsoft, AWS, and Anthropic.

Rahul Bisht, Founder, CrawlPilot · Jun 21, 2026 · Engineering · 16 min read

Agentic AI is moving fast, but the underlying design patterns are stabilizing. Microsoft's AutoGen team, AWS's multi-agent blueprints, Anthropic's agent research, and LangGraph's graph-based orchestration have converged on a small set of composable patterns.

This guide distills those patterns, explains when each applies, and shows working Python implementations — not pseudocode.


What Makes a System "Agentic"?

A system is agentic when the model controls the sequence of steps rather than a hardcoded pipeline. The model reads a goal, decides which tools to call, observes the results, and loops until the goal is satisfied or a stopping condition is hit.

That's the minimum bar. Beyond it, systems add memory, parallelism, specialized sub-agents, and self-correction. Each addition introduces a design choice — and a corresponding pattern.


The Core Patterns

| Pattern | What it solves | When to use |
| --- | --- | --- |
| ReAct | Interleaving reasoning with tool use | Single-task agents with tool access |
| Reflection | Self-evaluation and revision | Output quality matters more than speed |
| Tool Use | Extending model capability | Any task beyond text generation |
| Planning | Breaking goals into sub-steps | Long-horizon tasks |
| Multi-Agent | Parallelism + specialization | Tasks too large for one context window |
| Memory | Persistence across turns | Stateful, long-running sessions |
| Routing | Dynamic task dispatch | Mixed workloads with different specialists |

Pattern 1: ReAct (Reasoning + Acting)

ReAct is the foundational pattern. The model alternates between Thought (reasoning about the current state), Action (calling a tool), and Observation (reading the result). This loop continues until the model emits a final answer.

It was formalized in the 2022 paper "ReAct: Synergizing Reasoning and Acting in Language Models" and is the basis for most agent frameworks today.

Python Example

python
import anthropic

client = anthropic.Anthropic()

tools = [
    {
        "name": "search_web",
        "description": "Search the web and return a summary of top results.",
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "The search query"}
            },
            "required": ["query"],
        },
    },
    {
        "name": "calculate",
        "description": "Evaluate a mathematical expression and return the result.",
        "input_schema": {
            "type": "object",
            "properties": {
                "expression": {"type": "string", "description": "Math expression, e.g. '42 * 1.08'"}
            },
            "required": ["expression"],
        },
    },
]


def handle_tool_call(tool_name: str, tool_input: dict) -> str:
    if tool_name == "search_web":
        # Replace with a real search API call
        return f"[Search result for '{tool_input['query']}': Sample result returned.]"
    if tool_name == "calculate":
        try:
            # Demo only: eval with empty builtins is not a sandbox.
            # Use a real expression parser for untrusted input.
            result = eval(tool_input["expression"], {"__builtins__": {}})
            return str(result)
        except Exception as e:
            return f"Error: {e}"
    return "Unknown tool"


def react_agent(goal: str, max_iterations: int = 10) -> str:
    messages = [{"role": "user", "content": goal}]

    for _ in range(max_iterations):
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=4096,
            tools=tools,
            messages=messages,
        )

        # Append assistant turn
        messages.append({"role": "assistant", "content": response.content})

        if response.stop_reason == "end_turn":
            # Extract final text answer
            for block in response.content:
                if hasattr(block, "text"):
                    return block.text
            return ""

        if response.stop_reason == "tool_use":
            # Run every requested tool and send the results back as one user turn
            tool_results = []
            for block in response.content:
                if block.type == "tool_use":
                    result = handle_tool_call(block.name, block.input)
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": result,
                    })
            messages.append({"role": "user", "content": tool_results})

    return "Max iterations reached."


if __name__ == "__main__":
    answer = react_agent("What is 15% of the population of Tokyo as of 2024?")
    print(answer)

The loop is the key: the model keeps calling tools until it has enough context to emit a final answer. stop_reason == "end_turn" is the exit signal.
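
An iteration cap bounds the loop, but an agent that keeps repeating the same tool call with the same arguments can be caught earlier. A minimal sketch, assuming a hypothetical `LoopGuard` helper (not part of any SDK) that the loop would consult before executing each tool call:

```python
import json
from collections import Counter


class LoopGuard:
    """Hypothetical helper: counts identical (tool, input) calls within one run."""

    def __init__(self, max_repeats: int = 2):
        self.max_repeats = max_repeats
        self._seen = Counter()

    def should_block(self, tool_name: str, tool_input: dict) -> bool:
        # Canonicalize the call so that key order in the input does not matter.
        key = f"{tool_name}:{json.dumps(tool_input, sort_keys=True)}"
        self._seen[key] += 1
        return self._seen[key] > self.max_repeats


guard = LoopGuard(max_repeats=2)
call = ("search_web", {"query": "population of Tokyo"})
print([guard.should_block(*call) for _ in range(3)])  # [False, False, True]
```

Inside `react_agent`, one option when `should_block` returns `True` is to skip the handler and send back a `tool_result` saying the call was already made, which gives the model a chance to change course instead of spending the remaining iterations.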


Pattern 2: Reflection

The model generates an output, then evaluates it against a rubric — either in the same model call or a separate one — and revises if the evaluation fails. This is sometimes called "self-refinement" or "critic-actor."

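Reflection trades latency for quality. Every round adds a critic call, and every failed round adds a full regeneration, so token cost grows with the number of rounds. It pays off most for writing, code, and structured data extraction, where a rubric can state concrete pass/fail criteria.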

Python Example

python
import json

import anthropic

client = anthropic.Anthropic()


def generate(task: str) -> str:
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=2048,
        messages=[{"role": "user", "content": task}],
    )
    return response.content[0].text


def critique(draft: str, rubric: str) -> dict:
    prompt = f"""You are a strict evaluator. Given the following draft and rubric, return a JSON object with two keys:
- "pass": true if the draft meets all rubric criteria, false otherwise
- "feedback": specific actionable feedback if it fails, empty string if it passes

Rubric:
{rubric}

Draft:
{draft}

Return only the JSON object."""
    response = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=512,
        messages=[{"role": "user", "content": prompt}],
    )
    return json.loads(response.content[0].text)


def reflection_agent(task: str, rubric: str, max_rounds: int = 3) -> str:
    draft = generate(task)

    for round_num in range(max_rounds):
        evaluation = critique(draft, rubric)
        print(f"Round {round_num + 1}: {'PASS' if evaluation['pass'] else 'FAIL'}")

        if evaluation["pass"]:
            return draft

        # Revise the draft using the critic's feedback
        revision_prompt = f"""Your previous draft failed the following review:
{evaluation['feedback']}

Original task: {task}

Previous draft:
{draft}

Please rewrite it addressing all the feedback."""
        draft = generate(revision_prompt)

    return draft  # Return best effort after max rounds


if __name__ == "__main__":
    rubric = """
- Fewer than 100 words
- Must include a concrete metric or statistic
- No passive voice
- Must end with a call to action
"""
    result = reflection_agent(
        task="Write a product announcement for an AI web scraping tool.",
        rubric=rubric,
    )
    print(result)

Using a faster, cheaper model (Haiku) as the critic and a more capable model (Sonnet) as the author is an efficient split — it keeps costs reasonable while maintaining output quality.
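
The `critique` function calls `json.loads` on raw model text, which raises if the model wraps its JSON in a Markdown code fence or adds a sentence around it. A minimal sketch of a more forgiving parser, assuming a hypothetical helper named `parse_json_reply`; it is a heuristic and does not guarantee valid output:

```python
import json
import re
from typing import Optional

FENCE = "`" * 3  # three backticks, built this way to keep the listing readable


def parse_json_reply(text: str) -> Optional[dict]:
    """Best-effort extraction of one JSON object from a model reply."""
    # Drop a leading fence (optionally tagged json) and a trailing fence.
    cleaned = re.sub(rf"^{FENCE}(?:json)?\s*|\s*{FENCE}$", "", text.strip())
    candidates = [cleaned]
    # Fallback candidate: the outermost {...} span, for replies with extra prose.
    start, end = cleaned.find("{"), cleaned.rfind("}")
    if 0 <= start < end:
        candidates.append(cleaned[start : end + 1])
    for candidate in candidates:
        try:
            parsed = json.loads(candidate)
        except json.JSONDecodeError:
            continue
        if isinstance(parsed, dict):
            return parsed
    return None  # Caller decides what a failed parse means


reply = FENCE + 'json\n{"pass": false, "feedback": "Too long"}\n' + FENCE
print(parse_json_reply(reply))  # {'pass': False, 'feedback': 'Too long'}
```

When the helper returns `None`, treating the round as a failed review with generic feedback is one reasonable default, since it keeps the loop moving instead of crashing.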


Pattern 3: Tool Use (Function Calling)

This is the mechanism underlying ReAct, but it's worth treating as a standalone pattern because the tool design itself is where most agents fail. Poorly named tools, ambiguous descriptions, or overlapping capabilities cause the model to hallucinate tool calls or pick the wrong one.

Python Example — Structured Tool Registry

python
from dataclasses import dataclass
from typing import Any, Callable

import anthropic


@dataclass
class Tool:
    name: str
    description: str
    parameters: dict
    handler: Callable[..., Any]

    def to_anthropic_schema(self) -> dict:
        return {
            "name": self.name,
            "description": self.description,
            "input_schema": self.parameters,
        }


class ToolRegistry:
    def __init__(self):
        self._tools: dict[str, Tool] = {}

    def register(self, tool: Tool):
        self._tools[tool.name] = tool

    def get_schemas(self) -> list[dict]:
        return [t.to_anthropic_schema() for t in self._tools.values()]

    def call(self, name: str, inputs: dict) -> str:
        # Errors come back as strings so the model can read them and recover
        if name not in self._tools:
            return f"Error: unknown tool '{name}'"
        try:
            result = self._tools[name].handler(**inputs)
            return str(result)
        except Exception as e:
            return f"Error calling {name}: {e}"


# Register domain-specific tools
registry = ToolRegistry()

registry.register(Tool(
    name="get_stock_price",
    description="Return the current price of a publicly traded stock by ticker symbol.",
    parameters={
        "type": "object",
        "properties": {
            "ticker": {"type": "string", "description": "Stock ticker, e.g. AAPL"}
        },
        "required": ["ticker"],
    },
    handler=lambda ticker: "$142.30",  # Placeholder; replace with a real API call
))

registry.register(Tool(
    name="get_company_info",
    description="Return basic company information including sector, employees, and HQ location.",
    parameters={
        "type": "object",
        "properties": {
            "ticker": {"type": "string"}
        },
        "required": ["ticker"],
    },
    # Placeholder; replace with a real API call
    handler=lambda ticker: "Apple Inc. - Sector: Technology, Employees: 164,000, HQ: Cupertino CA",
))


def run_agent_with_registry(query: str, max_iterations: int = 10) -> str:
    client = anthropic.Anthropic()
    messages = [{"role": "user", "content": query}]

    # Bounded loop: an unbounded `while True` never exits if the model keeps calling tools
    for _ in range(max_iterations):
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=2048,
            tools=registry.get_schemas(),
            messages=messages,
        )
        messages.append({"role": "assistant", "content": response.content})

        # Any stop reason other than tool_use ends the run; this avoids sending an empty tool_result turn
        if response.stop_reason != "tool_use":
            return next((b.text for b in response.content if hasattr(b, "text")), "")

        tool_results = [
            {
                "type": "tool_result",
                "tool_use_id": b.id,
                "content": registry.call(b.name, b.input),
            }
            for b in response.content
            if b.type == "tool_use"
        ]
        messages.append({"role": "user", "content": tool_results})

    return "Max iterations reached."


if __name__ == "__main__":
    print(run_agent_with_registry("Give me the current price and sector for AAPL."))
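
`ToolRegistry.call` passes the model's arguments straight to the handler, so a missing or mistyped argument surfaces as a raw Python exception message. One way to return a clearer error is to check the inputs against the tool's `parameters` schema first. A minimal sketch, assuming a hypothetical `validate_inputs` helper that covers only `required` and primitive `type` checks; a full JSON Schema validator covers far more:

```python
# Maps JSON Schema primitive type names to Python types (a deliberately small subset).
_JSON_TYPES = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "object": dict,
    "array": list,
}


def validate_inputs(schema: dict, inputs: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    properties = schema.get("properties", {})
    for name in schema.get("required", []):
        if name not in inputs:
            problems.append(f"missing required argument '{name}'")
    for name, value in inputs.items():
        if name not in properties:
            problems.append(f"unexpected argument '{name}'")
            continue
        type_name = properties[name].get("type")
        expected = _JSON_TYPES.get(type_name)
        # bool is a subclass of int in Python, so reject it for numeric types explicitly.
        bool_as_number = isinstance(value, bool) and type_name in ("integer", "number")
        if expected and (not isinstance(value, expected) or bool_as_number):
            problems.append(f"argument '{name}' should be of type {type_name}")
    return problems


schema = {
    "type": "object",
    "properties": {"ticker": {"type": "string"}},
    "required": ["ticker"],
}
print(validate_inputs(schema, {"ticker": "AAPL"}))  # []
print(validate_inputs(schema, {"symbol": 42}))
# ["missing required argument 'ticker'", "unexpected argument 'symbol'"]
```

Returning the problem list as the tool result, rather than raising, tells the model exactly which argument to fix on its next attempt.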

Pattern 4: Planning (Plan-and-Execute)

Instead of deciding the next action one step at a time, the model first produces an explicit multi-step plan, then executes each step. A separate "executor" module handles step execution and feeds results back into a memory buffer.

This pattern is important when a task has too many steps to fit in a single model turn, or when you want human approval before execution begins.

Python Example

python
import json
from dataclasses import dataclass

import anthropic

client = anthropic.Anthropic()


@dataclass
class Step:
    index: int
    description: str
    result: str | None = None


def create_plan(goal: str) -> list[Step]:
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": f"""Break this goal into 3-6 concrete, sequential steps.
Return a JSON array of strings, where each string is one step.

Goal: {goal}

Return only the JSON array."""
        }],
    )
    raw = response.content[0].text.strip()
    steps_raw = json.loads(raw)
    # Step indices are 1-based
    return [Step(index=i, description=s) for i, s in enumerate(steps_raw, 1)]


def execute_step(step: Step, completed_steps: list[Step]) -> str:
    context = "\n".join(
        f"Step {s.index}: {s.description}\nResult: {s.result}"
        for s in completed_steps
        if s.result is not None
    )
    prompt = f"""You are executing step {step.index} of a multi-step plan.

Previous steps and results:
{context or 'None yet.'}

Current step: {step.description}

Produce the output for this step. Be concise."""
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text


def synthesize(goal: str, steps: list[Step]) -> str:
    steps_summary = "\n".join(
        f"Step {s.index} ({s.description}):\n{s.result}" for s in steps
    )
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=2048,
        messages=[{
            "role": "user",
            "content": f"""Given the following goal and completed steps, write a final answer.

Goal: {goal}

Completed steps:
{steps_summary}

Final answer:"""
        }],
    )
    return response.content[0].text


def plan_and_execute(goal: str) -> str:
    print(f"Planning: {goal}")
    steps = create_plan(goal)
    print(f"Generated {len(steps)} steps")

    for step in steps:
        print(f"  Executing step {step.index}: {step.description}")
        # Pass only the steps that come before this one (indices are 1-based)
        step.result = execute_step(step, steps[:step.index - 1])

    return synthesize(goal, steps)


if __name__ == "__main__":
    result = plan_and_execute(
        "Write a competitive analysis comparing Firecrawl and Apify for enterprise web data extraction."
    )
    print(result)
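
Because the plan exists as data before anything runs, a human-approval step can sit between `create_plan` and the execution loop. A minimal sketch, assuming a hypothetical `approve_plan` function and a console prompt; the `ask` parameter is an assumption added here so the gate can be tested or swapped for a web UI:

```python
from typing import Callable


def approve_plan(step_descriptions: list[str], ask: Callable[[str], str] = input) -> list[str]:
    """Show the plan, then return the steps the reviewer accepted.

    The reviewer answers 'y' to accept everything, 'n' to reject everything,
    or a comma-separated list of step numbers to keep (for example '1,3').
    """
    listing = "\n".join(f"{i}. {d}" for i, d in enumerate(step_descriptions, 1))
    answer = ask(f"Proposed plan:\n{listing}\nApprove? [y / n / step numbers]: ").strip().lower()
    if answer in ("y", "yes"):
        return list(step_descriptions)
    if answer in ("", "n", "no"):
        return []
    # Anything else is read as step numbers; tokens that are not digits are ignored.
    keep = {int(tok) for tok in answer.split(",") if tok.strip().isdigit()}
    return [d for i, d in enumerate(step_descriptions, 1) if i in keep]


plan = ["Collect pricing pages", "Compare feature lists", "Draft summary"]
print(approve_plan(plan, ask=lambda _prompt: "1,3"))
# ['Collect pricing pages', 'Draft summary']
```

In `plan_and_execute`, the descriptions from `create_plan` would pass through this gate, and only the approved steps would be rebuilt into `Step` objects and executed.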

Pattern 5: Multi-Agent Orchestration

When a task exceeds the context window or benefits from parallel execution, split it across specialized sub-agents. An orchestrator decomposes the goal, fans out work to worker agents, collects results, and synthesizes a final output.

This is what AWS calls the "supervisor" pattern and what Microsoft AutoGen calls "GroupChat" or "nested chat." Anthropic recommends it explicitly for tasks that can be broken into parallel, independent subtasks.

Python Example — Parallel Sub-Agents with ThreadPoolExecutor

python
import json
from concurrent.futures import ThreadPoolExecutor, as_completed

import anthropic

client = anthropic.Anthropic()


def specialist_agent(role: str, task: str) -> str:
    """A single specialist agent that handles one focused task."""
    system = f"You are a {role}. Answer concisely and precisely. Focus only on your area."
    response = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=1024,
        system=system,
        messages=[{"role": "user", "content": task}],
    )
    return response.content[0].text


def orchestrator_agent(goal: str) -> str:
    """
    Decomposes a goal into parallel sub-tasks, fans them out to specialist agents,
    then synthesizes all results into a final answer.
    """
    # Step 1: Decompose
    decompose_prompt = f"""Break this goal into 3-5 parallel sub-tasks that can be executed independently.
Each sub-task should have:
- role: the type of expert best suited for it
- task: the specific question or output needed

Return a JSON array of objects with "role" and "task" keys.

Goal: {goal}"""
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        messages=[{"role": "user", "content": decompose_prompt}],
    )
    sub_tasks = json.loads(response.content[0].text)

    # Step 2: Fan out in parallel
    # Results are kept as (role, output) pairs so two sub-tasks with the same role don't overwrite each other
    results = []
    with ThreadPoolExecutor(max_workers=max(1, len(sub_tasks))) as executor:
        futures = {
            executor.submit(specialist_agent, st["role"], st["task"]): st
            for st in sub_tasks
        }
        for future in as_completed(futures):
            st = futures[future]
            results.append((st["role"], future.result()))

    # Step 3: Synthesize
    synthesis_parts = "\n\n".join(
        f"### {role}\n{result}" for role, result in results
    )
    synthesis_prompt = f"""You are synthesizing the outputs of multiple specialist agents into a final report.

Goal: {goal}

Specialist outputs:
{synthesis_parts}

Write a cohesive final answer that integrates all perspectives."""
    synthesis = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=2048,
        messages=[{"role": "user", "content": synthesis_prompt}],
    )
    return synthesis.content[0].text


if __name__ == "__main__":
    result = orchestrator_agent(
        "Evaluate whether an early-stage SaaS company should prioritize SEO or paid ads for growth."
    )
    print(result)

Pattern 6: Memory

Agents are stateless by default — each API call starts from scratch. Memory patterns add persistence. There are three layers:

| Layer | What it stores | Implementation |
| --- | --- | --- |
| In-context | Current conversation history | The messages list |
| External short-term | Recent sessions, working memory | Redis, database |
| External long-term | User preferences, knowledge | Vector DB (embeddings) |
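
The long-term layer usually comes down to similarity search: store text alongside a vector, then pull back the closest entries for the current query. A minimal in-process sketch, assuming a hypothetical `LongTermMemory` class and a toy word-count `embed` function standing in for a real embedding model and vector database:

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts. A real system would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class LongTermMemory:
    def __init__(self):
        self._items = []  # list of (fact, vector) pairs

    def remember(self, fact: str) -> None:
        self._items.append((fact, embed(fact)))

    def recall(self, query: str, k: int = 2) -> list[str]:
        """Return up to k stored facts, most similar first, skipping zero-similarity entries."""
        q = embed(query)
        scored = [(cosine(q, vec), fact) for fact, vec in self._items]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [fact for score, fact in scored[:k] if score > 0]


memory = LongTermMemory()
memory.remember("Priya is building a B2B SaaS for logistics")
memory.remember("Target customers are mid-market companies with 100-500 employees")
memory.remember("Priya prefers annual billing")
print(memory.recall("what billing does Priya prefer", k=1))
# ['Priya prefers annual billing']
```

The recalled facts would typically be prepended to the prompt or system message before the model call. Word overlap misses synonyms and paraphrases, which is the gap real embeddings are meant to close.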

Python Example — External Memory with Summary Compression

python
from dataclasses import dataclass, field

import anthropic

client = anthropic.Anthropic()


@dataclass
class AgentMemory:
    summary: str = ""
    recent_messages: list[dict] = field(default_factory=list)
    max_recent: int = 10

    def add_turn(self, role: str, content: str):
        self.recent_messages.append({"role": role, "content": content})
        if len(self.recent_messages) > self.max_recent:
            self._compress()

    def _compress(self):
        to_compress = self.recent_messages[:-4]  # Keep last 4 messages fresh
        history = "\n".join(f"{m['role']}: {m['content']}" for m in to_compress)
        compress_prompt = f"""Summarize this conversation history. Preserve key facts, decisions made, and any information the user mentioned about themselves or their goals.

Existing summary:
{self.summary or 'None'}

New messages to incorporate:
{history}

Updated summary:"""
        response = client.messages.create(
            model="claude-haiku-4-5-20251001",
            max_tokens=512,
            messages=[{"role": "user", "content": compress_prompt}],
        )
        self.summary = response.content[0].text
        self.recent_messages = self.recent_messages[-4:]

    def build_messages(self, new_user_message: str) -> list[dict]:
        messages = []
        if self.summary:
            # Inject the running summary as a synthetic opening exchange
            messages.append({
                "role": "user",
                "content": f"[Conversation summary so far: {self.summary}]"
            })
            messages.append({
                "role": "assistant",
                "content": "Understood. I'll continue with that context in mind."
            })
        messages.extend(self.recent_messages)
        messages.append({"role": "user", "content": new_user_message})
        return messages


def stateful_agent(memory: AgentMemory, user_message: str) -> str:
    messages = memory.build_messages(user_message)
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        system="You are a helpful assistant with memory of previous conversations.",
        messages=messages,
    )
    reply = response.content[0].text
    memory.add_turn("user", user_message)
    memory.add_turn("assistant", reply)
    return reply


if __name__ == "__main__":
    memory = AgentMemory()
    turns = [
        "My name is Priya and I'm building a B2B SaaS for logistics.",
        "What pricing models work well for logistics software?",
        "We're targeting mid-market companies with 100-500 employees.",
        "Given what you know about me, what's the right pricing tier structure?",
    ]
    for turn in turns:
        print(f"User: {turn}")
        reply = stateful_agent(memory, turn)
        print(f"Agent: {reply}\n")

Pattern 7: Routing

When you have multiple specialized agents or pipelines, a router classifies the incoming request and dispatches it to the right handler. This avoids giving every specialist all tools (which dilutes their focus) and keeps system prompts short and targeted.

Python Example

python
import json
from typing import Callable

import anthropic

client = anthropic.Anthropic()


# Define specialist handlers
def handle_code(query: str) -> str:
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=2048,
        system="You are an expert software engineer. Provide working code with brief explanations.",
        messages=[{"role": "user", "content": query}],
    )
    return response.content[0].text


def handle_data_analysis(query: str) -> str:
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=2048,
        system="You are a data analyst. Provide structured analysis with numbers, trends, and actionable conclusions.",
        messages=[{"role": "user", "content": query}],
    )
    return response.content[0].text


def handle_writing(query: str) -> str:
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=2048,
        system="You are a professional writer and editor. Produce clear, engaging prose.",
        messages=[{"role": "user", "content": query}],
    )
    return response.content[0].text


def handle_general(query: str) -> str:
    response = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=1024,
        messages=[{"role": "user", "content": query}],
    )
    return response.content[0].text


ROUTES: dict[str, Callable[[str], str]] = {
    "code": handle_code,
    "data_analysis": handle_data_analysis,
    "writing": handle_writing,
    "general": handle_general,
}


def route(query: str) -> str:
    router_prompt = f"""Classify the following user query into exactly one of these categories:
- code: programming, debugging, software architecture
- data_analysis: statistics, trends, metrics, business intelligence
- writing: content creation, editing, summarization
- general: anything else

Return a JSON object with a single key "category".

Query: {query}"""
    response = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=64,
        messages=[{"role": "user", "content": router_prompt}],
    )
    # Fall back to the general handler if the reply is not the expected JSON object
    try:
        category = json.loads(response.content[0].text).get("category", "general")
    except (json.JSONDecodeError, AttributeError):
        category = "general"
    if category not in ROUTES:
        category = "general"

    print(f"Routing to: {category}")
    return ROUTES[category](query)


if __name__ == "__main__":
    queries = [
        "Write a Python function to find all prime numbers up to n.",
        "Analyze the year-over-year growth trend from these numbers: 120, 145, 190, 240, 310.",
        "Rewrite this sentence to be more concise: 'Due to the fact that we were unable to...'",
        "What is the capital of Peru?",
    ]
    for q in queries:
        print(f"\nQuery: {q}")
        print(f"Answer: {route(q)}")

Combining Patterns

Real systems combine these patterns. A common architecture for a production research agent:

User Request
     │
     ▼
 [Router] ──► Simple query? ──► [ReAct Agent with tools]
     │
     └──► Complex research? ──► [Planner]
                                    │
                                    ▼
                            [Orchestrator]
                           /      |       \
                    [Web Agent] [DB Agent] [Analysis Agent]
                           \      |       /
                            ▼     ▼     ▼
                          [Reflection Layer]
                                  │
                                  ▼
                            [Memory Store]
                                  │
                                  ▼
                            Final Response

Each pattern solves a specific problem. Add them only when you hit the wall their pattern addresses — adding all of them up front creates unnecessary complexity.


Guardrails and Reliability

Agentic systems fail in ways that pure LLM calls don't. Common failure modes and how to handle them:

Infinite loops: Set a hard max_iterations limit. Log each iteration to detect cycles.

Tool hallucinations: Return structured errors from tool handlers. Validate tool input against the schema before executing.

Context window overflow: Use the Memory pattern's summary compression. For long pipelines, prune intermediate results aggressively.

Cascading failures in multi-agent: Use timeouts on each sub-agent call. Return partial results rather than failing the entire pipeline.

python
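import signal
from contextlib import contextmanager


@contextmanager
def time_limit(seconds: int):
    # Note: SIGALRM is Unix-only and can only be used from the main thread.
    def handler(signum, frame):
        raise TimeoutError(f"Agent timed out after {seconds}s")

    previous = signal.signal(signal.SIGALRM, handler)
    signal.alarm(seconds)
    try:
        yield
    finally:
        signal.alarm(0)  # Cancel any pending alarm
        signal.signal(signal.SIGALRM, previous)  # Restore the earlier handler


# Usage (react_agent is the function from the ReAct example)
try:
    with time_limit(30):
        result = react_agent("Complex research task...")
except TimeoutError:
    result = "Agent timed out. Partial results available."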
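
`signal.SIGALRM` is available only on Unix and only in the main thread, so it does not fit the thread-pool fan-out used in the multi-agent example. A sketch of the partial-results idea using `concurrent.futures.wait` with a deadline instead; `run_with_deadline` and the stand-in workers are hypothetical, and since Python threads cannot be forcibly killed, slow workers keep running in the background until they finish:

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait
from typing import Callable


def run_with_deadline(tasks: dict[str, Callable[[], str]], timeout_s: float) -> dict[str, str]:
    """Run all tasks in parallel; keep results from those that finish before the deadline."""
    executor = ThreadPoolExecutor(max_workers=max(1, len(tasks)))
    futures = {executor.submit(fn): name for name, fn in tasks.items()}
    done, not_done = wait(futures, timeout=timeout_s)

    results = {}
    for future in done:
        name = futures[future]
        try:
            results[name] = future.result()
        except Exception as exc:  # one failing worker should not sink the rest
            results[name] = f"[failed: {exc}]"
    for future in not_done:
        future.cancel()  # only prevents tasks that have not started yet
        results[futures[future]] = "[timed out]"

    # Do not block on stragglers; their threads finish on their own.
    executor.shutdown(wait=False)
    return results


def fast() -> str:
    return "fast result"


def slow() -> str:
    time.sleep(1.0)  # stands in for a sub-agent call that overruns the deadline
    return "slow result"


print(run_with_deadline({"web": fast, "db": slow}, timeout_s=0.2))
```

The synthesizer can then be told which specialists timed out or failed, so the final answer reflects the gap instead of silently omitting it.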


Summary

| Pattern | Token cost | Latency | Use when |
| --- | --- | --- | --- |
| ReAct | Low–Medium | Low | Default starting point |
| Reflection | High | High | Quality-critical outputs |
| Tool Use | Depends on tools | Low + tool latency | Any external data need |
| Planning | Medium | Medium | Long-horizon tasks |
| Multi-Agent | High | Low (parallel) | Large tasks, specialization |
| Memory | Low (compression) | Negligible | Stateful sessions |
| Routing | Very low | Negligible | Mixed workloads |

Start with ReAct + Tool Use. Add Reflection when output quality is the bottleneck. Add Planning when tasks exceed 5-6 tool calls. Add Multi-Agent when tasks are naturally parallelizable. Add Memory when sessions span multiple turns. Add Routing when you have clearly distinct workloads.

Every pattern you add is a complexity tax. Pay it only when you've hit the specific wall it solves.