Agentic Design Patterns: Building Reliable AI Systems with Python

Agentic AI is moving fast, but the underlying design patterns are stabilizing. Microsoft's AutoGen team, AWS's multi-agent blueprints, Anthropic's agent research, and LangGraph's graph-based orchestration have converged on a small set of composable patterns.

This guide distills those patterns, explains when each applies, and shows working Python implementations — not pseudocode.

What Makes a System "Agentic"?

A system is agentic when the model, rather than a hardcoded pipeline, controls the sequence of steps. The model reads a goal, decides which tools to call, observes the results, and loops until the goal is satisfied or a stopping condition is hit.

That's the minimum bar. Beyond it, systems add memory, parallelism, specialized sub-agents, and self-correction. Each addition introduces a design choice — and a corresponding pattern.
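
To make that minimum bar concrete, here is a small offline sketch of the control flow. The names `run_agent`, `scripted_policy`, and the `lookup` tool are hypothetical stand-ins invented for illustration: `scripted_policy` plays the role of the model, so no API call is made. The point is only that the policy, not the surrounding code, chooses each next step.

```python
from typing import Callable

# A decision is either ("call", tool_name, tool_input) or ("final", answer).
Decision = tuple


def run_agent(
    decide: Callable[[str, list], Decision],
    tools: dict[str, Callable[[str], str]],
    goal: str,
    max_steps: int = 5,
) -> str:
    """Let `decide` (standing in for the model) pick every next step."""
    observations: list[tuple[str, str, str]] = []
    for _ in range(max_steps):
        decision = decide(goal, observations)
        if decision[0] == "final":
            return decision[1]
        _, tool_name, tool_input = decision
        tool = tools.get(tool_name)
        # Unknown tools produce an error observation instead of crashing
        result = tool(tool_input) if tool else f"Error: unknown tool '{tool_name}'"
        observations.append((tool_name, tool_input, result))
    return "Stopped: step limit reached."


def scripted_policy(goal: str, observations: list) -> Decision:
    """Hypothetical stand-in for a model: look something up once, then answer."""
    if not observations:
        return ("call", "lookup", "tokyo population")
    return ("final", f"Answer based on: {observations[-1][2]}")


tools = {"lookup": lambda q: f"stub result for '{q}'"}
print(run_agent(scripted_policy, tools, "How many people live in Tokyo?"))
```

Swapping `scripted_policy` for a real model call changes nothing in the loop itself, which is why the step limit belongs in the loop rather than in the prompt.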

The Core Patterns

Pattern	What it solves	When to use
ReAct	Interleaving reasoning with tool use	Single-task agents with tool access
Reflection	Self-evaluation and revision	Output quality matters more than speed
Tool Use	Extending model capability	Any task beyond text generation
Planning	Breaking goals into sub-steps	Long-horizon tasks
Multi-Agent	Parallelism + specialization	Tasks too large for one context window
Memory	Persistence across turns	Stateful, long-running sessions
Routing	Dynamic task dispatch	Mixed workloads with different specialists

Pattern 1: ReAct (Reasoning + Acting)

ReAct is the foundational pattern. The model alternates between Thought (reasoning about the current state), Action (calling a tool), and Observation (reading the result). This loop continues until the model emits a final answer.

It was formalized in the 2022 paper "ReAct: Synergizing Reasoning and Acting in Language Models" and is the basis for most agent frameworks today.

Python Example

python
import anthropic
import json

client = anthropic.Anthropic()

tools = [
    {
        "name": "search_web",
        "description": "Search the web and return a summary of top results.",
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "The search query"}
            },
            "required": ["query"],
        },
    },
    {
        "name": "calculate",
        "description": "Evaluate a mathematical expression and return the result.",
        "input_schema": {
            "type": "object",
            "properties": {
                "expression": {"type": "string", "description": "Math expression, e.g. '42 * 1.08'"}
            },
            "required": ["expression"],
        },
    },
]

def handle_tool_call(tool_name: str, tool_input: dict) -> str:
    if tool_name == "search_web":
        # Replace with a real search API call
        return f"[Search result for '{tool_input['query']}': Sample result returned.]"
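    if tool_name == "calculate":
        try:
            # WARNING: eval() is not a sandbox, even with builtins stripped.
            # Demo only; use a restricted expression parser for untrusted input.
            result = eval(tool_input["expression"], {"__builtins__": {}})
            return str(result)
        except Exception as e:
            return f"Error: {e}"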
    return "Unknown tool"

def react_agent(goal: str, max_iterations: int = 10) -> str:
    messages = [{"role": "user", "content": goal}]

    for _ in range(max_iterations):
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=4096,
            tools=tools,
            messages=messages,
        )

        # Append assistant turn
        messages.append({"role": "assistant", "content": response.content})

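        if response.stop_reason != "tool_use":
            # "end_turn" is the normal exit. Other reasons (e.g. "max_tokens")
            # also end the loop, since there is no tool result to send back.
            for block in response.content:
                if block.type == "text":
                    return block.text
            return ""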

        if response.stop_reason == "tool_use":
            tool_results = []
            for block in response.content:
                if block.type == "tool_use":
                    result = handle_tool_call(block.name, block.input)
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": result,
                    })
            messages.append({"role": "user", "content": tool_results})

    return "Max iterations reached."

if __name__ == "__main__":
    answer = react_agent("What is 15% of the population of Tokyo as of 2024?")
    print(answer)

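The loop is the key: the model keeps calling tools until it has enough context to emit a final answer. stop_reason == "end_turn" is the normal exit signal; other values, such as "max_tokens", mean the reply was cut off and should be handled separately.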
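
The calculate tool in the example relies on eval, which is risky when the expression comes from a model. One possible replacement, sketched here under the assumption that only basic arithmetic is needed, walks the expression with Python's `ast` module and allows a fixed set of operators. The function name `safe_calculate` and the exponent cap of 100 are illustrative choices, not part of the original example.

```python
import ast
import operator

# Whitelisted operators; any other syntax is rejected.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Mod: operator.mod,
    ast.Pow: operator.pow,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def safe_calculate(expression: str) -> float:
    """Evaluate +, -, *, /, %, ** over numeric literals; raise ValueError otherwise."""

    def _eval(node: ast.AST) -> float:
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.Constant):
            # bool is a subclass of int, so exclude it explicitly
            if isinstance(node.value, (int, float)) and not isinstance(node.value, bool):
                return node.value
            raise ValueError("only numeric literals are allowed")
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            left, right = _eval(node.left), _eval(node.right)
            # Rough guard against enormous powers such as 9 ** 9 ** 9
            if isinstance(node.op, ast.Pow) and abs(right) > 100:
                raise ValueError("exponent too large")
            return _BINARY_OPS[type(node.op)](left, right)
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        raise ValueError(f"unsupported expression element: {type(node).__name__}")

    try:
        tree = ast.parse(expression, mode="eval")
    except SyntaxError as exc:
        raise ValueError(f"invalid expression: {exc}") from exc
    return _eval(tree)


print(safe_calculate("2 + 3 * 4"))
```

Names, attribute access, and function calls all fall through to the final `ValueError`, so an input like `__import__('os')` is rejected before anything runs. Division by zero still raises `ZeroDivisionError`, which a tool handler can report back to the model as an error string.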

Pattern 2: Reflection

The model generates an output, then evaluates it against a rubric — either in the same model call or a separate one — and revises if the evaluation fails. This is sometimes called "self-refinement" or "critic-actor."

Reflection trades latency for quality. Every critique and revision round costs extra tokens, but the pattern tends to pay off for writing, code, and structured data extraction, where a rubric can catch concrete defects.
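
Because each critique round costs tokens, it can help to run cheap deterministic checks first and call the model critic only when they pass. The sketch below is one way to do that; `precheck`, its 100-word default, and the "contains a digit" heuristic for a concrete metric are assumptions made for illustration.

```python
import re


def precheck(draft: str, max_words: int = 100) -> list[str]:
    """Return rubric violations that can be detected without a model call."""
    failures = []
    word_count = len(draft.split())
    if word_count >= max_words:
        failures.append(
            f"Draft has {word_count} words; it must have fewer than {max_words}."
        )
    # Heuristic only: a digit suggests a metric, but does not prove one is present
    if not re.search(r"\d", draft):
        failures.append("Draft contains no number, so it likely lacks a concrete metric.")
    return failures


print(precheck("Cut scraping time by 40%. Try it today."))
```

Criteria such as "no passive voice" or "ends with a call to action" are hard to check mechanically, so those stay with the model critic.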

Python Example

python
import anthropic

client = anthropic.Anthropic()

def generate(task: str) -> str:
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=2048,
        messages=[{"role": "user", "content": task}],
    )
    return response.content[0].text

def critique(draft: str, rubric: str) -> dict:
    prompt = f"""You are a strict evaluator. Given the following draft and rubric, 
return a JSON object with two keys:
- "pass": true if the draft meets all rubric criteria, false otherwise
- "feedback": specific actionable feedback if it fails, empty string if it passes

Rubric:
{rubric}

Draft:
{draft}

Return only the JSON object."""

    response = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=512,
        messages=[{"role": "user", "content": prompt}],
    )
    import json
    return json.loads(response.content[0].text)

def reflection_agent(task: str, rubric: str, max_rounds: int = 3) -> str:
    draft = generate(task)

    for round_num in range(max_rounds):
        evaluation = critique(draft, rubric)
        print(f"Round {round_num + 1}: {'PASS' if evaluation['pass'] else 'FAIL'}")

        if evaluation["pass"]:
            return draft

        # Revise the draft using the critic's feedback
        revision_prompt = f"""Your previous draft failed the following review:

{evaluation['feedback']}

Original task: {task}

Previous draft:
{draft}

Please rewrite it addressing all the feedback."""

        draft = generate(revision_prompt)

    return draft  # Return best effort after max rounds

if __name__ == "__main__":
    rubric = """
    - Fewer than 100 words
    - Must include a concrete metric or statistic
    - No passive voice
    - Must end with a call to action
    """
    result = reflection_agent(
        task="Write a product announcement for an AI web scraping tool.",
        rubric=rubric,
    )
    print(result)

Using a faster, cheaper model (Haiku) as the critic and a more capable model (Sonnet) as the author is an efficient split — it keeps costs reasonable while maintaining output quality.
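
The critic's reply is parsed with a bare `json.loads`, which fails if the model wraps the JSON in a Markdown code fence or adds a sentence around it. A more tolerant parser might look like the sketch below. The helper name `parse_json_reply` is hypothetical, and the fallback (take the first decodable object or array) is one heuristic among several.

```python
import json
import re

FENCE = "`" * 3  # a Markdown code fence marker, built here to keep this listing readable
_FENCED = re.compile(
    r"^" + FENCE + r"[a-zA-Z]*\s*\n(.*?)\n?" + FENCE + r"$", re.DOTALL
)


def parse_json_reply(text: str):
    """Parse JSON from a model reply that may be fenced or surrounded by prose."""
    cleaned = text.strip()
    fenced = _FENCED.match(cleaned)
    if fenced:
        cleaned = fenced.group(1).strip()
    try:
        return json.loads(cleaned)
    except json.JSONDecodeError:
        pass
    # Fall back to the first span that decodes as a JSON object or array
    decoder = json.JSONDecoder()
    for match in re.finditer(r"[{\[]", cleaned):
        try:
            value, _end = decoder.raw_decode(cleaned[match.start():])
            return value
        except json.JSONDecodeError:
            continue
    raise ValueError("no JSON value found in reply")


print(parse_json_reply('Here is my verdict: {"pass": false, "feedback": "Too long."}'))
```

If parsing still fails, treating the round as a failed review with generic feedback is usually safer than crashing the loop.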

Pattern 3: Tool Use (Function Calling)

This is the mechanism underlying ReAct, but it's worth treating as a standalone pattern because the tool design itself is where most agents fail. Poorly named tools, ambiguous descriptions, or overlapping capabilities cause the model to hallucinate tool calls or pick the wrong one.
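
One way to catch hallucinated or malformed tool calls early is to check the model's input against the tool's declared schema before running the handler. The sketch below covers only required fields, unexpected fields, and basic types. `validate_tool_input` is a hypothetical helper, and a full JSON Schema validator (for example the `jsonschema` package) handles far more cases.

```python
# Maps JSON Schema type names to the Python types json.loads would produce
_JSON_TYPES = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "object": dict,
    "array": list,
}


def validate_tool_input(schema: dict, inputs: dict) -> list[str]:
    """Return a list of problems; an empty list means the input looks valid."""
    problems = []
    properties = schema.get("properties", {})
    for name in schema.get("required", []):
        if name not in inputs:
            problems.append(f"missing required field '{name}'")
    for name, value in inputs.items():
        if name not in properties:
            problems.append(f"unexpected field '{name}'")
            continue
        type_name = properties[name].get("type")
        expected = _JSON_TYPES.get(type_name)
        if expected is None:
            continue  # no type declared, or a type this sketch does not cover
        # bool is a subclass of int in Python, so reject it for non-boolean fields
        is_stray_bool = isinstance(value, bool) and expected is not bool
        if is_stray_bool or not isinstance(value, expected):
            problems.append(f"field '{name}' should be of type {type_name}")
    return problems


ticker_schema = {
    "type": "object",
    "properties": {"ticker": {"type": "string", "description": "Stock ticker, e.g. AAPL"}},
    "required": ["ticker"],
}
print(validate_tool_input(ticker_schema, {"symbol": "AAPL"}))
```

Returning the problem list to the model as the tool result gives it a chance to correct the call on the next turn.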

Python Example — Structured Tool Registry

python
from dataclasses import dataclass, field
from typing import Callable, Any
import anthropic

@dataclass
class Tool:
    name: str
    description: str
    parameters: dict
    handler: Callable[..., Any]

    def to_anthropic_schema(self) -> dict:
        return {
            "name": self.name,
            "description": self.description,
            "input_schema": self.parameters,
        }

class ToolRegistry:
    def __init__(self):
        self._tools: dict[str, Tool] = {}

    def register(self, tool: Tool):
        self._tools[tool.name] = tool

    def get_schemas(self) -> list[dict]:
        return [t.to_anthropic_schema() for t in self._tools.values()]

    def call(self, name: str, inputs: dict) -> str:
        if name not in self._tools:
            return f"Error: unknown tool '{name}'"
        try:
            result = self._tools[name].handler(**inputs)
            return str(result)
        except Exception as e:
            return f"Error calling {name}: {e}"

# Register domain-specific tools
registry = ToolRegistry()

registry.register(Tool(
    name="get_stock_price",
    description="Return the current price of a publicly traded stock by ticker symbol.",
    parameters={
        "type": "object",
        "properties": {
            "ticker": {"type": "string", "description": "Stock ticker, e.g. AAPL"}
        },
        "required": ["ticker"],
    },
    handler=lambda ticker: "$142.30",  # Stubbed price; replace with a real API call
))

registry.register(Tool(
    name="get_company_info",
    description="Return basic company information including sector, employees, and HQ location.",
    parameters={
        "type": "object",
        "properties": {
            "ticker": {"type": "string", "description": "Stock ticker, e.g. AAPL"}
        },
        "required": ["ticker"],
    },
    handler=lambda ticker: "Apple Inc. - Sector: Technology, Employees: 164,000, HQ: Cupertino CA",  # Stubbed data
))

def run_agent_with_registry(query: str, max_iterations: int = 10) -> str:
    client = anthropic.Anthropic()
    messages = [{"role": "user", "content": query}]

    # Bounded loop: an unbounded `while True` can spin forever on repeated tool calls
    for _ in range(max_iterations):
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=2048,
            tools=registry.get_schemas(),
            messages=messages,
        )
        messages.append({"role": "assistant", "content": response.content})

        # Any stop reason other than "tool_use" ends the loop, so an empty
        # tool_results list is never sent back to the API
        if response.stop_reason != "tool_use":
            return next((b.text for b in response.content if hasattr(b, "text")), "")

        tool_results = [
            {
                "type": "tool_result",
                "tool_use_id": b.id,
                "content": registry.call(b.name, b.input),
            }
            for b in response.content if b.type == "tool_use"
        ]
        messages.append({"role": "user", "content": tool_results})

    return "Max iterations reached."

if __name__ == "__main__":
    print(run_agent_with_registry("Give me the current price and sector for AAPL."))

Pattern 4: Planning (Plan-and-Execute)

Instead of deciding the next action one step at a time, the model first produces an explicit multi-step plan, then executes each step. A separate "executor" module handles step execution and feeds results back into a memory buffer.

This pattern is important when a task has too many steps to fit in a single model turn, or when you want human approval before execution begins.
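
Having an explicit plan makes two cheap safeguards possible: validating the plan's shape before anything runs, and pausing for approval. The sketch below shows both with stubbed callbacks. `parse_plan` and `run_with_approval` are hypothetical helpers, and the `approve` callback stands in for whatever review step (a CLI prompt, a ticket, a UI button) a real system would use.

```python
import json
from typing import Callable


def parse_plan(raw: str, min_steps: int = 3, max_steps: int = 6) -> list[str]:
    """Validate that the planner returned a JSON array of non-empty strings."""
    try:
        steps = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"plan is not valid JSON: {exc}") from exc
    if not isinstance(steps, list) or not all(
        isinstance(s, str) and s.strip() for s in steps
    ):
        raise ValueError("plan must be a JSON array of non-empty strings")
    if not min_steps <= len(steps) <= max_steps:
        raise ValueError(f"plan must have {min_steps}-{max_steps} steps, got {len(steps)}")
    return [s.strip() for s in steps]


def run_with_approval(
    steps: list[str],
    approve: Callable[[list[str]], bool],
    execute: Callable[[str, list[str]], str],
) -> list[str]:
    """Execute steps in order, but only after `approve` accepts the whole plan."""
    if not approve(steps):
        return []
    results: list[str] = []
    for step in steps:
        # Each step sees a copy of the results produced so far
        results.append(execute(step, list(results)))
    return results


plan = parse_plan('["Collect sources", "Compare features", "Write summary"]')
results = run_with_approval(
    plan,
    approve=lambda steps: len(steps) <= 4,  # stand-in for a human yes/no
    execute=lambda step, prior: f"done: {step} (after {len(prior)} earlier steps)",
)
print(results)
```

Rejecting a malformed plan before execution is much cheaper than discovering the problem three model calls later.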

Python Example

python
import anthropic
import json
from dataclasses import dataclass

client = anthropic.Anthropic()

@dataclass
class Step:
    index: int
    description: str
    result: str | None = None

def create_plan(goal: str) -> list[Step]:
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": f"""Break this goal into 3-6 concrete, sequential steps.
Return a JSON array of strings, where each string is one step.

Goal: {goal}

Return only the JSON array."""
        }],
    )
    raw = response.content[0].text.strip()
    steps_raw = json.loads(raw)
    return [Step(index=i, description=s) for i, s in enumerate(steps_raw, 1)]

def execute_step(step: Step, completed_steps: list[Step]) -> str:
    context = "\n".join(
        f"Step {s.index}: {s.description}\nResult: {s.result}"
        for s in completed_steps
        if s.result is not None
    )
    prompt = f"""You are executing step {step.index} of a multi-step plan.

Previous steps and results:
{context or 'None yet.'}

Current step: {step.description}

Produce the output for this step. Be concise."""

    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text

def synthesize(goal: str, steps: list[Step]) -> str:
    steps_summary = "\n".join(
        f"Step {s.index} ({s.description}):\n{s.result}" for s in steps
    )
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=2048,
        messages=[{
            "role": "user",
            "content": f"""Given the following goal and completed steps, write a final answer.

Goal: {goal}

Completed steps:
{steps_summary}

Final answer:"""
        }],
    )
    return response.content[0].text

def plan_and_execute(goal: str) -> str:
    print(f"Planning: {goal}")
    steps = create_plan(goal)
    print(f"Generated {len(steps)} steps")

    for step in steps:
        print(f"  Executing step {step.index}: {step.description}")
        step.result = execute_step(step, steps[:step.index - 1])

    return synthesize(goal, steps)

if __name__ == "__main__":
    result = plan_and_execute(
        "Write a competitive analysis comparing Firecrawl and Apify for enterprise web data extraction."
    )
    print(result)

Pattern 5: Multi-Agent Orchestration

When a task exceeds the context window or benefits from parallel execution, split it across specialized sub-agents. An orchestrator decomposes the goal, fans out work to worker agents, collects results, and synthesizes a final output.

This is what AWS calls the "supervisor" pattern and what Microsoft AutoGen calls "GroupChat" or "nested chat." Anthropic recommends it explicitly for tasks that can be broken into parallel, independent subtasks.
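
The fan-out and fan-in shape can also be expressed with `asyncio` instead of threads. The sketch below uses a stubbed worker so it runs offline; `stub_specialist`, `fan_out`, and the "flaky" role are hypothetical names, and a real worker would await a model call where the stub sleeps. It keeps results in submission order and records failures instead of raising.

```python
import asyncio


async def stub_specialist(role: str, task: str) -> str:
    """Hypothetical worker; a real one would await a model call here."""
    await asyncio.sleep(0.01)
    if role == "flaky":
        raise RuntimeError("simulated failure")
    return f"{role} finished: {task}"


async def fan_out(sub_tasks: list[dict], timeout_s: float = 5.0) -> list[dict]:
    """Run all sub-tasks concurrently and report each outcome in input order."""

    async def run_one(st: dict) -> str:
        return await asyncio.wait_for(
            stub_specialist(st["role"], st["task"]), timeout=timeout_s
        )

    # return_exceptions=True turns failures into values, so one bad worker
    # does not cancel the others
    outcomes = await asyncio.gather(
        *(run_one(st) for st in sub_tasks), return_exceptions=True
    )
    return [
        {
            "role": st["role"],
            "ok": not isinstance(out, BaseException),
            "output": str(out),
        }
        for st, out in zip(sub_tasks, outcomes)
    ]


reports = asyncio.run(
    fan_out([
        {"role": "analyst", "task": "size the market"},
        {"role": "flaky", "task": "check pricing"},
    ])
)
print(reports)
```

Because results are a list aligned with the input, two sub-tasks that share a role name do not overwrite each other.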

Python Example — Parallel Sub-Agents with ThreadPoolExecutor

python
import anthropic
from concurrent.futures import ThreadPoolExecutor, as_completed

client = anthropic.Anthropic()

def specialist_agent(role: str, task: str) -> str:
    """A single specialist agent that handles one focused task."""
    system = f"You are a {role}. Answer concisely and precisely. Focus only on your area."
    response = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=1024,
        system=system,
        messages=[{"role": "user", "content": task}],
    )
    return response.content[0].text

def orchestrator_agent(goal: str) -> str:
    """
    Decomposes a goal into parallel sub-tasks, fans them out to specialist
    agents, then synthesizes all results into a final answer.
    """
    # Step 1: Decompose
    decompose_prompt = f"""Break this goal into 3-5 parallel sub-tasks that can be executed independently.
Each sub-task should have:
- role: the type of expert best suited for it
- task: the specific question or output needed

Return a JSON array of objects with "role" and "task" keys.

Goal: {goal}"""

    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        messages=[{"role": "user", "content": decompose_prompt}],
    )
    import json
    sub_tasks = json.loads(response.content[0].text)

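    # Step 2: Fan out in parallel (max_workers must be at least 1)
    results = {}
    with ThreadPoolExecutor(max_workers=max(1, len(sub_tasks))) as executor:
        futures = {
            executor.submit(specialist_agent, st["role"], st["task"]): st
            for st in sub_tasks
        }
        for future in as_completed(futures):
            st = futures[future]
            try:
                results[st["role"]] = future.result()
            except Exception as e:
                # Keep partial results instead of failing the whole pipeline
                results[st["role"]] = f"[sub-agent failed: {e}]"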

    # Step 3: Synthesize
    synthesis_parts = "\n\n".join(
        f"### {role}\n{result}" for role, result in results.items()
    )
    synthesis_prompt = f"""You are synthesizing the outputs of multiple specialist agents into a final report.

Goal: {goal}

Specialist outputs:
{synthesis_parts}

Write a cohesive final answer that integrates all perspectives."""

    synthesis = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=2048,
        messages=[{"role": "user", "content": synthesis_prompt}],
    )
    return synthesis.content[0].text

if __name__ == "__main__":
    result = orchestrator_agent(
        "Evaluate whether an early-stage SaaS company should prioritize SEO or paid ads for growth."
    )
    print(result)

Pattern 6: Memory

Agents are stateless by default — each API call starts from scratch. Memory patterns add persistence. There are three layers:

Layer	What it stores	Implementation
In-context	Current conversation history	The `messages` list
External short-term	Recent sessions, working memory	Redis, database
External long-term	User preferences, knowledge	Vector DB (embeddings)
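
To illustrate the long-term layer without any external service, the sketch below stores facts and retrieves the closest ones to a query. It uses bag-of-words cosine similarity purely as a stand-in for embeddings, and the class name `LongTermMemory` is hypothetical. A real system would swap `_vectorize` for an embedding model and the list for a vector database.

```python
import math
import re
from collections import Counter


def _vectorize(text: str) -> Counter:
    """Lowercased word counts; a crude substitute for an embedding vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class LongTermMemory:
    def __init__(self) -> None:
        self._items: list[tuple[str, Counter]] = []

    def remember(self, fact: str) -> None:
        self._items.append((fact, _vectorize(fact)))

    def recall(self, query: str, k: int = 2) -> list[str]:
        """Return up to k stored facts, most similar first, skipping zero matches."""
        q = _vectorize(query)
        scored = [(_cosine(q, vec), fact) for fact, vec in self._items]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [fact for score, fact in scored[:k] if score > 0]


memory = LongTermMemory()
memory.remember("Priya is building a B2B SaaS for logistics")
memory.remember("Target customers are mid-market companies with 100-500 employees")
memory.remember("Prefers usage-based pricing")
print(memory.recall("What pricing does the user prefer?"))
```

Recalled facts would then be placed into the prompt alongside the in-context messages, which is how the three layers in the table fit together.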

Python Example — External Memory with Summary Compression

python
import anthropic
from dataclasses import dataclass, field

client = anthropic.Anthropic()

@dataclass
class AgentMemory:
    summary: str = ""
    recent_messages: list[dict] = field(default_factory=list)
    max_recent: int = 10

    def add_turn(self, role: str, content: str):
        self.recent_messages.append({"role": role, "content": content})
        if len(self.recent_messages) > self.max_recent:
            self._compress()

    def _compress(self):
        to_compress = self.recent_messages[:-4]  # Keep last 4 messages fresh
        compress_prompt = f"""Summarize this conversation history. Preserve key facts, 
decisions made, and any information the user mentioned about themselves or their goals.

Existing summary: {self.summary or 'None'}

New messages to incorporate:
{chr(10).join(f"{m['role']}: {m['content']}" for m in to_compress)}

Updated summary:"""

        response = client.messages.create(
            model="claude-haiku-4-5-20251001",
            max_tokens=512,
            messages=[{"role": "user", "content": compress_prompt}],
        )
        self.summary = response.content[0].text
        self.recent_messages = self.recent_messages[-4:]

    def build_messages(self, new_user_message: str) -> list[dict]:
        messages = []
        if self.summary:
            messages.append({
                "role": "user",
                "content": f"[Conversation summary so far: {self.summary}]"
            })
            messages.append({
                "role": "assistant",
                "content": "Understood. I'll continue with that context in mind."
            })
        messages.extend(self.recent_messages)
        messages.append({"role": "user", "content": new_user_message})
        return messages

def stateful_agent(memory: AgentMemory, user_message: str) -> str:
    messages = memory.build_messages(user_message)
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        system="You are a helpful assistant with memory of previous conversations.",
        messages=messages,
    )
    reply = response.content[0].text
    memory.add_turn("user", user_message)
    memory.add_turn("assistant", reply)
    return reply

if __name__ == "__main__":
    memory = AgentMemory()
    turns = [
        "My name is Priya and I'm building a B2B SaaS for logistics.",
        "What pricing models work well for logistics software?",
        "We're targeting mid-market companies with 100-500 employees.",
        "Given what you know about me, what's the right pricing tier structure?",
    ]
    for turn in turns:
        print(f"User: {turn}")
        reply = stateful_agent(memory, turn)
        print(f"Agent: {reply}\n")

Pattern 7: Routing

When you have multiple specialized agents or pipelines, a router classifies the incoming request and dispatches it to the right handler. This avoids giving every specialist all tools (which dilutes their focus) and keeps system prompts short and targeted.
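
A router does not have to be a model call every time. One possible design, sketched below, tries cheap keyword rules first and falls back to a classifier only when the rules are silent or ambiguous. The keyword lists, `rule_based_category`, and `pick_route` are illustrative assumptions; the `classify` callback stands in for a model-based classifier.

```python
from typing import Callable, Optional

# Illustrative keyword lists; tune these to the real workload
KEYWORD_RULES = {
    "code": ("python", "function", "bug", "stack trace", "compile"),
    "data_analysis": ("trend", "growth", "average", "median", "correlation"),
    "writing": ("rewrite", "proofread", "summarize", "concise"),
}
VALID_CATEGORIES = ("code", "data_analysis", "writing", "general")


def rule_based_category(query: str) -> Optional[str]:
    """Return a category only when exactly one rule set matches; otherwise None."""
    q = query.lower()
    hits = [
        category
        for category, words in KEYWORD_RULES.items()
        if any(word in q for word in words)
    ]
    return hits[0] if len(hits) == 1 else None


def pick_route(query: str, classify: Callable[[str], str]) -> str:
    """Use the rules when they are decisive, else ask the classifier."""
    category = rule_based_category(query) or classify(query)
    # Anything unrecognized falls back to the general handler
    return category if category in VALID_CATEGORIES else "general"


print(pick_route("Write a Python function to find all prime numbers up to n.", lambda q: "general"))
```

Requiring exactly one matching rule set keeps the shortcut conservative: a query that mentions both "python" and "trend" still goes to the classifier.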

Python Example

python
import anthropic
import json
from typing import Callable

client = anthropic.Anthropic()

# Define specialist handlers
def handle_code(query: str) -> str:
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=2048,
        system="You are an expert software engineer. Provide working code with brief explanations.",
        messages=[{"role": "user", "content": query}],
    )
    return response.content[0].text

def handle_data_analysis(query: str) -> str:
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=2048,
        system="You are a data analyst. Provide structured analysis with numbers, trends, and actionable conclusions.",
        messages=[{"role": "user", "content": query}],
    )
    return response.content[0].text

def handle_writing(query: str) -> str:
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=2048,
        system="You are a professional writer and editor. Produce clear, engaging prose.",
        messages=[{"role": "user", "content": query}],
    )
    return response.content[0].text

def handle_general(query: str) -> str:
    response = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=1024,
        messages=[{"role": "user", "content": query}],
    )
    return response.content[0].text

ROUTES: dict[str, Callable[[str], str]] = {
    "code": handle_code,
    "data_analysis": handle_data_analysis,
    "writing": handle_writing,
    "general": handle_general,
}

def route(query: str) -> str:
    router_prompt = f"""Classify the following user query into exactly one of these categories:
- code: programming, debugging, software architecture
- data_analysis: statistics, trends, metrics, business intelligence
- writing: content creation, editing, summarization
- general: anything else

Return a JSON object with a single key "category".

Query: {query}"""

    response = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=64,
        messages=[{"role": "user", "content": router_prompt}],
    )
    result = json.loads(response.content[0].text)
    category = result.get("category", "general")
    if category not in ROUTES:
        category = "general"

    print(f"Routing to: {category}")
    return ROUTES[category](query)

if __name__ == "__main__":
    queries = [
        "Write a Python function to find all prime numbers up to n.",
        "Analyze the year-over-year growth trend from these numbers: 120, 145, 190, 240, 310.",
        "Rewrite this sentence to be more concise: 'Due to the fact that we were unable to...'",
        "What is the capital of Peru?",
    ]
    for q in queries:
        print(f"\nQuery: {q}")
        print(f"Answer: {route(q)}")

Combining Patterns

Real systems combine these patterns. A common architecture for a production research agent:

User Request
     │
     ▼
 [Router] ──► Simple query? ──► [ReAct Agent with tools]
     │
     └──► Complex research? ──► [Planner]
                                    │
                                    ▼
                            [Orchestrator]
                           /      |       \
                    [Web Agent] [DB Agent] [Analysis Agent]
                           \      |       /
                            ▼     ▼     ▼
                          [Reflection Layer]
                                  │
                                  ▼
                            [Memory Store]
                                  │
                                  ▼
                            Final Response

Each pattern solves a specific problem. Add one only when you hit the wall it addresses; adding all of them up front creates unnecessary complexity.

Guardrails and Reliability

Agentic systems fail in ways that pure LLM calls don't. Common failure modes and how to handle them:

Infinite loops: Set a hard max_iterations limit. Log each iteration to detect cycles.

Tool hallucinations: Return structured errors from tool handlers. Validate tool input against the schema before executing.

Context window overflow: Use the Memory pattern's summary compression. For long pipelines, prune intermediate results aggressively.

Cascading failures in multi-agent: Use timeouts on each sub-agent call. Return partial results rather than failing the entire pipeline.

python
import signal
from contextlib import contextmanager

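@contextmanager
def time_limit(seconds: int):
    # SIGALRM is Unix-only and works only in the main thread, so this guard
    # cannot wrap calls running inside ThreadPoolExecutor workers.
    def handler(signum, frame):
        raise TimeoutError(f"Agent timed out after {seconds}s")
    previous = signal.signal(signal.SIGALRM, handler)
    signal.alarm(seconds)
    try:
        yield
    finally:
        signal.alarm(0)
        signal.signal(signal.SIGALRM, previous)  # Restore the earlier handler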

# Usage
try:
    with time_limit(30):
        result = react_agent("Complex research task...")
except TimeoutError:
    result = "Agent timed out. Partial results available."
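
Signal-based alarms only work on Unix and in the main thread, so they cannot guard sub-agents that run in worker threads. A portable alternative, sketched here with the standard library's `concurrent.futures`, waits on a future with a timeout and returns a fallback value when the call is late. `call_with_timeout` is a hypothetical helper name.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def call_with_timeout(fn, *args, timeout_s: float = 30.0, default=None):
    """Run fn(*args) in a worker thread; return `default` if it is not done in time."""
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(fn, *args)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        return default
    finally:
        # Do not block the caller. The worker thread cannot be killed and
        # keeps running in the background until fn returns.
        executor.shutdown(wait=False)


def slow_agent() -> str:
    time.sleep(0.3)  # simulated long-running sub-agent
    return "late answer"


print(call_with_timeout(slow_agent, timeout_s=0.05, default="Agent timed out."))
```

The caveat in the comment matters: the caller stops waiting, but the underlying work continues, so a request-level timeout on the HTTP client is still worth setting as well.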

Summary

Pattern	Token cost	Latency	Use when
ReAct	Low–Medium	Low	Default starting point
Reflection	High	High	Quality-critical outputs
Tool Use	Depends on tools	Low + tool latency	Any external data need
Planning	Medium	Medium	Long-horizon tasks
Multi-Agent	High	Low (parallel)	Large tasks, specialization
Memory	Low (compression)	Negligible	Stateful sessions
Routing	Very low	Negligible	Mixed workloads

Start with ReAct + Tool Use. Add Reflection when output quality is the bottleneck. Add Planning when tasks exceed 5-6 tool calls. Add Multi-Agent when tasks are naturally parallelizable. Add Memory when sessions span multiple turns. Add Routing when you have clearly distinct workloads.

Every pattern you add is a complexity tax. Pay it only when you've hit the specific wall it solves.
