Context Engineering for LLM Agents

Prompt engineering was about wording a single question well. Context engineering is the harder, more important discipline underneath it: deciding what goes into the model's limited context window at each step of a task — and, just as importantly, what stays out.

The one-line definition worth memorising: context engineering is the art and science of filling the context window with just the right information for the next step. Not everything you have. Not the whole conversation. The right slice, in the right order, at the right moment.

This post covers why it matters, the four ways it goes wrong, and the techniques that keep agents reliable as tasks get long.

Why the model can't just "have everything"

A context window is finite, and it is not free. Every token you put in costs money and adds latency, and — less obviously — more context does not mean better answers. Past a point, extra material actively hurts: the signal the model needs gets buried, and its attention spreads thin across things that don't matter for the current step.

So the job isn't to stuff the window. It's to curate it. That curation is where reliability is won or lost.

The four failure modes

Almost every agent bug that isn't a plain code error traces back to one of these:

Context poisoning — a wrong fact (a hallucinated value, a stale tool result) enters the context early and gets carried forward, contaminating every subsequent step.
Context overload — too much is packed in, and the relevant detail is lost in the noise. The model has the answer in front of it and still misses it.
Token bloat — the window fills with redundant history, verbose tool output, and repeated boilerplate. Cost and latency climb; quality doesn't.
Distraction / goal drift — deep into a long task, the original objective slips out of the active window and the agent wanders.

Name the failure and the fix usually names itself. Overload wants pruning. Bloat wants compaction. Poisoning wants validation and isolation. Drift wants the goal re-anchored.
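
To make "poisoning wants validation" concrete, here is a minimal sketch of a gate a tool result has to pass before it enters the context. The `ToolResult` type, the `admit` function, and the five-minute freshness limit are assumptions made up for illustration, not part of any particular framework.

```python
from dataclasses import dataclass


@dataclass
class ToolResult:
    tool: str
    value: str
    fetched_at: float  # Unix timestamp of the tool call


def admit(result: ToolResult, now: float, max_age_s: float = 300.0) -> bool:
    """Return True only if the result is non-empty and recent enough to trust."""
    if not result.value.strip():
        return False  # empty output carries no information worth keeping
    return (now - result.fetched_at) <= max_age_s


now = 1_000_000.0
fresh = ToolResult("get_price", "19.99", fetched_at=now - 30)
stale = ToolResult("get_price", "17.50", fetched_at=now - 3600)

# Only results that pass the gate are carried into the next step.
context = [r for r in (fresh, stale) if admit(r, now)]
```

A real gate would likely check more than age (schema, value ranges, agreement with a second source), but the shape is the same: validate at the door, because anything admitted gets carried forward.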

The techniques

Think of context assembly as a pipeline. On each step you gather candidate material, select from it, compact it, order it, and hand the model exactly what it needs:

sources                      assemble                model
─────────                    ────────                ─────
system / instructions  ┐
retrieved docs (RAG)   │
memory                 ├──►  select · compact · order ──►  LLM
tool results           │
conversation           ┘        ▲
                                └── keep the goal in view
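
The diagram can be sketched in a few lines. This is a rough illustration under stated assumptions: `Item`, `assemble`, the `relevance` score, and the word-count `rough_tokens` estimate are all hypothetical, and the compaction stage is left out to keep the sketch short.

```python
from dataclasses import dataclass


@dataclass
class Item:
    source: str            # e.g. "system", "retrieved", "memory", "tool", "conversation"
    text: str
    relevance: float       # assumed score in [0, 1] for the current step
    stable: bool = False   # True for material that rarely changes between steps


def rough_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: count whitespace-separated words.
    return len(text.split())


def assemble(goal: str, items: list[Item], budget: int) -> str:
    goal_line = f"Goal: {goal}"
    remaining = budget - rough_tokens(goal_line)

    # Select: take the most relevant items that still fit in the budget.
    chosen: list[Item] = []
    for item in sorted(items, key=lambda i: i.relevance, reverse=True):
        cost = rough_tokens(item.text)
        if cost <= remaining:
            chosen.append(item)
            remaining -= cost

    # Order: stable material first, volatile material last.
    chosen.sort(key=lambda i: not i.stable)

    # Keep the goal in view: restate it at the end of the prompt.
    return "\n\n".join([i.text for i in chosen] + [goal_line])


items = [
    Item("conversation", "Earlier small talk about the weather. " * 40, 0.1),
    Item("tool", "Tool result: price is 19.99", 0.9),
    Item("system", "You are a careful pricing agent.", 1.0, stable=True),
]
prompt = assemble("report the current price", items, budget=20)
```

With a budget of 20 rough tokens, the long low-relevance chatter is dropped and the prompt holds the system line, the tool result, and the goal.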

Here are the levers that pipeline gives you.

1. Retrieval (RAG)

Don't pre-load a knowledge base into the prompt — fetch the few passages relevant to this step and inject only those. Retrieval keeps the fixed context small and pulls detail in on demand. It's the difference between handing the model a library and handing it the one paragraph it needs. (Vector search is the usual engine here — see our vector databases explainer.)
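
A toy version of that fetch step, assuming nothing beyond the standard library: `embed` here is a word-count stand-in for a real embedding model, and `retrieve` and the sample passages are invented for the example. A production setup would swap in real embeddings and a vector index.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Toy stand-in for an embedding model: lowercase word counts.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, passages: list[str], k: int = 2) -> list[str]:
    """Return the k passages most similar to the query."""
    q = embed(query)
    ranked = sorted(passages, key=lambda p: cosine(q, embed(p)), reverse=True)
    return ranked[:k]


passages = [
    "Refunds are issued within 14 days of a return request.",
    "Our office is closed on public holidays.",
    "Shipping takes 3 to 5 business days within the EU.",
]

# Only the top passage is injected into the prompt; the rest stay out.
top = retrieve("how long do refunds take", passages, k=1)
```

The point is the shape rather than the scoring: the knowledge base stays outside the window, and each step pays only for the `k` passages it pulls in.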

2. Memory — short-term and long-term

Not everything belongs in the live transcript. Split it:

Short-term / working memory lives in the current window — the last few turns, the active plan.
Long-term memory lives outside it — a file, a store, a scratchpad the agent reads from and writes to across sessions.

Give the agent a place to jot down what it learns and a habit of consulting it, and long tasks stop re-deriving the same facts. Store one lesson per note, and update rather than duplicate.
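
One possible shape for that scratchpad is a small JSON file keyed by topic. `NoteStore`, its method names, and the word-overlap lookup are assumptions for this sketch; a real agent might use a database or a vector store instead.

```python
import json
import tempfile
from pathlib import Path


class NoteStore:
    """Long-term memory as a JSON file: one lesson per key, updated in place."""

    def __init__(self, path: Path):
        self.path = path
        self.notes: dict[str, str] = (
            json.loads(path.read_text()) if path.exists() else {}
        )

    def remember(self, key: str, lesson: str) -> None:
        # Writing to an existing key overwrites it, so notes never duplicate.
        self.notes[key] = lesson
        self.path.write_text(json.dumps(self.notes, indent=2))

    def recall(self, query: str) -> list[str]:
        # Naive lookup: lessons whose key shares at least one word with the query.
        words = set(query.lower().split())
        return [v for k, v in self.notes.items() if words & set(k.lower().split())]


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "notes.json"
    store = NoteStore(path)
    store.remember("api rate limit", "Limit is 60 requests per minute.")
    store.remember("api rate limit", "Limit is 100 requests per minute.")  # update

    reloaded = NoteStore(path)  # a later session reads the same file
    recalled = reloaded.recall("what is the rate limit")
```

Keying notes by topic is what makes "update rather than duplicate" cheap: the second `remember` call replaces the first lesson instead of piling a contradiction next to it.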

3. Dynamic tool loadout

Every tool definition you expose costs tokens and adds a decision the model has to make. Loading every tool on every call bloats the window and makes it harder for the model to pick the right one. Instead, surface only the tools relevant to the current query. Dynamic tool selection has been reported to improve function-calling accuracy in benchmarks, because the model chooses from a focused set rather than a crowded one.
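
A minimal sketch of a tool loadout step, assuming a simple keyword match between the query and each tool's description. The `Tool` type, `select_tools`, the stopword list, and the example tools are all hypothetical; real systems often rank tools with embeddings instead.

```python
from dataclasses import dataclass

STOPWORDS = {"a", "an", "the", "to", "from", "in", "on", "for", "of"}


@dataclass(frozen=True)
class Tool:
    name: str
    description: str


def keywords(text: str) -> set[str]:
    return set(text.lower().split()) - STOPWORDS


def select_tools(query: str, tools: list[Tool], limit: int = 2) -> list[Tool]:
    """Return up to `limit` tools whose description overlaps the query."""
    q = keywords(query)
    scored = [(len(q & keywords(t.description)), t) for t in tools]
    relevant = [pair for pair in scored if pair[0] > 0]
    relevant.sort(key=lambda pair: pair[0], reverse=True)
    return [tool for _, tool in relevant[:limit]]


TOOLS = [
    Tool("search_flights", "search flights between two airports on a date"),
    Tool("book_hotel", "book a hotel room in a city"),
    Tool("convert_currency", "convert an amount from one currency to another"),
    Tool("get_weather", "get the weather forecast for a city"),
]

# Only the matching tool definitions are sent with this call.
loadout = select_tools("find flights from Berlin to Rome", TOOLS)
```

Here the model would see one tool definition instead of four. The stopword filter matters more than it looks: without it, "from" and "to" would pull in the currency converter.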

4. Compaction and summarisation

When history grows past what's useful, summarise the old turns into a compact form and drop the raw transcript. Compaction keeps the meaning of the conversation while shedding the token weight — the antidote to bloat on genuinely long-running agents. The trick is to compact the stale middle while preserving the goal and the most recent, most relevant turns verbatim.
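
Sketched in code, that trick looks roughly like this. The `compact` function and its `keep_recent` parameter are illustrative names, the first turn is assumed to hold the goal, and `summarise` is a placeholder for what would normally be a model call.

```python
from typing import Callable


def compact(
    turns: list[str],
    summarise: Callable[[list[str]], str],
    keep_recent: int = 3,
) -> list[str]:
    """Keep the first turn (assumed to be the goal) and the last `keep_recent`
    turns verbatim; replace everything in between with one summary line."""
    if len(turns) <= keep_recent + 1:
        return list(turns)  # nothing stale enough to compact
    cut = len(turns) - keep_recent
    goal, middle, recent = turns[0], turns[1:cut], turns[cut:]
    summary = f"[summary of {len(middle)} earlier turns] {summarise(middle)}"
    return [goal, summary, *recent]


def fake_summarise(middle: list[str]) -> str:
    # Placeholder: a real agent would ask a model to summarise these turns.
    return "; ".join(t[:24] for t in middle)


turns = [
    "Goal: migrate the billing service to the new database",
    "Listed the tables in the old schema",
    "Exported the customers table",
    "Exported the invoices table",
    "Import of invoices failed on a date column",
    "Fixed the date format and retried the import",
]
compacted = compact(turns, fake_summarise, keep_recent=2)
```

Six turns become four: the goal, one summary line standing in for three stale turns, and the last two turns untouched.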

5. Isolation with sub-agents

One giant context thread trying to hold every sub-task is a recipe for overload and drift. A better shape: a coordinator that spawns focused sub-agents, each with its own clean, isolated context for one piece of the work. Anthropic's write-up of its multi-agent research system reported that this shape can outperform a single-agent setup on broad research tasks, in part because each sub-agent works in its own context window: no cross-contamination, no one window doing everything. (More on these shapes in agentic design patterns.)
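
A stripped-down sketch of the coordinator shape, with the model calls replaced by placeholders. `SubAgent`, `coordinate`, and the strings they produce are invented for illustration; the thing to notice is which data crosses the boundary.

```python
from dataclasses import dataclass, field


@dataclass
class SubAgent:
    task: str
    context: list[str] = field(default_factory=list)  # private to this sub-agent

    def run(self) -> str:
        # Placeholder for a real model loop: work happens in the private
        # context, and only a short result is returned to the caller.
        self.context.append(f"instructions: {self.task}")
        self.context.append(f"scratch work for {self.task}")
        return f"result of {self.task}"


def coordinate(goal: str, subtasks: list[str]) -> list[str]:
    coordinator_context = [f"goal: {goal}"]
    for task in subtasks:
        agent = SubAgent(task)  # fresh, empty context per sub-task
        coordinator_context.append(agent.run())  # only the result crosses over
    return coordinator_context


ctx = coordinate("compare three vendors", ["vendor A", "vendor B", "vendor C"])
```

The coordinator ends up with the goal plus three short results. Each sub-agent's scratch work stays in its own context and is discarded with it.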

6. Deliberate ordering

Finally, order matters. Put stable, foundational context first (system instructions, task definition) and volatile, step-specific material last. This isn't just tidiness. Prompt caching typically works by reusing an unchanged prefix, so anything that changes early in the prompt throws away the cache for everything after it. A stable opening also means the only part that differs from step to step is the part the model should be paying attention to.
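
A small sketch of why ordering matters for reuse. It assumes a cache that matches on an exact shared prefix, and it measures that prefix in characters as a rough proxy for tokens; `build_prompt` and the example strings are hypothetical.

```python
import os


def build_prompt(system: str, task: str, step_material: list[str]) -> str:
    # Stable prefix first, volatile step-specific material last.
    return "\n\n".join([system, task, *step_material])


SYSTEM = "You are a support agent. Answer from the provided context only."
TASK = "Task: resolve the customer's billing question."

step1 = build_prompt(SYSTEM, TASK, ["Tool result: invoice 1042 is unpaid."])
step2 = build_prompt(SYSTEM, TASK, ["Tool result: payment received today."])

# Characters shared at the start of both prompts: a proxy for what a
# prefix-based cache could reuse between the two steps.
reused = len(os.path.commonprefix([step1, step2]))

# Counter-example: volatile material first leaves almost nothing to reuse.
bad1 = "\n\n".join(["Tool result: invoice 1042 is unpaid.", SYSTEM, TASK])
bad2 = "\n\n".join(["Tool result: payment received today.", SYSTEM, TASK])
wasted = len(os.path.commonprefix([bad1, bad2]))
```

With the stable material first, the whole system prompt and task definition are shared between steps. With the tool result first, the two prompts diverge after the words "Tool result: " and nothing after that point can be reused.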

A rule of thumb

Start simple and add structure only when a task demands it. Most agents don't need a memory system, sub-agent orchestration, and dynamic retrieval on day one — they need a clean, well-ordered prompt and one good retrieval step. Add each technique when a specific failure mode shows up: reach for compaction when you hit bloat, isolation when one context is doing too much, memory when the agent keeps forgetting.

Context engineering isn't a framework you install. It's a discipline you apply — deciding, at every step, what the model needs to see and nothing more.

If you're building agents on top of web data, the quality of what you feed them starts with the quality of what you extract. CrawlPilot turns messy pages into clean, structured rows — the kind of well-formed context an agent can actually use. The getting started guide shows the extraction flow end to end, and agentic design patterns covers the loops this context feeds.
