Published March 5, 2026

In Part 1 of this series, we examined the smallest meaningful unit in modern AI systems: a single AI call. That call operates with limited context, has no memory of prior interactions, and produces output that must be interpreted by surrounding software. Those constraints explain much of the behavior that makes AI feel powerful one moment and unreliable the next.
This second part zooms out. Once you understand the limits of a single call, the broader AI ecosystem starts to make sense. The tools, protocols, and patterns teams rely on today don’t replace that primitive—they exist to manage it and make AI usable in real products.
In isolation, a single AI call is fragile. It forgets everything between requests, only works with the information it is explicitly given, and produces probabilistic output that software must interpret carefully.
Production systems can’t afford that fragility. They need repeatability, safety, and integration with existing systems. That’s where abstractions come in.
These abstractions decide what context to include, when to invoke tools, how to validate outputs, and how to sequence multiple calls over time. They exist to impose structure and control on top of a fundamentally stateless and probabilistic core.
Seen this way, the AI ecosystem is not a race toward ever-smarter models. It’s a growing collection of architectural responses to the same underlying constraints.
One of the most important gaps exposed in Part 1 is the difference between intent and execution. When a model produces structured output describing a tool invocation, nothing has actually happened yet. The model has expressed what it wants to do, not done it.
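That gap is easy to see in code. A minimal sketch, assuming a hypothetical tool-call payload (`name` and `arguments` fields, loosely modeled on common chat-API conventions) and a stub `get_weather` tool:

```python
import json

# A model's "tool call" is just structured text. Nothing has run yet.
model_output = '{"name": "get_weather", "arguments": {"city": "Paris"}}'

# Hypothetical tool implementation, owned by the application, not the model.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"  # stub in place of a real API call

TOOLS = {"get_weather": get_weather}

# Execution happens only when the application parses and dispatches the request.
call = json.loads(model_output)
result = TOOLS[call["name"]](**call["arguments"])
```

Until that last line runs, the model has only expressed intent; the application decides whether, and how, to act on it.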
The Model Context Protocol, or MCP, exists to bridge that gap.
MCP packages tool definitions together with the logic required to execute them and return results in a predictable way. Instead of treating tools as bare schemas the model merely references, an MCP server bundles each tool definition with its implementation, so the application can run the full loop: model request, tool execution, and results fed back to the model or the user.
Crucially, this loop lives outside the model. The model never executes code, manages errors, or handles side effects. Those responsibilities remain with the application, where teams can enforce permissions, control latency, and define failure behavior.
Understanding MCP in this light removes much of the mystery. It doesn’t add intelligence to the system. It adds operational discipline.
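A sketch of that application-side loop, with a scripted `call_model` stub standing in for a real model client and a hypothetical `lookup_order` tool (both are illustrative, not MCP API surface):

```python
# The loop MCP standardizes lives here, outside the model: the model proposes
# a tool call, the application validates and executes it, and the result is
# sent back for a final answer.

ALLOWED_TOOLS = {"lookup_order"}  # permissions enforced by the application

def lookup_order(order_id: str) -> dict:
    return {"order_id": order_id, "status": "shipped"}  # stub implementation

def call_model(messages):
    # Stand-in: a real client would send `messages` to a model API.
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "lookup_order", "args": {"order_id": "A123"}}
    return {"text": "Your order A123 has shipped."}

def run(user_message: str) -> str:
    messages = [{"role": "user", "content": user_message}]
    reply = call_model(messages)
    while "tool" in reply:
        if reply["tool"] not in ALLOWED_TOOLS:
            raise PermissionError(reply["tool"])  # failure behavior is ours to define
        result = lookup_order(**reply["args"])
        messages.append({"role": "tool", "content": str(result)})
        reply = call_model(messages)
    return reply["text"]
```

The permission check and the error path are the point: they live in application code, where the team controls them.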

Statelessness is one of the most fundamental constraints of language models. Every call starts from a clean slate. Without intervention, there is no continuity, no private knowledge, and no awareness of what happened previously.
Retrieval-augmented generation, or RAG, is designed to address this constraint without pretending it doesn’t exist. Rather than trying to give the model memory, RAG focuses on selective recall.
At request time, a retrieval system identifies a small, relevant set of information—often from a vector database built on private or time-sensitive data—and injects it into the prompt. Instead of overwhelming the model with everything it might need, RAG narrows the scope to what matters right now.
This distinction is important. RAG does not make models inherently more accurate or truthful. It improves relevance. By constraining the context window to the most useful information, it increases the likelihood that the model’s confidence aligns with reality.
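The retrieval step can be sketched in a few lines. This toy version uses word overlap as a stand-in for embedding similarity; a real pipeline would use a vector store, but the shape of the operation is the same: score, select, inject.

```python
# Toy RAG retrieval: score documents against the query and inject only the
# top match into the prompt, rather than everything the model might need.

DOCS = [
    "Refunds are processed within 5 business days.",
    "Our office is closed on public holidays.",
    "Passwords must be at least 12 characters long.",
]

def overlap(query: str, doc: str) -> int:
    # Crude relevance score: shared lowercase words.
    return len(set(query.lower().split()) & set(doc.lower().split()))

def build_prompt(query: str, k: int = 1) -> str:
    top = sorted(DOCS, key=lambda d: overlap(query, d), reverse=True)[:k]
    context = "\n".join(top)
    return f"Context:\n{context}\n\nQuestion: {query}"

prompt = build_prompt("How long do refunds take?")
```

Note what this does and doesn't do: it narrows the context to the most relevant passage, but the model can still misread or contradict it.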

Few terms in AI are as overloaded as “agent.” In practice, agents are not a new category of model or a sign of emergent autonomy. They are orchestration patterns built on top of repeated AI calls.
An agent system typically involves multiple calls with different roles. One call interprets a request and produces a plan. Others invoke tools, retrieve information, or evaluate intermediate results. Between each step, application logic decides what happens next.
This architecture matters because it clarifies what agents are—and what they aren’t. Agents don’t think continuously or act independently. They coordinate a sequence of stateless calls, each operating under the same constraints described in Part 1.
Their power comes from sequencing and feedback, not from persistence or awareness.
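A minimal agent loop makes this concrete. The "model" calls here are scripted stubs and the step names are illustrative; what matters is the shape: each step is an independent, stateless call, and the application threads all state between them.

```python
# Minimal agent loop: the model never holds state, so the application
# carries everything forward between stateless calls.

def plan(task: str) -> list[str]:
    # Stub for a planning call that decomposes the request into steps.
    return ["retrieve", "summarize"]

def execute(step: str, state: dict) -> dict:
    # Each step is an independent call; its only context is what we pass in.
    if step == "retrieve":
        state["notes"] = "Q3 revenue grew 8%."
    elif step == "summarize":
        state["answer"] = f"Summary: {state['notes']}"
    return state

def run_agent(task: str) -> str:
    state: dict = {"task": task}
    for step in plan(task):  # application logic decides what happens next
        state = execute(step, state)
    return state["answer"]
```

Remove the `state` dict and the second step knows nothing about the first; that is the statelessness the orchestration exists to paper over.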

The same structure that makes agents powerful also makes them brittle. Each additional call introduces another opportunity for missing context, incorrect assumptions, or misaligned instructions. Small errors compound quickly.
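The compounding is easy to quantify under a simplifying assumption that steps fail independently: if each call behaves correctly 95% of the time, a ten-step chain behaves correctly only about 60% of the time.

```python
# Assumed independence model: per-step reliability compounds
# multiplicatively across a chain of calls.
per_step = 0.95
steps = 10
chain_reliability = per_step ** steps  # roughly 0.599
```

Real failures are rarely independent, but the direction of the effect holds: every added step erodes end-to-end reliability.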
This fragility isn’t accidental. It’s a direct consequence of building complex behavior from simple, stateless primitives. Reliable agents require careful orchestration, explicit boundaries, and deliberate handling of failure at every step.
Understanding this helps set realistic expectations. Agents can accomplish impressive things, but they demand engineering rigor rather than optimism.
The modern AI ecosystem can look complex at first: protocols, retrieval pipelines, orchestration layers, and agent frameworks layered on top of one another. But these components all exist for the same reason. They are responses to limited context, statelessness, ordering effects, and the need for control.
Once you see that, the ecosystem stops feeling arbitrary. Prompts, tools, MCP, RAG, and agents are not competing ideas. They are complementary strategies for working within the same constraints.
Progress in AI systems doesn’t come from ever more clever prompts. It comes from understanding the primitive at the center of everything—and designing responsibly around it.