In the simplest version, the scheme looks like this.
- The orchestrator agent gathers and structures the context.
- The second agent proposes a solution or a draft of the changes.
- The third checks the result: where the risks are, what was missed, and what needs to be checked manually.
- Throughout, the orchestrator monitors the order of steps and decides when the result is ready to be passed to a human.
In more complex cases, domain roles appear. For example, one agent looks only at security, another at operational risks, and a third at documentation.
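The basic scheme can be sketched as a minimal pipeline. The role functions below are hypothetical stand-ins: in a real system each would be a separate LLM call with its own prompt, but the division of responsibility is the same.

```python
# A minimal sketch of the orchestrator / executor / critic scheme.
# Each "agent" is reduced to a plain function; the stubs below only
# illustrate the flow of data between roles.

def orchestrate(task: str) -> dict:
    context = gather_context(task)       # orchestrator structures the context
    draft = propose_solution(context)    # executor proposes a draft
    review = critique(context, draft)    # critic checks the result
    ready = not review["blockers"]       # orchestrator decides readiness
    return {"draft": draft, "review": review, "ready_for_human": ready}

def gather_context(task: str) -> dict:
    return {"task": task, "files": [], "constraints": []}

def propose_solution(context: dict) -> str:
    return f"draft for: {context['task']}"

def critique(context: dict, draft: str) -> dict:
    return {"blockers": [], "warnings": [], "notes": ["check manually: edge cases"]}
```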
## Summary of the multi-agent approach

Multi-agent work did not appear yesterday, and by now the community has already developed practical approaches to building such systems.
The first is plan-and-execute. The idea is simple: before launching executors, someone must break the task down into steps and dependencies. Otherwise, execution almost immediately turns into chaotic jumping between logs, files, hypotheses, and fixes. This is where the orchestrator role appears.
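A plan-and-execute sketch, under the assumption that the orchestrator emits a plan as steps with dependencies. The step names are illustrative; the point is that execution order is derived from the plan rather than improvised.

```python
# Hypothetical plan: each step maps to the steps it depends on.
plan = {
    "reproduce bug": [],
    "locate cause":  ["reproduce bug"],
    "write fix":     ["locate cause"],
    "run tests":     ["write fix"],
}

def execution_order(plan: dict) -> list:
    """Return steps in an order that respects dependencies."""
    done, order = set(), []
    while len(done) < len(plan):
        ready = [s for s, deps in plan.items()
                 if s not in done and all(d in done for d in deps)]
        if not ready:
            raise ValueError("cyclic or unsatisfiable dependencies")
        for step in ready:
            order.append(step)
            done.add(step)
    return order
```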
The second important approach is ReAct. It defines the working pattern of an executor agent: first a hypothesis, then an action, then observation of the result. The agent looks at a file, runs a command, sees the output, and adjusts the next step.
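The ReAct pattern can be sketched as a loop. The `think` and `run_tool` functions here are hypothetical stubs so the sketch runs without a model; in practice `think` is an LLM call that returns the next hypothesis and action.

```python
# Schematic ReAct loop: hypothesis -> action -> observation, repeated
# until the agent decides it is done or a step cap is hit.

def react_loop(task: str, max_steps: int = 5) -> list:
    history = []
    for _ in range(max_steps):
        thought = think(task, history)             # hypothesis
        if thought["action"] == "finish":
            break
        observation = run_tool(thought["action"])  # act, then observe
        history.append((thought, observation))     # informs the next step
    return history

# Stubbed stand-ins (assumptions, not a real model or toolset):
def think(task, history):
    return {"action": "finish"} if history else {"action": "read_file"}

def run_tool(action):
    return f"output of {action}"
```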
The third is context management. Not every agent needs the entire context. Moreover, today's models cannot effectively hold large amounts of information in context. So the system works better when context is loaded in measured portions: only the required files, only the required instructions, only the relevant facts. Otherwise, the system starts drowning in its own noise before it has time to bring value.
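"Measured portions" can be made concrete as a hard token budget. The relevance scores below are assumed to come from elsewhere (e.g. retrieval); the sketch only shows the budgeting discipline.

```python
# Sketch: select context items by relevance under a hard token budget.
# candidates: {name: (relevance_score, token_cost)} -- hypothetical values.

def select_context(candidates: dict, budget_tokens: int) -> list:
    chosen, used = [], 0
    ranked = sorted(candidates, key=lambda n: -candidates[n][0])
    for name in ranked:
        cost = candidates[name][1]
        if used + cost <= budget_tokens:  # skip anything that overflows
            chosen.append(name)
            used += cost
    return chosen

files = {"auth.py": (0.9, 400), "README.md": (0.2, 1200), "db.py": (0.7, 500)}
```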
The fourth approach is a separate critic as an isolated role. The critic must be at the same level as the main role or stronger. Isolation is important here. If the critic receives all the internal chatter, earlier doubts, and reasoning of the author, it quickly starts checking not the result, but the logic it has already been contaminated by. That is why it is important for the critic to see the task, criteria, and result, not the executor's chain of reasoning.
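Isolation is easy to express in code: the critic's input is constructed explicitly, and the executor's reasoning trace simply never enters it. The field names are illustrative.

```python
# Sketch of critic isolation: the critic sees task, criteria, and result,
# never the executor's chain of reasoning.

def critic_input(session: dict) -> dict:
    return {
        "task": session["task"],
        "criteria": session["criteria"],
        "result": session["result"],
        # session["executor_reasoning"] is deliberately NOT passed through
    }

session = {
    "task": "add retry to the HTTP client",
    "criteria": ["no behavior change on the success path"],
    "result": "diff: ...",
    "executor_reasoning": "I first tried ... then doubted ...",
}
```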
The fifth approach is an explicit rubric for criticism. This is a concrete working scheme for the critic. Without it, the critic very quickly turns into a vague "I don't like this" role. When there are clear categories such as blocker, warning, and suggestion, the review starts working noticeably better. It brings clarity to the critic's output, which improves interpretation by other agents.
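The blocker / warning / suggestion categories from above can be enforced mechanically, so that downstream agents interpret the review without guessing. A minimal sketch:

```python
# Hypothetical explicit rubric: every finding must carry one of the
# fixed severities, and "ready" is decided by the absence of blockers.

SEVERITIES = ("blocker", "warning", "suggestion")

def add_finding(review: list, severity: str, message: str) -> None:
    if severity not in SEVERITIES:
        raise ValueError(f"unknown severity: {severity}")
    review.append({"severity": severity, "message": message})

def has_blockers(review: list) -> bool:
    return any(f["severity"] == "blocker" for f in review)
```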
The sixth layer is Reflexion and proper iteration completion. It is important for the system to take into account not only the context, but also previous iterations of its own work. Otherwise, it will simply go in circles and burn tokens. At the same time, the number of iterations must be limited. At some point, the system must be able to say that a human is needed next.
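A Reflexion-style loop with a hard cap might look like this. The `attempt` and `reflect` functions are hypothetical stubs; the essential parts are that each attempt sees the lessons from earlier iterations, and that after the cap the system escalates to a human instead of looping forever.

```python
# Sketch: bounded iteration with memory of previous attempts.

def solve_with_reflection(task: str, max_iters: int = 3) -> dict:
    lessons = []
    for i in range(max_iters):
        result = attempt(task, lessons)      # sees earlier failures
        if result["ok"]:
            return {"status": "done", "iterations": i + 1}
        lessons.append(reflect(result))      # record what went wrong
    return {"status": "needs_human", "lessons": lessons}

# Stubs (assumptions for the sketch): succeed only after two lessons.
def attempt(task, lessons):
    return {"ok": len(lessons) >= 2}

def reflect(result):
    return "avoid the previous approach"
```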
There are also additional amplifiers. RAG is useful where important knowledge lives outside the current dialogue: in documentation, knowledge packages, standards, and internal bases. Tree of Thoughts and Skeleton-of-Thought help when the orchestrator first needs to build a solution skeleton or branch several plan options. Multi-Agent Debate and Mixture of Agents are appropriate when one review is no longer enough and you need either to confront positions or run the result through several levels of refinement. Spec-Driven Development is useful in tasks where you first need to agree on a specification and only then move to implementation. But this is already fine-tuning for the needs and preferences of a specific project or person.
## How do you configure all this?

It is already clear that a multi-agent system cannot be built with one large prompt. The industry has already developed approaches to logically organizing the work of multiple agents.
The first thing to know about is AGENTS.md. It lives at the root of the repository and serves as a project map. It is convenient to keep core context there: what is in the repository, how to run the project, where the sensitive areas are, what commands exist, and what limitations and working boundaries apply. This file is the first thing an agent reads, and it acts as the baseline context store.
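As a rough illustration, such a file might look like this (the project details are invented for the example):

```markdown
# AGENTS.md

## What this repository is
Payment service API (Python, FastAPI).

## How to run
- `make dev` — local server
- `make test` — test suite

## Sensitive areas
- `billing/` — changes require human review

## Boundaries
- Never edit files under `migrations/`
```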
Separate prompt files for each agent are needed for the same reason roles are separated in the first place. The orchestrator, executor, and critic should not live in one long text. They have different tasks, different toolsets, and different ways of looking at the result. If all of this is thrown into one prompt, the executor will start absorbing the critic's logic, the critic will receive unnecessary noise, and the file itself will quickly become hard to maintain. In practice, a scheme where each role has its own file with its own contract works much more reliably.
A separate layer is skills, usually in the form of SKILL.md files. They are needed where knowledge is repeated and should be connected only when necessary: security, documentation, architectural patterns, or the specifics of a particular stack. Practice shows that if an agent prompt becomes too bloated, the agent starts losing track of instructions in the middle of it. That is why it is better to describe the rules for connecting skills in the main prompt than to dump everything into one pile.
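"Connected only when necessary" can be sketched as a trigger table. The file paths and trigger words below are hypothetical; in practice the matching is usually done by the model itself, but the shape of the rule is the same.

```python
# Sketch: attach a skill file to the prompt only when the task
# matches one of its triggers.

SKILLS = {
    "security":      {"file": "skills/security/SKILL.md",
                      "triggers": ["auth", "token", "crypto"]},
    "documentation": {"file": "skills/docs/SKILL.md",
                      "triggers": ["readme", "docs"]},
}

def skills_for(task: str) -> list:
    task_l = task.lower()
    return [s["file"] for s in SKILLS.values()
            if any(t in task_l for t in s["triggers"])]
```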
On top of that, there is usually a shared rules layer. For example, workspace instructions or copilot-instructions.md hold what should apply to all roles at once: general constraints, quality requirements, and baseline behavior. This is a different type of information. It is also useful to keep it separate so you do not duplicate the same things in every agent prompt.
Overview files such as llms.txt can also be useful, and sometimes separate .prompt.md files for repeatable scenarios. The first helps enter the project quickly without long reading. In effect, it repeats AGENTS.md, but in a less human-readable form. The second is useful when the same task is repeated many times and it is more convenient to formalize it as a reusable prompt block rather than rewrite it manually for every agent.
Next come the artifacts produced by the system's work. As soon as agents do more than one step, a question appears almost immediately: where should the context of the current task be stored, and how can we later understand what exactly the agents did within that task?
This is why a layer of session context almost always appears next to the permanent files. It can be a separate TASK_CONTEXT.md or another working file where the task statement, accepted constraints, important findings, previous failed attempts, and current status are accumulated. The meaning is very simple: the next pass over the task should not start from zero. If the executor has already hit a dead end, the critic has already found a weak spot, and the orchestrator has already narrowed the scope, this must be saved explicitly somewhere, not live only in the memory of the last dialogue.
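A working file of this kind might look roughly like the following (the task and findings are invented for the example):

```markdown
# TASK_CONTEXT.md

## Task
Fix the flaky test in the payments module.

## Constraints
- No changes to the public API.

## Findings
- The test depends on wall-clock time.

## Failed attempts
- Iteration 1: mocking the clock broke an unrelated fixture.

## Status
Executor blocked; critic comments pending.
```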
On top of this, tracing of agent steps usually appears as well. As a rule, it is needed for debugging: who started the task, which iteration it was, who executed what, when the critic returned comments, where escalation to a human happened, and at which step the process got stuck.
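A simple way to get this is one structured record per agent action, e.g. as JSON lines. The field names below are illustrative, not a standard.

```python
# Sketch: trace each agent step as a JSON line, answering "who did
# what, in which iteration, and when".

import json
import time

def trace(log: list, **event) -> None:
    log.append(json.dumps({"ts": time.time(), **event}))

log = []
trace(log, task="fix-login", iteration=1, role="orchestrator", action="plan")
trace(log, task="fix-login", iteration=1, role="executor", action="edit_file")
trace(log, task="fix-login", iteration=1, role="critic", action="return_comments")
```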
As a result, beyond the idea of multi-agent work itself, the modern industry already has a number of practical approaches for implementing it. And all of this is actively developing right now.
## What the final working scheme looks like

If we combine these approaches into one working loop, it looks roughly like this.