In Copilot in VS Code, the workflow usually looks like this.
First, the Plan agent or the engineer defines the plan of steps.
The task is launched in a local session, because that is where the agent has full access to the workspace, tools, and models.
Context is provided through active files, #file, #codebase, #terminalSelection, and project instructions.
The agent makes multi-file edits and, if needed, works with the terminal.
The engineer reviews the inline diffs, accepts or rejects changes with Keep/Undo, and uses checkpoints to roll back.
If a separate check is needed, a critical pass is performed through another agent, a separate session, or a reviewer-style custom agent.
If the agent’s behavior seems strange, Agent Logs or Chat Debug view are used instead of guessing.
How the principles from this article are implemented in the Copilot process
Earlier, we looked at established principles and approaches for multi-agent systems. Now let’s see how they can be implemented in Copilot.
Plan-and-Execute
You should not immediately ask Copilot: “fix everything.” For complex tasks, it is better to separate planning and execution.
First, ask the Plan agent to create a plan: which files need to be inspected, which hypotheses should be checked, which changes are possible, and how the result should later be validated. After that, execute the task step by step: either one step per message or one step per separate session.
For DevOps, this is especially useful when the task touches several places at once: pipeline YAML, Dockerfile, Helm chart, environment variables, and validation commands. The plan helps avoid mixing everything into one large edit.
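As a rough sketch of this split, the plan can be kept as a short list of steps, with each step turned into its own narrow request. The `PlanStep` type, the `step_prompt` helper, and the file names are hypothetical and only illustrate the shape of the process; none of them is part of Copilot.

```python
from dataclasses import dataclass


@dataclass
class PlanStep:
    """One step of a plan. Hypothetical structure, not a Copilot API."""
    title: str
    files: list[str]   # files to attach as context for this step
    validation: str    # how the result of this step will be checked


def step_prompt(step: PlanStep, index: int, total: int) -> str:
    """Render one step as a narrow request for a single message or session."""
    return "\n".join([
        f"Step {index} of {total}: {step.title}",
        "Files to attach with #file: " + ", ".join(step.files),
        f"Validation: {step.validation}",
        "Do only this step, then stop and report.",
    ])


# Hypothetical plan for a change that touches several places at once.
plan = [
    PlanStep("Inspect the failing pipeline stage", ["pipeline.yml"], "no edits, findings only"),
    PlanStep("Adjust the Dockerfile", ["Dockerfile"], "image builds locally"),
    PlanStep("Update the Helm values", ["values.yaml"], "chart renders without errors"),
]

for number, step in enumerate(plan, start=1):
    print(step_prompt(step, number, len(plan)))
    print()
```

Each rendered step names only its own files, which keeps the pipeline, Dockerfile, and Helm changes from merging into one large edit.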
Context engineering
The quality of Copilot’s answer strongly depends on the context you give it.
It is better to explicitly specify the required sources: a particular #file with pipeline configuration, the required Dockerfile, or a selected error fragment through #terminalSelection. #codebase is useful during search and initial project discovery, but it should not be used in every request without need.
The main rule is to give the agent exactly the context required for the current step.
A common mistake is to immediately pull in the whole repository, the entire chat history, and all project instructions. This does not necessarily make the answer smarter. Often it does the opposite: the agent starts connecting irrelevant parts of the project and makes less precise conclusions.
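One way to make this rule concrete is a small helper that assembles a request and refuses to attach too much. This is only a sketch: `build_request`, the limit of three files, and the file names are assumptions made for illustration, not Copilot behavior.

```python
def build_request(task: str, files: list[str], terminal_selection: bool = False,
                  codebase_search: bool = False, max_files: int = 3) -> str:
    """Assemble a request that names only the context needed for the current step."""
    if len(files) > max_files:
        # Assumed limit: many files usually means the step is too broad.
        raise ValueError(
            f"{len(files)} files requested; narrow the step to {max_files} or fewer"
        )
    lines = [task]
    if files:
        lines.append("Attach with #file: " + ", ".join(files))
    if terminal_selection:
        lines.append("Attach the selected error output with #terminalSelection")
    if codebase_search:
        lines.append("Use #codebase only to locate related files")
    return "\n".join(lines)


# Hypothetical request for one diagnostic step.
request = build_request(
    "Explain why the build stage fails.",
    files=["pipeline.yml", "Dockerfile"],
    terminal_selection=True,
)
print(request)
```

Repository-wide search stays off unless the step is explicitly about discovery.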
ReAct: inspect, hypothesize, verify
In Copilot, this approach is convenient to use as a short engineering cycle:
inspect files and terminal output -> formulate a hypothesis -> make a narrow action -> verify the result -> suggest the next step
For example, if the pipeline failed after a Dockerfile change, you should not immediately ask Copilot to make sweeping changes to the configuration. It is better to first ask it to inspect the error, name the one or two most likely causes, then check one hypothesis, and only after that propose a change.
This reduces the risk that the agent will jump from a symptom to a large, poorly justified edit.
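The cycle can be pictured as checking hypotheses one at a time and proposing a change only once one of them is confirmed. The sketch below is a simplified model: `first_confirmed` and the sample observations are hypothetical, and in practice the check is a question to Copilot or a command you run yourself.

```python
from typing import Callable, Optional


def first_confirmed(hypotheses: list[str], check: Callable[[str], bool],
                    max_checks: int = 2) -> Optional[str]:
    """Check hypotheses one at a time, most likely first.

    Returns the first confirmed hypothesis, or None when none of the
    checked ones holds and it is time to inspect again instead of editing.
    """
    for hypothesis in hypotheses[:max_checks]:
        if check(hypothesis):
            return hypothesis
    return None


# Hypothetical observations gathered while inspecting the failed build.
observations = {
    "base image tag no longer exists": False,
    "dependency restore cannot reach the package source": True,
}

cause = first_confirmed(list(observations), lambda h: observations[h])
print(cause or "No hypothesis confirmed; inspect again before changing anything.")
```

The `max_checks` default of two mirrors the advice to name only the one or two most likely causes.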
Critic isolation
Result validation is better moved into a separate step.
The idea is simple: the one who made the changes should not review themselves in the same pass. Otherwise, Copilot may continue defending its own logic instead of honestly looking for problems.
In practice, this can be done as follows:
- first, Copilot in agent mode performs the task and proposes changes;
- then, a separate review session or separate reviewer agent checks the result.
The reviewer should preferably receive not the entire original history, but specific materials for review:
- diff;
- list of changed files;
- acceptance criteria;
- validation commands;
- expected result.
Then the reviewer evaluates exactly the result of the work: what changed, whether it matches the task, whether there are errors, risks, or missing checks.
Simply put: first a separate execution pass, then a separate pass for criticism of the result. This makes the review more independent and useful.
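The review materials listed above can be gathered into one packet that is pasted into a fresh review session. This is a minimal sketch: `ReviewPacket`, its field names, and the sample diff and command are hypothetical and exist only to show what the reviewer receives and what it does not.

```python
from dataclasses import dataclass


@dataclass
class ReviewPacket:
    """Materials handed to the reviewer. Hypothetical structure."""
    diff: str
    changed_files: list[str]
    acceptance_criteria: list[str]
    validation_commands: list[str]
    expected_result: str

    def render(self) -> str:
        """Format the packet as plain text for a fresh review session."""
        def bullets(items: list[str]) -> str:
            return "\n".join(f"- {item}" for item in items)

        return "\n\n".join([
            "Review the following change. You did not write it.",
            "Changed files:\n" + bullets(self.changed_files),
            "Acceptance criteria:\n" + bullets(self.acceptance_criteria),
            "Validation commands:\n" + bullets(self.validation_commands),
            "Expected result:\n" + self.expected_result,
            "Diff:\n" + self.diff,
        ])


# Hypothetical change under review.
packet = ReviewPacket(
    diff="-FROM example/base:1.0\n+FROM example/base:2.0",
    changed_files=["Dockerfile"],
    acceptance_criteria=["The image builds", "No new packages are added"],
    validation_commands=["docker build ."],
    expected_result="The build finishes without errors.",
)
print(packet.render())
```

The packet deliberately has no field for the original chat history, so the reviewer judges the result rather than the reasoning that produced it.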
Rubric-based review
Review should not be phrased as “check whether everything is fine,” but should follow predefined criteria.
For DevOps changes, these criteria can be simple:
- Blocker — what absolutely cannot move forward;
- Warning — what does not block, but requires attention;
- Missing validation — which checks are missing;
- Rollback risk — whether there is a risk of rollback problems;
- Security impact — whether there is any security impact.
This rubric can be saved in copilot-instructions.md, a separate review prompt, or reviewer-agent settings.
Then Copilot will check changes not as free text, but according to a clear structure. The output will not be “looks fine,” but a concrete list: what blocks the merge, what should be fixed, which checks need to be added, and where risks exist.
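To show what a structured result looks like compared with free text, the sketch below groups reviewer findings by the rubric categories and derives a merge decision from them. The helper names and the sample findings are hypothetical; Copilot does not produce this structure by itself.

```python
RUBRIC = ["Blocker", "Warning", "Missing validation", "Rollback risk", "Security impact"]


def group_findings(findings: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Sort (category, note) pairs into the rubric; reject unknown categories."""
    grouped: dict[str, list[str]] = {category: [] for category in RUBRIC}
    for category, note in findings:
        if category not in grouped:
            raise ValueError(f"Unknown rubric category: {category}")
        grouped[category].append(note)
    return grouped


def blocks_merge(grouped: dict[str, list[str]]) -> bool:
    """A change cannot move forward while any Blocker is open."""
    return bool(grouped["Blocker"])


# Hypothetical findings returned by a reviewer pass.
findings = [
    ("Warning", "Base image tag is not pinned to a digest"),
    ("Missing validation", "No check that the chart still renders"),
]
report = group_findings(findings)
for category in RUBRIC:
    print(f"{category}: {report[category] or 'none'}")
print("Blocks merge:", blocks_merge(report))
```

Empty categories are printed as well, so a missing answer is visible instead of silently skipped.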
Reflexion
If an attempt did not work, you should not simply repeat the same request again. Copilot may go down the same wrong path.
It is better to briefly record before the next attempt:
- what has already been tried;
- which command was run;
- which error remains;
- which hypotheses were not confirmed;
- what should not be repeated.
For example:
We tried changing the Dockerfile and replacing the base image.
After that, the build still fails on dependency restore.
The registry access hypothesis was not confirmed.
Do not repeat the registry permissions check; move on to analyzing NuGet/source mapping.
Copilot does not have a separate “Reflexion” button, but this approach is easy to use manually: at the end of a failed attempt, ask it to produce a short summary, and then pass that summary into the next session or next step.
It is important to understand: checkpoints help roll back files, but they do not explain why the attempt did not work. So, in addition to a checkpoint, it is useful to save a short textual conclusion: what was done, what happened, and what follows from that.
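The textual conclusion can follow a fixed shape so that nothing is forgotten between attempts. The sketch below is one possible layout: the `AttemptSummary` type and its field names are hypothetical, the values come from the Dockerfile example above, and the `docker build .` command is an assumed placeholder.

```python
from dataclasses import dataclass


@dataclass
class AttemptSummary:
    """Short record of a failed attempt. Hypothetical structure."""
    tried: list[str]
    command: str
    remaining_error: str
    rejected_hypotheses: list[str]
    do_not_repeat: list[str]
    next_focus: str

    def to_prompt(self) -> str:
        """Render the summary as the opening text of the next session or step."""
        return "\n".join([
            "Already tried: " + "; ".join(self.tried),
            f"Command run: {self.command}",
            f"Remaining error: {self.remaining_error}",
            "Not confirmed: " + "; ".join(self.rejected_hypotheses),
            "Do not repeat: " + "; ".join(self.do_not_repeat),
            f"Next: {self.next_focus}",
        ])


summary = AttemptSummary(
    tried=["changed the Dockerfile", "replaced the base image"],
    command="docker build .",  # assumed placeholder command
    remaining_error="build still fails on dependency restore",
    rejected_hypotheses=["registry access"],
    do_not_repeat=["registry permissions check"],
    next_focus="analyze NuGet/source mapping",
)
print(summary.to_prompt())
```

Such a summary pairs well with a checkpoint: the checkpoint restores the files, and the summary carries the reasoning forward.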
Session context
Long work is better split into sessions by phase.
For example:
- session 1 — discovery and plan;
- session 2 — execution;
- session 3 — review;
- session 4 — documentation or commit message.
At the beginning of each session, it is useful to explicitly state the goal, constraints, and completion criteria.
If the chat becomes too long and noisy, it is better to start a new session and pass a short handoff into it than to keep dragging the entire history forward.
Step logging / trace
For serious DevOps tasks, it is important to understand exactly what was done.
Copilot provides the technical part of observability through Agent Logs, Summary, Agent Flow Chart, and Chat Debug view. There you can inspect tool calls, context, LLM requests, and errors.
But that is not always enough. For an engineering trace, it is also useful to maintain a short log:
- what was checked;
- which files were changed;
- which commands were run;
- what result was obtained;
- what conclusions were made.
Such a log helps return to the task later or hand it off to another engineer without losing context.
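Such a log does not need special tooling. As a minimal sketch, each step can be written as one JSON line with the five fields listed above; the `log_step` helper, the field names, and the sample values are assumptions made for illustration.

```python
import json
from datetime import datetime, timezone


def log_step(checked: str, files_changed: list[str], commands: list[str],
             result: str, conclusion: str) -> str:
    """Return one JSON line describing a completed step."""
    entry = {
        "time": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "checked": checked,
        "files_changed": files_changed,
        "commands": commands,
        "result": result,
        "conclusion": conclusion,
    }
    return json.dumps(entry)


# Hypothetical entry for one step of a build investigation.
line = log_step(
    checked="base image tag in the Dockerfile",
    files_changed=["Dockerfile"],
    commands=["docker build ."],
    result="build still fails on dependency restore",
    conclusion="base image is not the cause",
)
print(line)
```

Appending these lines to a file in the repository or the task tracker is enough for another engineer to pick up the task.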
Stop and escalate
You need to define in advance when the agent must stop.
For example:
- no more than two or three attempts without progress;
- stop when context is insufficient;
- stop before changing secrets or permissions;
- stop before production-critical changes;
- stop when there is a risk of deleting or overwriting data.
In Copilot, this is supported both process-wise and technically. Permission levels limit the agent’s autonomy, and the engineer can stop execution at any moment and switch to manual mode.
This is especially important for DevOps, where an error can affect infrastructure, access, or production.
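The stop conditions are easier to enforce when they are written down as explicit checks rather than kept in mind. The sketch below encodes the list above; the action labels, the `stop_reason` helper, and the default of three attempts are hypothetical and would be defined by the team's own process, not by Copilot.

```python
from typing import Optional

# Hypothetical action labels; a real process would define its own.
ALWAYS_ESCALATE = {
    "change_secrets",
    "change_permissions",
    "production_change",
    "delete_or_overwrite_data",
}


def stop_reason(attempts_without_progress: int, context_sufficient: bool,
                next_action: str, max_attempts: int = 3) -> Optional[str]:
    """Return why the agent must stop, or None if it may continue."""
    if attempts_without_progress >= max_attempts:
        return "no progress after repeated attempts"
    if not context_sufficient:
        return "context is insufficient"
    if next_action in ALWAYS_ESCALATE:
        return f"'{next_action}' requires an engineer's decision"
    return None


print(stop_reason(1, True, "edit_dockerfile"))
print(stop_reason(3, True, "edit_dockerfile"))
print(stop_reason(0, True, "change_secrets"))
```

A returned reason is the signal to hand control back to the engineer.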
Least privilege tools
You should not give every role maximum access.
The planner and reviewer can usually work in read-only mode. They only need to read files, analyze diffs, and provide conclusions.
The executor can be given permission to edit files and limited use of the terminal.
External systems should be connected only through the MCP servers that are truly required for the specific task.
This is not only a security issue. The fewer unnecessary tools an agent has, the more predictable its behavior becomes. If a role is given too many capabilities, it will more often take unnecessary steps and complicate the process.
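The role split can be summarized as a deny-by-default table. The sketch below is illustrative only: the role names follow the text, while the tool labels and the `is_allowed` helper are hypothetical and do not correspond to Copilot's own tool identifiers.

```python
# Hypothetical tool labels per role; unknown roles get no tools at all.
ROLE_TOOLS = {
    "planner": frozenset({"read_files"}),
    "reviewer": frozenset({"read_files", "read_diff"}),
    "executor": frozenset({"read_files", "edit_files", "run_terminal"}),
}


def is_allowed(role: str, tool: str) -> bool:
    """Deny by default: unknown roles and unlisted tools are not allowed."""
    return tool in ROLE_TOOLS.get(role, frozenset())


for role in ROLE_TOOLS:
    print(role, "may edit files:", is_allowed(role, "edit_files"))
```

Keeping the table explicit also makes it obvious when a role has quietly accumulated tools it does not need.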
If we put all of this together, Copilot in VS Code allows us to apply multi-agent principles in a very practical form: plan separately from execution, manage context, verify the result through a separate reviewer pass, record steps, limit access, and stop automation in time. This is what turns Copilot from a “smart chat in the IDE” into a working tool for a controlled engineering process.