VS Copilot and the Multi-Agent Approach: When AI Moves into Daily Engineering Work

05 March 2026

In the previous article, we looked at Microsoft Agent Framework as a way to technically assemble an agent system: with separate agents, workflows, tools, state, middleware, hosting, and observability. This approach works well when we are building our own solution or a closed production system around a clearly defined process.

But in day-to-day DevOps work, the task often looks different. You do not need to design an agent platform from scratch. You need to quickly understand a specific repository: open the pipeline configuration, inspect the web application Dockerfile, check the Dockerfile for the background process, understand the project structure, compare it with the instructions, and, if necessary, run commands in the terminal.

Introduction

If Microsoft Agent Framework answers the question “how do we technically build an agent system?”, then Copilot in VS Code answers a different question: how can we apply similar principles in daily engineering work without building the whole platform ourselves? According to the official documentation, Copilot in VS Code is no longer just code autocomplete and not merely a separate chat window. It is a set of working modes and capabilities: inline suggestions, inline chat, chat sessions, agent mode, review edits, checkpoints, custom instructions, custom agents, agent skills, MCP servers, hooks, Agent Logs, and Chat Debug. All of this supports a very important practical scenario: working with a live repository, files, terminal, and engineering context right where DevOps engineers and developers already spend most of their time.

Where Copilot is stronger, and where Agent Framework is more appropriate

Copilot in VS Code and Microsoft Agent Framework solve related, but not identical, problems. It is better to compare them not as competitors, but as tools with different centers of gravity. GitHub Copilot in VS Code is stronger where the work happens directly inside the engineering environment: with code, configs, terminal, and repository structure. Its main advantage is not that it is “yet another chat with a model,” but that it is embedded exactly where the daily work of a developer or DevOps engineer already happens. Copilot is convenient when you need to:

understand the structure of a project;
open and compare several files;
analyze pipeline YAML;
check a Dockerfile or Helm chart;
make targeted configuration changes;
run a command in the terminal;
inspect the validation result and immediately adjust the code.

In other words, Copilot fits the engineering cycle well: analysis → edit → verification → refinement → another pass. It reduces the friction between understanding a problem and actually changing the project. Microsoft Agent Framework is more appropriate in other scenarios. It is needed not so much for work inside the IDE, but for programmatically building your own agent system. Its strength becomes visible where the agent process must exist as a separate application, service, or long-running workflow. Agent Framework makes more sense when you need:

background multi-step processes;
workflow scenarios with checkpointing;
coordination of multiple agents or functions;
integration with external services;
persistent state between runs;
orchestration where the execution graph matters;
production control, telemetry, hosting, and execution policies.

To simplify, the difference is this: Copilot is more convenient as a working environment for the daily engineering cycle, while Agent Framework is a framework for autonomous and multi-step agent systems outside the IDE. For example, if an engineer is investigating a failed pipeline, opening YAML, inspecting a Dockerfile, comparing terminal output, editing a config, and immediately running validation again, Copilot in VS Code is the more natural choice. It is already close to the files, terminal, and current state of the project. But if the process needs to live longer than one working session, pass through a fixed set of steps, coordinate several roles, save checkpoints, and call external services as part of a separate application, then Microsoft Agent Framework becomes more appropriate.

Main Copilot features in VS Code

Let’s look at the main Copilot features that can be useful in daily DevOps work.

Regular chat versus agent in an engineering environment

1. Agent mode and sessions

The main capability of Copilot in VS Code is agent mode. In this mode, Copilot stops being just a helper for individual lines of code and becomes a participant in the working session. In Chat view, you can choose how exactly the agent will work:

where to run the task: locally, in the background, or in the cloud;
which role to use: for example, Ask, Plan, or Agent;
how autonomously the agent can use tools;
which model to use, if several options are available.

This matters for long engineering tasks. For example, instead of simply asking “what is wrong with the pipeline?”, you can give Copilot a task: open the required files, understand the configuration, propose changes, and verify the result through the terminal. The Plan agent is especially useful. It is convenient to use before starting a complex task: first ask Copilot to create a plan, review it yourself, and only then move on to changes. This reduces the risk that the agent will immediately start modifying the project in the wrong direction.

2. Several working modes, not just one chat

Copilot in VS Code is not only Chat view. It has several working surfaces, and each fits a different type of task. Chat view is convenient for long tasks: understanding a project, analyzing several files, discussing a plan, and performing a series of steps. Inline chat is useful for local edits directly in the open file, for example rewriting a function, simplifying a YAML block, or fixing a specific fragment of a Dockerfile. Quick chat is useful for short questions when you do not want to switch context. Inline suggestions and next edit suggestions help while writing code: they suggest line completions, block completions, or the next edit. Smart actions cover common operations: fix an error, explain code, search semantically, generate a commit message. The practical point is simple: not every task needs to be delegated to a large autonomous agent. Sometimes a local suggestion in the editor is enough, sometimes you need chat, and sometimes you need full agent mode.

3. Proper work with context

One of the strong sides of Copilot in VS Code is how it works with project context. Copilot can consider not only the text of the question, but also what is currently open in the IDE: the active file, selected fragment, file name, workspace structure, and other elements. Context can be provided explicitly. For example, with #file, #codebase, #terminalSelection, and other references, you can tell Copilot exactly which data to take into account. With @terminal and other mentions, you can address specialized participants or context sources. This matters because, in DevOps tasks, the quality of the answer almost always depends on the right context. Asking abstractly “why is the pipeline failing?” is one thing. Giving Copilot a specific YAML file, Dockerfile, terminal output, and project structure is completely different. The main idea here is that context can be managed. You do not have to dump everything into one large prompt. You can feed it in measured portions: show the agent only the required files, the required terminal output, and the relevant part of the repository.

4. Edits, review, and checkpoints

When working with code, it is not enough that Copilot can propose changes. What matters is that these changes can be controlled. VS Code provides a convenient review loop:

you can see which files were changed;
you can inspect inline diffs;
you can accept or reject individual edits;
you can roll back to a checkpoint;
you can use staging in Source Control as an additional confirmation step.

This is especially valuable in DevOps. Changes in a pipeline, Dockerfile, Helm chart, or Terraform configuration cannot be accepted blindly. Copilot can propose a solution, but the engineer must see exactly what changed. In practice, this creates a convenient collaboration mode: the agent produces a draft of changes, the engineer reviews the diff, accepts what is useful, manually adjusts what is questionable, and only then runs validation.

5. Terminal and result validation

For DevOps, the terminal is not an optional feature. It is an essential part of the workflow. Copilot in VS Code can work next to the terminal: suggest commands, explain them, read output, and adjust the next steps based on the result. This fundamentally changes the quality of the work. The agent does not merely reason about what might be wrong. It can help go through a normal engineering iteration:

inspect configuration → propose a change → run a command → read the error → refine the solution → repeat validation

This turns AI assistance into a real part of the diagnostic process: there is a change, a command, output, and the next step.
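One pass of this iteration can be sketched in Python. This is only an illustration of the loop, not a description of how Copilot drives the terminal internally; run_validation, next_step, and the two sample commands are hypothetical names introduced here.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class StepResult:
    command: list[str]
    exit_code: int
    output: str  # combined stdout and stderr, as a terminal would show it


def run_validation(command: list[str], timeout_s: int = 120) -> StepResult:
    """Run one validation command and capture what the engineer would see."""
    proc = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        timeout=timeout_s,
        check=False,  # a failing command is input for the next step, not an exception
    )
    return StepResult(command, proc.returncode, proc.stdout)


def next_step(result: StepResult) -> str:
    """Decide what the loop does next based on the exit code."""
    return "done" if result.exit_code == 0 else "refine"


# Hypothetical stand-ins for a real validation command (linter, build, dry run).
ok = run_validation([sys.executable, "-c", "print('config valid')"])
bad = run_validation([sys.executable, "-c", "import sys; sys.exit('missing key: image')"])

print(next_step(ok), "|", ok.output.strip())
print(next_step(bad), "|", bad.output.strip())
```

The point of keeping the exit code and the raw output together is that the next refinement is based on what the command actually reported, not on a retelling of it.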

6. Project-specific customization

Copilot can be customized for a specific project. But it is important not to put all knowledge into one huge instruction file. If one file starts containing everything at once (code style, architecture, build commands, security rules, agent roles, workflow, and domain instructions), it quickly becomes noise. The agent starts pulling unnecessary context into every request, and maintaining such a structure becomes difficult. It is better to separate settings by layers:

copilot-instructions.md should be used for the general project frame: style, constraints, baseline rules, and definition of done;
AGENTS.md is convenient as a repository map: what is located where, which areas are responsible for what, and which commands are used for build and test;
*.instructions.md files are suitable for local rules. For example, one part of the repository may have its own Terraform conventions, another may have backend code conventions, and a third may have deployment YAML conventions;
*.prompt.md and *.agent.md are better suited for repeatable tasks and separate roles: planner, executor, reviewer, critic;
SKILL.md is useful for narrow playbooks that are not always needed, but only in specific situations;
.vscode/mcp.json and hooks are no longer about textual instructions, but about integrations and strict execution rules.

The meaning of this separation is simple: each file should solve its own task. General rules should not be mixed with local conventions, roles with playbooks, or integrations with textual instructions.
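To check how a repository is actually layered, a small inventory script can help. The sketch below is an assumption-laden illustration: it searches only by the file names mentioned above and deliberately does not assume which directories they live in. The purposes in LAYERS are paraphrased from the text, and the two demo files are hypothetical.

```python
import tempfile
from pathlib import Path

# File-name patterns come from the layers described above; where the files
# live inside the repository is deliberately not assumed.
LAYERS = {
    "copilot-instructions.md": "general project frame",
    "AGENTS.md": "repository map",
    "*.instructions.md": "local rules",
    "*.prompt.md": "repeatable tasks",
    "*.agent.md": "separate roles",
    "SKILL.md": "narrow playbooks",
    "mcp.json": "integrations",
}


def inventory(root: Path) -> dict[str, list[str]]:
    """Return, per layer pattern, the matching files relative to root."""
    report = {}
    for pattern in LAYERS:
        report[pattern] = sorted(
            p.relative_to(root).as_posix()
            for p in root.rglob(pattern)
            if p.is_file()
        )
    return report


def missing_layers(report: dict[str, list[str]]) -> list[str]:
    """List the layer patterns for which no file was found."""
    return [pattern for pattern, files in report.items() if not files]


# Demo on a throwaway directory with two hypothetical files.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / ".github").mkdir()
    (root / ".github" / "copilot-instructions.md").write_text("# Project rules\n")
    (root / "infra").mkdir()
    (root / "infra" / "terraform.instructions.md").write_text("# Terraform rules\n")
    report = inventory(root)

for pattern, files in report.items():
    print(f"{pattern:26} {LAYERS[pattern]:22} {files}")
```

A missing layer is not an error in itself. The output is only a quick way to see whether everything has ended up in one file.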

7. Observability and debugging of the AI process

Another important capability of Copilot in VS Code is debugging the agent work itself. For DevOps, it is important not only to get the result, but also to understand how the agent arrived at it: which files it inspected, which tools it called, what context it used, where it made a mistake, and why it chose a particular path. For this, VS Code provides observability tools:

Agent Logs show the event timeline, tool calls, and LLM requests.
Summary view helps see a summary of token usage, duration, and errors.
Chat Debug view lets you inspect the system prompt, user prompt, context, and tool payloads.
The /troubleshoot command helps analyze the behavior of the current session directly through Copilot.

This is where Copilot becomes especially useful for engineering work. Agent assistance must be not only “smart,” but also verifiable. When you can see what the agent actually did and on what basis it proposed a solution, it becomes easier for an engineer to trust the process and use it safely in a real project.

How to use all of this

In Copilot in VS Code, the workflow usually looks like this. First, the Plan agent or the engineer defines the plan of steps. The task is launched in a local session, because that is where the agent has full access to the workspace, tools, and models. Context is provided through active files, #file, #codebase, #terminalSelection, and project instructions. The Agent makes multi-file edits and, if needed, works next to the terminal. The engineer reviews inline diffs, keeps or undoes individual edits with Keep/Undo, and rolls back to a checkpoint when necessary. If a separate check is needed, a critical pass is performed through another agent, a separate session, or a reviewer-style custom agent. If the agent’s behavior seems strange, Agent Logs or Chat Debug view are used instead of guessing.

How the principles from article 16 are implemented in the Copilot process

In article 16, we looked at established principles and approaches for multi-agent systems. Now let’s see how they can be implemented in Copilot.

Plan-and-Execute

You should not immediately ask Copilot: “fix everything.” For complex tasks, it is better to separate planning and execution. First, ask the Plan agent to create a plan: which files need to be inspected, which hypotheses should be checked, which changes are possible, and how the result should later be validated. After that, execute the task step by step: either one step per message or one step per separate session. For DevOps, this is especially useful when the task touches several places at once: pipeline YAML, Dockerfile, Helm chart, environment variables, and validation commands. The plan helps avoid mixing everything into one large edit.

Context engineering

The quality of Copilot’s answer strongly depends on the context you give it. It is better to explicitly specify the required sources: a particular #file with pipeline configuration, the required Dockerfile, or a selected error fragment through #terminalSelection. #codebase is useful during search and initial project discovery, but it should not be used in every request without need. The main rule is to give the agent exactly the context required for the current step. A common mistake is to immediately pull in the whole repository, the entire chat history, and all project instructions. This does not necessarily make the answer smarter. Often it does the opposite: the agent starts connecting irrelevant parts of the project and makes less precise conclusions.

ReAct: inspect, hypothesize, verify

In Copilot, this approach is convenient to use as a short engineering cycle:

inspect files and terminal output → formulate a hypothesis → take a narrow action → verify the result → suggest the next step
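The cycle can be sketched as a small loop over ranked hypotheses. This is a simplified illustration of the idea, not a ReAct implementation from any library; Hypothesis, investigate, and the sample build output are hypothetical and exist only for this example.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Hypothesis:
    description: str
    check: Callable[[], bool]  # a narrow, read-only check; True means confirmed


def investigate(hypotheses: list[Hypothesis], max_checks: int = 2) -> Optional[Hypothesis]:
    """Check the most likely hypotheses one at a time and stop at the first confirmed one."""
    for hypothesis in hypotheses[:max_checks]:
        if hypothesis.check():
            return hypothesis
    return None  # nothing confirmed: gather more context instead of making a large edit


# Made-up terminal output, used only to drive the example.
error_output = "step 4/7: COPY requirements.txt ./ failed: file not found in build context"

hypotheses = [
    Hypothesis("file is missing from the build context",
               lambda: "not found in build context" in error_output),
    Hypothesis("base image tag no longer exists",
               lambda: "manifest unknown" in error_output),
]

confirmed = investigate(hypotheses)
print(confirmed.description if confirmed else "no hypothesis confirmed")
```

Limiting the loop to one or two checks keeps each step small enough to verify before any change is proposed.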

For example, if the pipeline failed after a Dockerfile change, you should not immediately ask Copilot to rewrite the configuration massively. It is better to first ask it to inspect the error, name one or two most likely causes, then check one hypothesis, and only after that propose a change. This reduces the risk that the agent will jump from a symptom to a large, poorly justified edit.

Critic isolation

Result validation is better moved into a separate step. The idea is simple: the one who made the changes should not review themselves in the same pass. Otherwise, Copilot may continue defending its own logic instead of honestly looking for problems. In practice, this can be done as follows:

first, Copilot in agent mode performs the task and proposes changes;
then, a separate review session or separate reviewer agent checks the result.

The reviewer should preferably receive not the entire original history, but specific materials for review:

diff;
list of changed files;
acceptance criteria;
validation commands;
expected result.
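The review materials listed above can be collected into one small structure before they are handed to the reviewer. This is a minimal sketch under the assumption that the reviewer receives a single text prompt; ReviewPacket, its field names, and the sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReviewPacket:
    """Only the materials the reviewer needs; the executor's chat history is left out."""
    diff: str
    changed_files: tuple[str, ...]
    acceptance_criteria: tuple[str, ...]
    validation_commands: tuple[str, ...]
    expected_result: str

    def to_prompt(self) -> str:
        def bullets(items: tuple[str, ...]) -> str:
            return "\n".join(f"- {item}" for item in items)

        return "\n\n".join([
            "Review the change below. You did not write it.",
            "Changed files:\n" + bullets(self.changed_files),
            "Acceptance criteria:\n" + bullets(self.acceptance_criteria),
            "Validation commands:\n" + bullets(self.validation_commands),
            "Expected result:\n" + self.expected_result,
            "Diff:\n" + self.diff,
        ])


# Hypothetical sample data for illustration.
packet = ReviewPacket(
    diff="-FROM python:3.11\n+FROM python:3.12-slim",
    changed_files=("Dockerfile",),
    acceptance_criteria=("image builds", "container starts"),
    validation_commands=("docker build .",),
    expected_result="Build finishes without errors.",
)
prompt = packet.to_prompt()
print(prompt)
```

Making the packet immutable is a design choice: once the executor pass is done, the reviewer should see a fixed snapshot, not something that changes during review.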

Then the reviewer evaluates exactly the result of the work: what changed, whether it matches the task, whether there are errors, risks, or missing checks. Simply put: first a separate execution pass, then a separate pass for criticism of the result. This makes the review more independent and useful.

Rubric-based review

Review should not be phrased as “check whether everything is fine,” but should follow predefined criteria. For DevOps changes, these criteria can be simple:

Blocker — what absolutely cannot move forward;
Warning — what does not block, but requires attention;
Missing validation — which checks are missing;
Rollback risk — whether there is a risk of rollback problems;
Security impact — whether there is any security impact.
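To make such a rubric machine-checkable, the categories can be modeled explicitly. The sketch below is illustrative: the category names come from the list above, while Finding, blocks_merge, render, and the sample findings are hypothetical. It also assumes that only a Blocker stops the merge, which is a policy choice rather than a rule stated by Copilot.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    BLOCKER = "Blocker"
    WARNING = "Warning"
    MISSING_VALIDATION = "Missing validation"
    ROLLBACK_RISK = "Rollback risk"
    SECURITY_IMPACT = "Security impact"


@dataclass
class Finding:
    category: Category
    note: str


def blocks_merge(findings: list[Finding]) -> bool:
    """Assumed policy: only a Blocker stops the change from moving forward."""
    return any(f.category is Category.BLOCKER for f in findings)


def render(findings: list[Finding]) -> str:
    """Produce one line per rubric category, so empty categories are visible too."""
    lines = []
    for category in Category:
        notes = [f.note for f in findings if f.category is category]
        lines.append(f"{category.value}: " + ("; ".join(notes) if notes else "none"))
    return "\n".join(lines)


# Hypothetical findings for illustration.
findings = [
    Finding(Category.BLOCKER, "deploy step references an undefined variable"),
    Finding(Category.MISSING_VALIDATION, "no dry run for the staging environment"),
]

print(render(findings))
print("merge blocked:", blocks_merge(findings))
```

Printing every category, including the empty ones, shows that the reviewer actually considered it rather than skipping it.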

This rubric can be saved in copilot-instructions.md, a separate review prompt, or reviewer-agent settings. Then Copilot will check changes not as free text, but according to a clear structure. The output will not be “looks fine,” but a concrete list: what blocks the merge, what should be fixed, which checks need to be added, and where risks exist.

Reflexion

If an attempt did not work, you should not simply repeat the same request again. Copilot may go down the same wrong path. It is better to briefly record before the next attempt:

what has already been tried;
which command was run;
which error remains;
which hypotheses were not confirmed;
what should not be repeated.
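These five points fit naturally into a short handoff note. The sketch below shows one possible shape; AttemptSummary and its fields are hypothetical, and the docker build command is an assumed example rather than something taken from a real session.

```python
from dataclasses import dataclass, field


@dataclass
class AttemptSummary:
    tried: list[str]
    command: str
    remaining_error: str
    rejected_hypotheses: list[str] = field(default_factory=list)
    do_not_repeat: list[str] = field(default_factory=list)

    def to_handoff(self) -> str:
        """Render a short note to paste at the start of the next attempt."""
        return "\n".join([
            "Already tried: " + "; ".join(self.tried),
            "Command run: " + self.command,
            "Remaining error: " + self.remaining_error,
            "Hypotheses not confirmed: " + ("; ".join(self.rejected_hypotheses) or "none"),
            "Do not repeat: " + ("; ".join(self.do_not_repeat) or "nothing"),
        ])


summary = AttemptSummary(
    tried=["changed the Dockerfile", "replaced the base image"],
    command="docker build .",  # assumed command, for illustration only
    remaining_error="build still fails on dependency restore",
    rejected_hypotheses=["registry access"],
    do_not_repeat=["registry permissions check"],
)
handoff = summary.to_handoff()
print(handoff)
```

The note is deliberately plain text, so it can be pasted into a new session or saved next to a checkpoint.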

For example: “We tried changing the Dockerfile and replacing the base image. After that, the build still fails on dependency restore. The registry access hypothesis was not confirmed. Do not repeat the registry permissions check; move on to analyzing NuGet/source mapping.”

Copilot does not have a separate “Reflexion” button, but this approach is easy to use manually: at the end of a failed attempt, ask it to produce a short summary, and then pass that summary into the next session or next step. It is important to understand: checkpoints help roll back files, but they do not explain why the attempt did not work. So, in addition to a checkpoint, it is useful to save a short textual conclusion: what was done, what happened, and what follows from that.

Session context

Long work is better split into sessions by phase. For example:

session 1 — discovery and plan;
session 2 — execution;
session 3 — review;
session 4 — documentation or commit message.

At the beginning of each session, it is useful to explicitly state the goal, constraints, and completion criteria. If the chat becomes too long and noisy, it is better to start a new session and pass a short handoff into it than to keep dragging the entire history forward.

Step logging / trace

For serious DevOps tasks, it is important to understand exactly what was done. Copilot provides the technical part of observability through Agent Logs, Summary, Agent Flow Chart, and Chat Debug view. There you can inspect tool calls, context, LLM requests, and errors. But that is not always enough. For an engineering trace, it is also useful to maintain a short log:

what was checked;
which files were changed;
which commands were run;
what result was obtained;
what conclusions were made.
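Such a log can be as simple as one JSON object per line. The sketch below is one possible format, not something Copilot produces itself; log_step, read_log, the field names, the file name trace.jsonl, and the sample entries are all hypothetical.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def log_step(log_path: Path, *, checked: str, files_changed: list[str],
             commands: list[str], result: str, conclusion: str) -> dict:
    """Append one step to a JSON Lines file and return the stored entry."""
    entry = {
        "time": datetime.now(timezone.utc).isoformat(),
        "checked": checked,
        "files_changed": files_changed,
        "commands": commands,
        "result": result,
        "conclusion": conclusion,
    }
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(entry) + "\n")
    return entry


def read_log(log_path: Path) -> list[dict]:
    """Load all logged steps in the order they were written."""
    with log_path.open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


# Demo in a throwaway directory with made-up entries.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "trace.jsonl"
    log_step(path, checked="pipeline YAML", files_changed=[], commands=[],
             result="build stage looks suspicious", conclusion="inspect Dockerfile next")
    log_step(path, checked="Dockerfile", files_changed=["Dockerfile"],
             commands=["docker build ."], result="build passes",
             conclusion="hand over to review")
    entries = read_log(path)

for entry in entries:
    print(entry["checked"], "->", entry["conclusion"])
```

An append-only file keeps earlier steps intact, which matters when the task is handed off to another engineer.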

Such a log helps return to the task later or hand it off to another engineer without losing context.

Stop and escalate

You need to define in advance when the agent must stop. For example:

no more than two or three attempts without progress;
stop when context is insufficient;
stop before changing secrets or permissions;
stop before production-critical changes;
stop when there is a risk of deleting or overwriting data.
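These rules can be written down as an explicit check instead of being left implicit. The sketch below shows one way to express them; StepContext, stop_reason, and the threshold of three attempts are hypothetical, and the flags would have to be set by the engineer or by surrounding tooling.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StepContext:
    attempts_without_progress: int = 0
    has_enough_context: bool = True
    touches_secrets_or_permissions: bool = False
    is_production_critical: bool = False
    may_delete_or_overwrite_data: bool = False


def stop_reason(ctx: StepContext, max_attempts: int = 3) -> Optional[str]:
    """Return why the agent must stop and escalate, or None if it may continue."""
    # Irreversible or high-impact risks are checked first.
    if ctx.may_delete_or_overwrite_data:
        return "risk of deleting or overwriting data"
    if ctx.touches_secrets_or_permissions:
        return "change touches secrets or permissions"
    if ctx.is_production_critical:
        return "production-critical change"
    if not ctx.has_enough_context:
        return "insufficient context"
    if ctx.attempts_without_progress >= max_attempts:
        return "no progress after repeated attempts"
    return None


print(stop_reason(StepContext()))
print(stop_reason(StepContext(attempts_without_progress=3)))
print(stop_reason(StepContext(touches_secrets_or_permissions=True)))
```

Returning a reason rather than a plain boolean makes the escalation message to the engineer specific.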

In Copilot, this is supported both process-wise and technically. Permission levels limit the agent’s autonomy, and the engineer can stop execution at any moment and switch to manual mode. This is especially important for DevOps, where an error can affect infrastructure, access, or production.

Least privilege tools

You should not give every role maximum access. Planner and reviewer can usually work in read-only mode. They only need to read files, analyze diffs, and provide conclusions. Executor can receive permission to edit files and use the terminal in a limited way. External systems should be connected only through the MCP servers that are truly required for the specific task. This is not only a security issue. The fewer unnecessary tools an agent has, the more predictable its behavior becomes. If a role is given too many capabilities, it will more often take unnecessary steps and complicate the process.

If we put all of this together, Copilot in VS Code allows us to apply multi-agent principles in a very practical form: plan separately from execution, manage context, verify the result through a separate reviewer pass, record steps, limit access, and stop automation in time. This is what turns Copilot from a “smart chat in the IDE” into a working tool for a controlled engineering process.
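Returning to the least-privilege split between planner, reviewer, and executor, the idea can be sketched as a plain allow-list. This is only a conceptual model: the tool names below are hypothetical labels, not Copilot tool identifiers, and ROLE_TOOLS and is_allowed are names introduced for this example.

```python
# Hypothetical capability labels; they are not real Copilot tool identifiers.
READ_ONLY = frozenset({"read_files", "read_diff"})

ROLE_TOOLS = {
    "planner": READ_ONLY,
    "reviewer": READ_ONLY,
    "executor": READ_ONLY | {"edit_files", "run_terminal"},
}


def is_allowed(role: str, tool: str) -> bool:
    """Deny by default: unknown roles and unlisted tools are rejected."""
    return tool in ROLE_TOOLS.get(role, frozenset())


for role in ROLE_TOOLS:
    print(role, "can edit files:", is_allowed(role, "edit_files"))
```

The deny-by-default lookup is the important part: a new role gets no tools until someone grants them explicitly.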

Which DevOps tasks fit this approach well

Copilot in agent mode helps most where the task consists not of one action, but of several steps: inspect files, understand relationships, propose a hypothesis, make an edit, validate the result, and separately assess risks.

CI/CD pipeline analysis

This is one of the clearest scenarios. A pipeline often consists of many steps, conditions, variables, templates, and dependencies. Manually understanding where exactly the logic broke can be difficult. Copilot can help go through the task in stages:

first analyze the pipeline structure;
then find suspicious areas;
then suggest a possible cause of the error;
after that, check whether the fix will break other environments.

Here, the value is not in one “smart answer,” but in sequential analysis next to real YAML files, logs, and project settings.

Working with Dockerfiles and containers

A Dockerfile rarely breaks by itself. The problem is usually tied to the specific project context: base image, layer order, dependency installation, environment variables, entrypoint, runtime, or differences between local build and deployment. Copilot is useful because it can look not at an abstract Dockerfile, but at the concrete files of the project. For example:

the executor analyzes the Dockerfile and related configs;
proposes a targeted change;
the reviewer separately checks risks.

The reviewer may notice bloated layers, strange COPY order, risky commands, implicit dependencies, or build reproducibility problems.

Deployment and incident diagnostics

An incident usually starts with chaos: logs, errors, recent changes, pipeline, configs, and environment behavior. All of this needs to be quickly connected into one picture. Here, it is convenient to split the work into roles:

one role collects symptoms;
another formulates hypotheses;
a third checks which hypotheses are weak and which are worth checking next.

This gives the engineer not a long generic explanation, but a shorter path to diagnosis: what is known, what to check first, and which hypotheses can already be discarded.

Review of infrastructure changes

Copilot is well suited for reviewing changes in pipelines, deployment scripts, Terraform, Helm charts, or Kubernetes manifests. Here, it is important to check not only syntax, but also operational consequences:

can the change break deployment;
is there a rollback;
are other environments affected;
is there enough validation;
are there hidden assumptions;
is there any security impact.

For DevOps, this is especially important because an infrastructure change error does not always immediately appear as a failed test. Sometimes it turns into an unstable deployment, an access issue, or a night-time incident.

Documentation and repo context support

Another strong scenario is updating documentation alongside changes. After diagnostics or a fix, new knowledge often needs to be recorded: update the README, runbook, pipeline description, deployment instructions, or repo context. Copilot can help here too:

one role gathers facts from the repository;
another writes the documentation draft;
a third checks for unsupported claims or dangerous simplifications.

This way, documentation is updated not separately “someday later,” but directly in the context of real work with the project. This reduces the chance that important knowledge remains only in an engineer’s head or in a long chat thread.

Copilot limitations

Even though Copilot is embedded directly into the IDE, that does not automatically make the agent process high-quality and safe. This approach has limitations.

First, much depends on instructions and roles. If it is unclear what the executor should do, what the reviewer should check, and when the task is considered complete, Copilot may start spending time on unnecessary actions. That is why roles, rules, and completion criteria should be defined explicitly.

Second, the agent can be confidently wrong. It may beautifully explain the cause of a problem, but miss an important file, fail to consider an environment setting, or draw a conclusion without enough context. So engineering validation remains mandatory: diffs, commands, logs, and test results matter more than confident text.

Third, there is a risk of overcomplicating the process. Not every task needs a planner, executor, critic, and separate reviewer. Sometimes one Copilot request, a local edit, and manual validation are enough. The multi-agent approach is useful where the task is truly multi-step and risky.

Fourth, access must be handled carefully. If the agent can read files, change code, run commands, and access external systems, incorrect permission configuration becomes dangerous. This is especially true for secrets, production configurations, access rights, and destructive commands.

Copilot can significantly accelerate engineering work, but it does not remove engineering control. It should be used as an assistant in a controlled process, not as a fully autonomous replacement for an engineer.

Summary

The main value of Copilot in VS Code becomes visible very quickly: there is less manual switching between the repository, terminal, files, and a separate discussion of the task. If the agent can go through configs itself, collect context, propose changes, and hand the result over for separate review, part of the routine simply disappears. The engineer no longer has to constantly copy code fragments, logs, and errors into a separate chat. Everything happens next to the place where the actual task lives.

That is why Copilot’s strength in VS Code is not just access to a model. Its strength is that the model works inside a real engineering loop: next to files, commands, project instructions, diffs, and validation results. When the agent sees not a retelling of the problem, but the repository itself and terminal output, it helps not abstractly, but in the context of a concrete project.

But it is important to remember: the closer the agent is to the real project, the more important control becomes. You need to review diffs, inspect command output, limit access, separate roles, and avoid accepting changes without engineering validation.

So, to recap the path so far:

first, we split the work into roles;
then, we looked at how to assemble such a scheme technically;
then, we moved the agent approach closer to the real repo, files, and terminal.

After that, the next practical question appears: if this scheme works, how do we avoid rebuilding it from scratch in every new project? Next, we will talk about a portable framework: a set of rules, instructions, roles, prompts, skills, and conventions that can be reused across projects.