Boris Cherny, the creator of Claude Code at Anthropic, recently shared his workflow, in which he runs 5+ AI agents in parallel: a planner, backend and frontend developers, testers, documenters, all coordinating to tackle complex tasks. It feels more like commanding Data on the USS Enterprise bridge than traditional coding.
Naturally, I had to try it.
I’ve been running and fine-tuning a multi-agent setup for a few weeks now, across different types of tasks. I’m still making improvements, but I’ve landed on a setup that genuinely makes me more productive—and that I actually feel comfortable with. That last part matters more than you’d think.
This is a long post, so here’s the TL;DR upfront:
- Speed gains are real for complex tasks—but not magical.
- You invest time in specifications instead of coding.
- Multi-agent decomposition is a productivity and reliability tool, not a cost-saving one (even if you save time).
- Cognitive load can be brutally high.
- Review everything. Please.
- Some agent explanations are unnecessarily verbose—meaning lots of wasted tokens.
What Actually Changed in My Workflow
After a few weeks of working this way on mid-to-large tasks, iteratively improving the workflow as I go, I can say my development has genuinely sped up.
But the key point isn’t the speed. It’s the shift in what I actually do.
I spend more time thinking about what I want. More time rubberducking with agents. More time in dialogue with the planner about approaches. And needless to say, less time typing. Less time debugging syntax.
With some adjustments, it could be faster. Probably a bit. But I don’t think I can go much further yet without losing control and ownership of the process and the results. And I don’t think it’s worth trading focus, scope, or quality for even more speed.
The main caveat? I won’t pretend it’s relaxing. By the end of the day, my brain is sometimes about to melt. I’m not offloading cognitive work—I’m compressing it. What used to take days of spread-out effort now happens in one intense session.
Let me walk you through everything I’ve learned.
Table of Contents
- The Architecture: Five Agents, Three Chats
- Specifications Become Your Product
- Context Management: Less Is More
- Preview Mode: The Safety Net
- Reality Check: What Actually Happens
- What This Means for the Future of Development
- Final Thoughts
The Architecture: Five Agents, Three Chats
I adapted Cherny’s concept to work with my own workflow and tools (I use Cursor). After testing several approaches, I settled on running 3 chat windows simultaneously, each with different agent roles.
The structure looks like this:
Chat 1: Planner → Orchestrator → Documenter
↓↑
workflow_[ID].md (shared state DOC)
↓↑ ↓↑
Chat 2: Developer Chat 3: Tester
One chat changes roles as the workflow progresses—starting as a planner, becoming an orchestrator, and finally transforming into a documenter. The other two chats handle development and testing respectively.
Could I run multiple developers or testers? Absolutely; the architecture supports it. The markdown coordination file lets each task be marked with an owner, so agents can distribute work among themselves. It would definitely be faster.
But here’s the thing: they’d be faster than my ability to review and validate what they’re doing. And I’m not ready to give up that control yet. Maybe in the future, but not today.
This is the spec of the /multiagent-multichat action:
/multiagent-multichat AGENT-NAME "task description" --workflow-id ABC123 --preview-mode
AGENT-NAME - Required. Can be planner (Planner → Orchestrator → Documenter flow), developer, tester, or documenter
"task" - Required. Task description or @file reference. A complete description of the task, used by the planner agent to create a workflow with detailed instructions for every other agent
--workflow-id [ID] - Required. The workflow ID created by the planner
--preview-mode - Optional. Makes changes but waits for user approval before marking tasks complete
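To make this concrete, here's roughly how the three chats get kicked off. The file reference and workflow ID below are illustrative, not taken from a real run:

```
# Chat 1: starts as planner (later becomes orchestrator, then documenter)
/multiagent-multichat planner "@specs/reporting-feature.md" --workflow-id ABC123 --preview-mode

# Chat 2: developer, pointed at the same workflow
/multiagent-multichat developer "work the developer tasks in the workflow" --workflow-id ABC123 --preview-mode

# Chat 3: tester
/multiagent-multichat tester "work the tester tasks in the workflow" --workflow-id ABC123 --preview-mode
```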
Why Parallel Chats Instead of Subagents?
You might wonder: why not just have the orchestrator launch other agents as subagents? You could do that, but in Cursor they'd run as black boxes. Faster? Yes. But you don't see what's happening until the very end.
With parallel chat windows, I get both parallelization and visibility. I can watch the developer implement changes in real-time in one window, see the tester create and run tests in another, and track overall progress in the orchestrator’s window—all simultaneously.
This visibility isn’t just nice to have. It’s essential for maintaining control and catching issues before they compound.
Markdown as the Coordination Layer
The setup is surprisingly simple. A planner agent asks for clarifications, confirms approaches, and then breaks down tasks. It creates a shared state file—a plain .md file—and coordinates developer and tester agents that work independently in separate chat windows.
The magic is in task-level dependencies rather than agent-level blocking: a tester can start working on test1 as soon as all its prerequisite tasks (e.g., dev1) complete, even while dev2 is still in progress. This is where the parallelization really pays off.
All coordination happens through this shared markdown file. It includes which agent is in charge of each task and the preconditions for each task. Agents poll at regular intervals to wait for their turn and update the file once done.
No complex infrastructure needed—just a .md file you can open anytime to see exactly where things stand. The orchestrator monitors this file, waiting before switching to documenter mode, all within the same conversation.
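For illustration, here's a stripped-down sketch of what one of these coordination files might contain. My actual template has more fields; this just shows owners, preconditions, and statuses:

```markdown
# workflow_ABC123

| Task  | Owner      | Depends on   | Status      |
|-------|------------|--------------|-------------|
| dev1  | developer  | -            | done        |
| dev2  | developer  | dev1         | in progress |
| test1 | tester     | dev1         | in progress |
| test2 | tester     | dev2         | pending     |
| doc1  | documenter | test1, test2 | pending     |

Notes:
- Each agent updates only its own rows.
- In preview mode, a task needs user approval before it can be marked done.
```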
The File Structure
For those who like to see how things are organized:
## 📁 File Structure
**Action** (how to use):
- `.cursor/local/actions/multiagent-multichat.mdc` - Command reference (135 lines)
**File Structure**
.cursor/local/multiagent-multichat/
├── INDEX.md # This file (system overview)
├── agents/
│ ├── planner.md # Complete planner guide (283 lines)
│ ├── developer.md # Complete developer guide (143 lines)
│ ├── tester.md # Complete tester guide (253 lines)
│ └── documenter.md # Complete documenter guide (191 lines)
└── plans/
├── workflow_EXAMPLE.md # Reference example
└── workflow_[ID].md # Active workflows (Markdown format)
**Generated output**:
- `.cursor/local/docs/PRs/` - PR documentation
- `.cursor/local/docs/projects/` - Feature documentation
The system is built on progressive disclosure: each agent reads only its own guide (143–283 lines), reducing the required context by 71–86% compared to monolithic systems. Agents coordinate through the shared plan file in Markdown format, where each agent updates only its own tasks, avoiding state conflicts.
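For what it's worth, the numbers line up if you take the sum of all the guides as the monolithic baseline: 135 + 283 + 143 + 253 + 191 ≈ 1,005 lines in total, so a developer reading only its 143-line guide loads roughly 86% less context, and a planner reading its 283-line guide roughly 72% less.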
Specifications Become Your Product

You’ll spend significantly more time on specifications. And honestly? I’m fully convinced that’s a feature, not a bug.
Working this way forces you to truly understand the problem. You talk to stakeholders, anticipate edge cases, think through how you want things to work, and you write it all down. These specifications become the input to the process, but they’re also excellent project documentation that can be discussed with stakeholders and will outlive the implementation.
Specifications focus on the what, not the how. What do we want to achieve? What functional or design constraints exist? What data sources should we use? The agents figure out the how.
What Good Specs Should Cover
Your specifications should address (directly or by referencing existing .md files in the repo):
- Functional requirements: What must the agent do?
- Input and output data: Schemas, sources, privacy constraints if relevant, and expected outputs
- Non-functional requirements: Security, privacy/PII handling, access, reliability, observability, other constraints
- Domain and business rules: What’s allowed and what’s forbidden in the problem space
- Coding and interaction style: Referencing existing style guides
Some parts of these specifications are defined for the entire project and shouldn’t be duplicated. Instead, the agent must know where to find the relevant Markdown files.
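As a rough illustration, the spec I hand to the planner tends to look something like the skeleton below. The section names and the referenced files are examples, not a fixed template:

```markdown
# Spec: <feature name>

## Goal (the what)
One or two paragraphs on what we want to achieve and why.

## Functional requirements
- What the result must do, including edge cases.

## Input and output data
- Source tables or APIs, schemas, privacy constraints, expected outputs.

## Non-functional requirements
- Security, PII handling, access, reliability, observability.

## Domain and business rules
- What is allowed and what is forbidden in this problem space.

## Alternatives to discuss
- Options I want the planner to compare or challenge.

## References
- @docs/style-guide.md, @docs/data-catalog.md (project-wide conventions live there, not here)
```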
Specs Force You to Study Your Domain
This approach requires you to study your data sources in depth—their content, edge cases, inconsistencies, potential duplicates. I use this opportunity to clean up and document tables, columns, or schemas, helping mid-term data discoverability for both humans and AI models. This is the kind of side benefit that doesn’t show up in productivity metrics but compounds over time.
The Dialogue Before the Plan
Before the orchestrator splits work into atomic tasks, there’s a crucial step: discussing the approach and high-level design decisions.
I encourage this dialogue by including alternatives in my specifications and explicitly asking the orchestrator for different approaches, even when I've already decided on mine, just to make sure I'm not missing something interesting. The agent definition also explicitly encourages it to ask for clarifications or suggest alternative approaches.
So both sides, me and the orchestrator, come primed for a dialogue in which I have the final say, but where it's worth challenging my own assumptions. The orchestrator sometimes suggests approaches that genuinely surprise me, things I wouldn't have considered on my own.
Only after this conversation does the planner divide the work into atomic tasks to be carried out by the developer, the tester, and the documenter.
Context Management: Less Is More
Each agent handles much less context than a single agent would. The developer doesn’t need to know testing details. The tester doesn’t need to understand every implementation decision. This separation of concerns isn’t just organizational: it prunes the context and focuses each agent’s attention.
On top of that, I use progressive disclosure as much as possible: agents receive information only when they need it, not everything upfront. Think of it as lazy loading for context. The planner knows the big picture; workers get just what they need for their specific task, if and when they need it.
And when an agent doesn't do what I want? I don't fix the code. I fix the markdown files or the context and ask the agent to try again. The correction becomes memory, reinforcing the agent's behavior instead of ending up as a one-off patch.
Pro tip: instead of adding the context yourself, try explaining to the agent what went wrong and asking it to update one of the context files so the same mistake doesn't happen again. This way the agent learns and the knowledge persists.
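As an example, after one of these missteps I might ask the agent to append something like the following to its own guide. The rules themselves are made up here, just to show the shape:

```markdown
<!-- appended to .cursor/local/multiagent-multichat/agents/developer.md -->
## Lessons learned
- Timestamps in the analytics tables are UTC; never apply local-time conversions.
- Do not edit generated migration files by hand; regenerate them from the schema instead.
```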
Preview Mode: The Safety Net
The system supports two modes:
- Auto-execution: Agents complete tasks autonomously
- Preview mode: Agents make changes but wait for your approval before marking tasks complete
Auto-execution mode works well, but I always use preview mode. I prefer checking changes in Cursor’s UI before agents mark tasks complete. It’s the difference between “I believe in AI” and “I take ownership of my code.”
Looking for errors in the PR is (1) too late, since it creates friction and avoidable iterations, and, more importantly, (2) by reviewing as I go I take ownership of my code and assume responsibility for the robustness of my work. Sending code to a PR review without fully understanding it would create alienation and widen the gap between what I fully understand and what I'm doing.
(Small clarification: “What I fully understand” is not the same as “What I’m able to fully understand.” Important distinction.)

The Correction Tax
Boris Cherny puts it well: "The bottleneck in modern AI development isn't the generation speed of the token; it is the human time spent correcting the AI's mistakes." In my opinion, the workflow should invest time upfront, in specifications, in approach discussions, in context management, to reduce correction time later. It's not about making AI faster; it's about making the human-AI collaboration more reliable.
Pay the tax for a smarter model upfront, and you eliminate the “correction tax” later.
Reality Check: What Actually Happens
Let me be honest about the practical realities.
Speed Gains Are Real, But Not Magical
This workflow is significantly faster than doing everything manually, but it's not as fast or as simple as just pasting an issue (from Linear, GitHub, or similar) into an AI chat. In my experience, that simpler approach can work acceptably well for simple, atomic, decision-free tasks.
For complex problems requiring judgment calls, the multi-agent setup shines. It’s about matching the tool to the task.
Cognitive Load Is Brutal
You can work through 2 or even 3 complex problems a day, but you’ll end up exhausted. Your brain has to constantly think and validate. Sometimes it feels like you don’t have time to digest and incorporate what you’re doing.
This is the hidden cost nobody talks about. The AI is fast, but your brain is still the bottleneck.
Review Everything
Yes, you could trust the agents and let them run. But I prefer reviewing each step and validating as I go. On one hand, you understand the work better; on the other hand, it allows you to catch problems before the snowball gets bigger.
Sometimes you find yourself asking the agent to refactor for clarity, performance, or elegance. Sometimes you pause to reconsider part of your approach—or even the entire thing—and carry out changes from a given point in the plan.
Agents Can Be Unnecessarily Verbose
Some explanations or recaps of what the agent does are incredibly prolix—meaning lots of wasted tokens. I’m pondering adding something along the lines of “don’t give me explanations unless requested” to my .md files. The documenter already creates a recap file, so having the agent also explain everything verbally is redundant.
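If I do add it, it would probably be a single line in each agent guide, something along these lines (illustrative wording):

```markdown
- Do not summarize or explain your changes unless explicitly asked; the documenter's recap file already covers that.
```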
Sometimes You Need Flexibility
In some cases, you need an agent’s participation in a task that wasn’t planned. For instance, if the test results show you need changes in the code, the tester isn’t allowed to modify it—so it might be tempted to “fix it” by changing the test instead.
But because it asks you first (in preview mode), you can opt to let the tester modify the code as an exception, or choose any other path. The human stays in the loop.
Token Costs Add Up
Multi-agent decomposition is a productivity/reliability tool, not a cost-saving one. The approach is more expensive in terms of tokens because:
- The parent agent reads and processes the full task file
- Parent agent context includes system prompts plus task decomposition overhead
- Each subagent gets its own full system context (rules, prompts, etc.)
- Multiple agent contexts run in parallel or sequence
- Task decomposition and coordination add overhead
So you need to be mindful of token costs and intentional with context-saving techniques like progressive disclosure.
What This Means for the Future of Development
I’ve been thinking a lot about what this workflow implies for how we’ll work in the coming years.
The Specification Becomes the Product
When AI can handle the implementation, the specification becomes the most valuable artifact. The person who can clearly articulate what needs to be built—with all the edge cases, constraints, and domain knowledge—becomes the bottleneck, not the person who can type code fastest.
This is a fundamental shift. It rewards deep domain expertise, clear thinking, and communication skills. It’s less about knowing syntax and more about knowing the problem space.
Ownership Still Matters
Even with AI doing most of the typing, I still feel ownership over the code. Because I reviewed every change. Because I understood every decision. Because I could explain every line if asked.
This matters for professional integrity, but it also matters for practical reasons. When something breaks, you need to understand the system. When a stakeholder asks why something works a certain way, you need to have the answer. AI can help you build faster, but it can’t take responsibility for you.
The Human Stays in the Loop
At least for now, and probably for a while longer, the human needs to stay in the loop. Not because AI can’t do the work—it often can—but because someone needs to make judgment calls, catch subtle errors, and take responsibility for the outcome.
The multi-agent workflow doesn’t remove the human. It changes what the human does. Less typing, more thinking. Less debugging syntax, more evaluating approaches. Less grunt work, more judgment work.
Whether that’s better or worse probably depends on what you enjoy about the job.
Final Thoughts
Running multiple AI agents in parallel is not magic. It’s not effortless. It’s not a replacement for thinking. But it is genuinely useful. It does speed up complex work by letting you focus on the what and the why, while delegating the mechanical parts to agents that are quite good at them.
The key insight for me has been this: the workflow doesn’t make me work less. It makes me work differently. Less typing, more thinking. Less debugging, more designing. For me, it’s been a net positive—even with the cognitive exhaustion, even with the token costs, even with the learning curve.
If you’re curious, I’d encourage you to experiment. Start small. See how it feels. Iterate on what works and discard what doesn’t. And then, remember…

Sharing is lovin’

