Claude Code Dynamic Workflows Ship in Research Preview

By Fabio Douek

Claude Code: Dynamic Workflows (Research Preview)

Overview

Yesterday I published a post titled Inside Claude Code’s Hidden Deterministic Workflow Tool. The feature shipped in research preview about 12 hours later. Things are moving fast and keeping me busy, so here’s an up-to-date version.

One note on naming: I previously called this “Deterministic Workflow”, while Anthropic is calling it “Dynamic Workflows”. It’s all a matter of perspective. From mine, the workflow is deterministic in the sense that the flow is wrapped in JS code rather than being generated by the LLM. Either way, from now on I’ll call it Dynamic Workflows to avoid confusion.

A lot of what you ask Claude Code to do follows the same shape: gather requirements, draft a plan, write the code, run the tests, do a round of validation, open the PR. The steps don’t change much from one task to the next; the order is deterministic even when the work inside each step isn’t.

You can ask a Skill to handle that flow, but a Skill is a prompt: it can describe the steps, not enforce them. When you want strict ordering or branching, you end up gluing Skills together with hooks. It works, but it gets brittle fast.

That’s the gap a native Workflow fills. A Workflow in Claude Code is a small .js file that defines the flow itself (when to fan out agents, when to gate on a result, when to move to the next phase), while the LLM handles each agent’s job inside the steps the script decides.

Dynamic Workflows shipped in research preview in Claude Code 2.1.154 (I’m using 2.1.156 for this post), and they’re now officially documented at code.claude.com/docs/en/workflows. The build bundles six Dynamic Workflows, though only deep-research surfaces in a normal local session; the other five sit behind an internal remote-execution gate (more on that below). This post walks through what those six workflows do and how to write your own.

Enabling Workflows

Workflows are on by default on paid plans, so for most accounts you just upgrade to a current build and /workflows is there. Pro accounts toggle it in /config, and you can disable it everywhere with CLAUDE_CODE_DISABLE_WORKFLOWS=1. The old CLAUDE_CODE_WORKFLOWS preview flag is gone.

Implementation detail: In a normal local session only deep-research shows up, and it’s the one bundled workflow Anthropic documents. The other five (autopilot, bugfix, dashboard, docs, investigate) register only when CLAUDE_CODE_REMOTE is set, which is an internal remote-execution signal rather than a user setting. They open PRs and expect a remote, PR-capable environment, so flipping the variable locally won’t hand you a working /autopilot. Your own workflows in .claude/workflows/ are unaffected by this gate.

The Workflow tool itself stays dormant until you opt into multi-agent orchestration for a given task, since a run can fan out across many agents and burn a lot of tokens. Short of that, the model leaves the tool alone, so day-to-day use isn’t affected until you actually want a workflow. The exact opt-in triggers are in the next section.

The Workflow Tool

Workflow is the model-invocable tool that runs a workflow script. It’s gated behind explicit user opt-in (the model only calls it when you include the word “workflow”, turn on Ultracode, invoke a workflow slash command, or directly ask for multi-agent orchestration) and runs in the background: the tool returns a task ID immediately, then a notification fires when the script finishes.

Input schema:

| Parameter | Description |
| --- | --- |
| script | Self-contained workflow JS. Must begin with export const meta = { name, description, phases } (pure literal, no computed values) followed by the script body. |
| name | Predefined workflow name (built-in or from .claude/workflows/). Resolves to a self-contained script. |
| args | Optional input value exposed to the script as the global args. |
| scriptPath | Path to a workflow script file on disk. Takes precedence over script and name. |
| resumeFromRunId | Run ID of a prior invocation. Completed agent() calls with unchanged (prompt, opts) return cached results; the first edited or new call and everything after it runs live. Same session only. |

The runtime enforces determinism: Date.now, Math.random, and argless new Date() are unavailable inside a script and throw if called, so resumes can replay cached agent() results byte-for-byte. (The loader also parses each script’s AST and requires export const meta = { … } to be the first statement.)
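
The resume rule is easier to see in a toy model. The sketch below is not Claude Code's implementation: the journal shape, the `makeReplayingAgent` helper, and keying on `JSON.stringify([prompt, opts])` are all assumptions made for illustration. It only demonstrates the behavior described above, where cached results are served until the first edited or new call, and everything from that call onward runs live.

```javascript
// Toy model of the resume rule; NOT Claude Code's actual implementation.
// `journal` is a hypothetical ordered list of { key, result } from a prior run.
function makeReplayingAgent(journal, liveAgent) {
  let cursor = 0;
  let live = false; // flips at the first mismatch and never flips back

  return async function agent(prompt, opts = {}) {
    const key = JSON.stringify([prompt, opts]);
    const entry = journal[cursor++];
    if (!live && entry !== undefined && entry.key === key) {
      return entry.result; // unchanged (prompt, opts): cached result
    }
    live = true; // edited or new call: this one and all later ones run live
    return liveAgent(prompt, opts);
  };
}

// Demo: the second prompt was edited, so calls 2 and 3 both run live,
// even though call 3 is identical to what the journal recorded.
async function replayDemo() {
  const journal = [
    { key: JSON.stringify(['list files', {}]), result: 'cached-1' },
    { key: JSON.stringify(['summarize', {}]), result: 'cached-2' },
    { key: JSON.stringify(['polish', {}]), result: 'cached-3' },
  ];
  const liveCalls = [];
  const agent = makeReplayingAgent(journal, async (prompt) => {
    liveCalls.push(prompt);
    return `live:${prompt}`;
  });
  const results = [
    await agent('list files'),
    await agent('summarize in detail'),
    await agent('polish'),
  ];
  return { results, liveCalls };
}
```

A replay like this only holds up if the script makes the same calls in the same order on every run, which is presumably why clock and randomness sources are blocked inside scripts.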

The Script API

The script runs in a separate context with only these globals available:

agent

agent(prompt: string, opts?: {
  label?: string,
  phase?: string,
  schema?: object,          // JSON Schema -> forces StructuredOutput tool call
  model?: string,           // haiku | sonnet | opus | full ID
  isolation?: 'worktree',   // fresh git worktree per agent
  agentType?: string,       // 'Explore', 'code-reviewer', etc.
}): Promise<any>

Spawns a subagent. Returns text by default, a validated object when schema is set, or null if the user skips the agent mid-run.
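
As a minimal sketch of handling both return shapes, the helper below requests structured output and treats null as a skip. `RISK_SCHEMA`, `assessRisk`, and the two stubs are hypothetical names invented for this example. The agent function is passed in as a parameter so the snippet runs outside Claude Code; inside a workflow script you would call the global agent() directly.

```javascript
// Hypothetical schema for illustration; not taken from a bundled workflow.
const RISK_SCHEMA = {
  type: 'object',
  required: ['risk'],
  properties: {
    risk: { type: 'string', enum: ['low', 'medium', 'high'] },
    reason: { type: 'string' },
  },
};

// `agent` is injected so the helper can run with a stub outside Claude Code.
async function assessRisk(agent, filePath) {
  const result = await agent(`Assess the risk of the changes in ${filePath}.`, {
    label: `risk:${filePath}`,
    schema: RISK_SCHEMA, // result is a validated object rather than text
    model: 'haiku',
  });
  if (result === null) {
    return { filePath, skipped: true }; // user skipped this agent mid-run
  }
  return { filePath, skipped: false, risk: result.risk };
}

// Stubs standing in for the real agent(): one answers, one simulates a skip.
const answeringStub = async () => ({ risk: 'low', reason: 'docs only' });
const skippingStub = async () => null;
```

Calling `assessRisk(skippingStub, 'README.md')` resolves to a record with `skipped: true`, so downstream code never dereferences a null result.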

parallel

parallel(thunks: Array<() => Promise<any>>): Promise<any[]>

Runs the thunks concurrently and awaits all of them. A thunk that throws resolves to null in the result array; the call itself never rejects.
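
The described semantics can be approximated in plain JavaScript. `parallelSketch` below is a hypothetical stand-in, not the real implementation, which may differ in details such as concurrency limits. It shows why callers should expect null slots rather than a rejected promise.

```javascript
// Sketch of the described semantics only; the real parallel() may differ
// (for example, it may cap concurrency).
function parallelSketch(thunks) {
  return Promise.all(
    thunks.map(async (thunk) => {
      try {
        return await thunk();
      } catch {
        return null; // a failed thunk becomes null instead of rejecting the batch
      }
    })
  );
}

async function parallelDemo() {
  const results = await parallelSketch([
    async () => 'a',
    async () => { throw new Error('boom'); },
    async () => 'c',
  ]);
  // Null slots mark failures, so filter before using the values.
  return { results, usable: results.filter((r) => r !== null) };
}
```

Because failures surface as null, a `.filter(Boolean)` or an explicit null check after parallel() is the natural follow-up.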

pipeline

pipeline(items, stage1, stage2, ...): Promise<any[]>

Runs each item through all stages independently with no barrier between stages: item A can be in stage 3 while item B is still in stage 1. Each stage receives (prevResult, originalItem, index).
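
To illustrate the no-barrier behavior, here is a rough plain-JavaScript model. `pipelineSketch` is a hypothetical name. The sketch assumes the first stage receives the item itself as prevResult, and it does not model how the real pipeline() handles a stage that throws.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Sketch of the described semantics; not Claude Code's implementation.
// Assumption: the first stage receives the item itself as prevResult.
function pipelineSketch(items, ...stages) {
  return Promise.all(
    items.map(async (item, index) => {
      let prev = item;
      for (const stage of stages) {
        prev = await stage(prev, item, index); // (prevResult, originalItem, index)
      }
      return prev;
    })
  );
}

// Demo: the fast item finishes stage 2 before the slow item leaves stage 1.
async function pipelineDemo() {
  const events = [];
  const items = [
    { name: 'slow', delay: 50 },
    { name: 'fast', delay: 5 },
  ];
  const results = await pipelineSketch(
    items,
    async (item) => {
      await sleep(item.delay);
      events.push(`${item.name}:stage1`);
      return item.name;
    },
    async (prev, item) => {
      await sleep(5);
      events.push(`${item.name}:stage2`);
      return `${prev}:done`;
    }
  );
  return { events, results };
}
```

In the demo, results stay in item order while the event log shows the fast item completing both stages first, which is the difference between a pipeline and two back-to-back parallel() calls.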

Canonical pattern:

export const meta = {
  name: 'review-changes',
  description: 'Review changed files across dimensions, verify each finding',
  phases: [{ title: 'Review' }, { title: 'Verify' }],
}

const DIMENSIONS = [{key: 'bugs', prompt: '...'}, {key: 'perf', prompt: '...'}]

const results = await pipeline(
  DIMENSIONS,
  d => agent(d.prompt, {label: `review:${d.key}`, phase: 'Review', schema: FINDINGS_SCHEMA}),
  review => parallel(review.findings.map(f => () =>
    agent(`Adversarially verify: ${f.title}`, {label: `verify:${f.file}`, phase: 'Verify', schema: VERDICT_SCHEMA})
      .then(v => ({...f, verdict: v}))
  ))
)
const confirmed = results.flat().filter(Boolean).filter(f => f.verdict?.isReal)
return { confirmed }
// Dimension 'bugs' findings verify while dimension 'perf' is still reviewing.
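
The pattern above references FINDINGS_SCHEMA and VERDICT_SCHEMA without defining them. The shapes below are hypothetical, inferred only from the fields the script reads (findings, title, file, isReal). The `reasoning` field and the `confirmedFindings` helper are additions for illustration, with the helper showing why the final filter chain guards against null entries.

```javascript
// Hypothetical schema shapes inferred from the fields the pattern reads.
// The real definitions are not shown in the post.
const FINDINGS_SCHEMA = {
  type: 'object',
  required: ['findings'],
  properties: {
    findings: {
      type: 'array',
      items: {
        type: 'object',
        required: ['title', 'file'],
        properties: {
          title: { type: 'string' },
          file: { type: 'string' },
        },
      },
    },
  },
};

const VERDICT_SCHEMA = {
  type: 'object',
  required: ['isReal'],
  properties: {
    isReal: { type: 'boolean' },
    reasoning: { type: 'string' }, // extra field added for illustration
  },
};

// Same filter chain as the pattern: drop null slots (failed thunks) and
// findings whose verifier was skipped or judged the finding not real.
function confirmedFindings(results) {
  return results.flat().filter(Boolean).filter((f) => f.verdict?.isReal);
}

const sample = [
  [{ title: 'off-by-one', file: 'a.js', verdict: { isReal: true } }, null],
  [{ title: 'slow loop', file: 'b.js', verdict: null }],
];
const confirmed = confirmedFindings(sample);
```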

phase, log, workflow, args, budget

  • phase(title): groups subsequent agent() calls under a title in the progress display.
  • log(message): emits a progress line above the agent tree in the /workflows TUI.
  • workflow(nameOrRef, args?): calls another workflow inline as a sub-step; one level of nesting only.
  • args: the value passed as Workflow’s args input.
  • budget: hard token ceiling tied to the user’s +500k-style directive:
budget: {
  total: number | null,
  spent(): number,
  remaining(): number,
}

Loop-until-budget pattern:

const bugs = []
while (budget.total && budget.remaining() > 50_000) {
  const result = await agent("Find bugs in this codebase.", {schema: BUGS_SCHEMA})
  if (!result) break  // agent() returns null when the user skips it mid-run
  bugs.push(...result.bugs)
  log(`${bugs.length} found, ${Math.round(budget.remaining()/1000)}k remaining`)
}
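
To see how the loop terminates, it can be dry-run against a stub budget. Everything below is a simulation under stated assumptions: `makeBudget`, `charge`, and `makeCostlyAgent` are hypothetical helpers, the 60k cost per call is made up, and returning Infinity from remaining() when total is null is a guess rather than documented behavior.

```javascript
// Stub with the same shape as the budget global; costs here are made up.
function makeBudget(total) {
  let used = 0;
  return {
    total,
    spent: () => used,
    remaining: () => (total === null ? Infinity : total - used), // assumption for null
    charge: (tokens) => { used += tokens; }, // test-only helper, not part of the API
  };
}

// Same loop shape as above, with budget and agent passed in for testing.
async function huntBugs(budget, agent) {
  const bugs = [];
  let rounds = 0;
  while (budget.total && budget.remaining() > 50_000) {
    const result = await agent('Find bugs in this codebase.');
    rounds += 1;
    if (!result) break; // agent was skipped
    bugs.push(...result.bugs);
  }
  return { bugs, rounds };
}

// Each simulated agent call costs 60k tokens and reports one bug.
function makeCostlyAgent(budget) {
  let n = 0;
  return async () => {
    budget.charge(60_000);
    n += 1;
    return { bugs: [`bug-${n}`] };
  };
}
```

With a 200k total, the simulated loop runs three rounds (200k, 140k, and 80k remaining at the checks) and stops at 20k. With a null total, the `budget.total &&` guard keeps it from running at all, so the loop never spins without an explicit budget.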

Define Your Own Workflow

To make this concrete, here’s branch-summary, a tiny workflow that reads the current branch’s diff and produces a polished one-paragraph summary plus a PR-title-shaped headline. Three phases, a linear agent() ladder.

| Phase | Description |
| --- | --- |
| Diff | Find the diff base, list the changed files, capture a rough summary from commit messages |
| Summarize | Read the changed files and turn the rough summary into a focused paragraph |
| Polish | Tighten the paragraph and add a one-line headline |

Drop the file in one of these locations:

  • Project-level: .claude/workflows/branch-summary.js
  • User-level: ~/.claude/workflows/branch-summary.js

The scanner is a flat *.js glob, no subdirectories. Restart Claude Code; the workflow registers under its meta.name (not the filename), so this one is invokable as /branch-summary or Workflow({ name: 'branch-summary' }).
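
Since the registered name comes from meta.name rather than the filename, a quick pre-flight check can catch mismatches before a restart. The helper below is a rough approximation written for this post: `workflowNameFromSource` is a hypothetical function, and it uses a regex where the real loader parses the AST, so it only handles a plain string-literal name in a leading meta object.

```javascript
// Rough approximation: the real loader parses the script's AST. This regex
// only handles a plain string-literal `name` inside a leading meta object.
function workflowNameFromSource(source) {
  // Skip leading whitespace and comments, which may precede the first statement.
  const trimmed = source.replace(/^(\s|\/\/[^\n]*\n|\/\*[\s\S]*?\*\/)*/, '');
  if (!trimmed.startsWith('export const meta')) return null; // meta must come first
  const match = trimmed.match(/name:\s*['"]([^'"]+)['"]/);
  return match ? match[1] : null;
}

const goodSource = [
  '// branch-summary: the smallest useful 3-phase workflow.',
  'export const meta = {',
  "  name: 'branch-summary',",
  "  description: 'Linear 3-phase summary',",
  '}',
].join('\n');

const badSource = "const x = 1\nexport const meta = { name: 'late' }";

const goodName = workflowNameFromSource(goodSource); // 'branch-summary'
const badName = workflowNameFromSource(badSource);   // null: meta is not first
```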

Workflows run in the background. To check on a run, type /workflows, which opens a TUI listing each phase. Drill into a phase to see the per-agent execution details.

/workflows TUI showing the three phases of the branch-summary run

Full source: branch-summary.js (88 lines)
// branch-summary: the smallest useful 3-phase workflow.
//
// Drop in `.claude/workflows/branch-summary.js` and run via
// `Workflow({ name: 'branch-summary' })`. Reads the current branch diff and
// produces a polished one-paragraph summary plus a headline.
//
// Uses only the three primitives a linear workflow needs: phase(), agent(),
// log(). No parallel, no pipeline. No external MCP or GitHub CLI. The diff
// step uses git via Bash, which the workflow-subagent already has.

export const meta = {
  name: 'branch-summary',
  description: 'Linear 3-phase summary of the current branch: diff, summarize, polish. Returns a headline and a 3-5 sentence paragraph.',
  whenToUse: 'When the user wants a quick written summary of what changed on the current branch, for PR descriptions, stand-ups, or sharing with a teammate. Produces a report, not a PR.',
  phases: [
    { title: 'Diff', detail: 'Find the diff base, list changed files, capture a rough summary from commit messages' },
    { title: 'Summarize', detail: 'Read the changed files and turn the rough summary into a focused paragraph' },
    { title: 'Polish', detail: 'Tighten the paragraph and add a one-line headline' },
  ],
}

// ═══ Schemas ═══
// Kept deliberately light. Production-shaped workflows (see autopilot, deep-research)
// validate far more. For a demo, two object shapes are enough.
const DIFF_SCHEMA = {
  type: 'object',
  required: ['diffBase', 'files', 'rawSummary'],
  properties: {
    diffBase: { type: 'string', description: 'Branch this was diffed against, e.g. origin/main' },
    files: { type: 'array', items: { type: 'string' } },
    rawSummary: { type: 'string', description: 'One paragraph drawn from commit messages and the stat output' },
  },
}

const RESULT_SCHEMA = {
  type: 'object',
  required: ['summary'],
  properties: {
    headline: { type: 'string', description: 'One short line, sentence-case, suitable as a PR title' },
    summary: { type: 'string', description: '3-5 sentence paragraph for a teammate skim-reading the PR' },
  },
}

// ═══ Phase 1: Diff ═══
phase('Diff')
const diff = await agent(
  "Discover the scope of changes on the current branch.\n\n" +
  "1. Diff base: run `git merge-base HEAD origin/main`. If origin/main does not exist, try `main`. Return whichever resolves.\n" +
  "2. Changed files: `git diff --name-only <diffBase>...HEAD`\n" +
  "3. Rough summary: skim `git log --oneline <diffBase>...HEAD` and `git diff --stat <diffBase>...HEAD`. Write one paragraph capturing the gist from the commit messages.\n" +
  "Structured output only.",
  { label: 'diff', schema: DIFF_SCHEMA }
)
if (!diff) return { error: 'Diff step skipped.' }
if (diff.files.length === 0) {
  return { summary: `No changes on this branch vs ${diff.diffBase}.`, diffBase: diff.diffBase, filesChanged: 0 }
}
log(`${diff.files.length} files changed vs ${diff.diffBase}`)

// ═══ Phase 2: Summarize ═══
phase('Summarize')
const summary = await agent(
  "Read the changed files and turn the rough summary into a focused paragraph for a teammate skim-reading the PR.\n\n" +
  `Diff base: ${diff.diffBase}\n` +
  `Files (${diff.files.length}): ${diff.files.join(', ')}\n` +
  `Rough summary: ${diff.rawSummary}\n\n` +
  "Cover: what changed, why (if clear from commits), risk areas worth a second look. 3-5 sentences. Use concrete file paths and function names, no vague language like 'updates' or 'improvements'.",
  { label: 'summarize', schema: RESULT_SCHEMA }
)
if (!summary) return { error: 'Summarize step skipped.', diff }

// ═══ Phase 3: Polish ═══
phase('Polish')
const polished = await agent(
  "Tighten this branch summary. Same content, less filler.\n\n" +
  `Summary: ${summary.summary}\n\n` +
  "Drop hedge words ('seems to', 'might'), kill obvious sentences ('This PR changes some files'), keep the file/function names. " +
  "Add a one-line headline (sentence case, no trailing period, under 70 chars) suitable as a PR title.",
  { label: 'polish', schema: RESULT_SCHEMA }
)

return {
  headline: polished?.headline ?? summary.headline ?? null,
  summary: polished?.summary ?? summary.summary,
  diffBase: diff.diffBase,
  filesChanged: diff.files.length,
}
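
One way to exercise the early-exit branches without spending tokens is to copy the control flow into a function that takes the globals as parameters and drive it with stubs. This is a suggested testing sketch, not something the post's source ships: `branchSummaryFlow` and `makeStubs` are hypothetical names, and the prompts are shortened placeholders.

```javascript
// Condensed copy of the control flow above, with the globals passed in so the
// branches can be exercised offline. Prompts are shortened for the sketch.
async function branchSummaryFlow({ agent, phase, log }) {
  phase('Diff');
  const diff = await agent('diff prompt', { label: 'diff' });
  if (!diff) return { error: 'Diff step skipped.' };
  if (diff.files.length === 0) {
    return { summary: `No changes on this branch vs ${diff.diffBase}.`, diffBase: diff.diffBase, filesChanged: 0 };
  }
  log(`${diff.files.length} files changed vs ${diff.diffBase}`);

  phase('Summarize');
  const summary = await agent('summarize prompt', { label: 'summarize' });
  if (!summary) return { error: 'Summarize step skipped.', diff };

  phase('Polish');
  const polished = await agent('polish prompt', { label: 'polish' });
  return {
    headline: polished?.headline ?? summary.headline ?? null,
    summary: polished?.summary ?? summary.summary,
    diffBase: diff.diffBase,
    filesChanged: diff.files.length,
  };
}

// Stub factory: answers each agent() call by its label; missing labels act as skips.
function makeStubs(answers) {
  const phases = [];
  return {
    phases,
    agent: async (_prompt, opts) => answers[opts.label] ?? null,
    phase: (title) => phases.push(title),
    log: () => {},
  };
}
```

For example, `makeStubs({ diff: { diffBase: 'origin/main', files: [] } })` drives the flow down the no-changes branch, and the recorded phases confirm that Summarize and Polish never start.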

Bundled Workflows in Claude Code

Claude Code bundles six workflows out of the box. Five of them (autopilot, bugfix, dashboard, docs, and investigate) register only when CLAUDE_CODE_REMOTE is set. Only deep-research registers unconditionally. The split isn’t strictly about who opens a PR: investigate produces a report and still sits behind the remote gate, while deep-research, also a report, is the one workflow that’s always available.

autopilot

Description. An end-to-end task runner. Builds a plan with a 5-angle adversarial critique, adjusts the plan, implements, uses a bughunt-lite review + feature completeness check, fixes issues, then opens a PR.

When to use. When the user gives a self-contained coding task they want completed end-to-end without supervision. Best for long-running tasks that require some or large amounts of planning and verification. This workflow scopes the problem, hardens its plan using 5 critics, implements it, runs a bug hunting sweep and a feature completeness check, fixes issues, and then opens a PR.

Phases.

  • Plan: Scope + draft, 5 critics (scope/simplicity/reuse/verification/correctness), harden
  • Implement: Single agent executes the hardened plan
  • Review: 3 rapid + 2 deep finders, 5-vote pigeonhole verify, + completeness vs task
  • Fix: Address confirmed issues (skipped if clean)
  • PR: Lint, typecheck, open PR, subscribe to auto-fix

Techniques. 5-angle plan critique; embeds bughunt-lite for review; skip-if-clean fix phase.

Output. PR with auto-fix subscription. Requires CLAUDE_CODE_REMOTE.

bugfix

Description. Reproduce-first bug fixer. Writes a failing repro, root-causes the fault, applies the minimal fix, converts the repro into a regression test, then opens a PR.

When to use. When the user reports a specific bug to fix. Best when the bug is concrete enough to reproduce. This workflow writes a failing repro first, traces the root cause, applies the smallest fix that makes the repro pass, locks it in as a regression test, and opens a PR.

Phases.

  • Reproduce: Write a failing script or test that demonstrates the bug
  • Root-cause: Trace the fault, grep callers, identify the minimal culprit
  • Fix: Smallest diff that makes the repro pass
  • Regress: Convert repro into a permanent test, run the touched suite
  • PR: Lint, typecheck, open PR

Techniques. Failing-repro-first ordering; regression-test conversion; minimal-diff bias.

Output. PR with new regression test. Requires CLAUDE_CODE_REMOTE.

dashboard

Description. Dashboard generator. Discovers data sources and existing dashboard conventions in the repo, designs a panel layout, implements it, dry-runs queries and render-checks the result, then opens a PR.

When to use. When the user wants a dashboard, monitoring view, or metrics page built. This workflow finds the available data and existing dashboard patterns, specs out panels and layout, implements them, validates queries and rendering, and opens a PR.

Phases.

  • Discover: Data sources, existing dashboard libs/patterns in repo
  • Design: Panels, metrics, layout spec
  • Implement: Build the dashboard
  • Verify: Query dry-run, render/screenshot if possible
  • PR: Open PR

Techniques. Convention discovery before design; render-check gate before PR.

Output. PR. Requires CLAUDE_CODE_REMOTE.

docs

Description. Documentation generator. Discovers the feature surface and existing doc conventions, outlines for the target audience, writes or updates the docs, verifies code examples and links, then opens a PR.

When to use. When the user wants documentation written or updated for a feature, API, or module. This workflow finds the relevant code and existing doc patterns, drafts an outline, writes the content, checks that examples run and links resolve, and opens a PR.

Phases.

  • Discover: Feature surface, existing docs, location conventions
  • Outline: Structure and audience
  • Write: Create or update doc files
  • Verify: Examples compile/run, links resolve, accuracy vs code
  • PR: Open PR

Techniques. Outline-then-write; runs examples and link-checks in Verify.

Output. PR. Requires CLAUDE_CODE_REMOTE.

investigate

Description. Root-cause investigation. Gathers evidence, generates competing hypotheses in parallel, adversarially refutes each one, and produces a written root-cause report with a suggested fix.

When to use. When the user wants the root cause of an incident, error, log, trace, or puzzling behavior found, without necessarily fixing it. This workflow collects evidence, runs parallel hypothesis agents, tries to refute each hypothesis, and writes up the surviving root cause with next steps. It produces a report, not a PR.

Phases.

  • Gather: Logs, traces, repro, timeline
  • Hypothesize: 3 parallel hypothesis agents
  • Verify: One adversarial refuter per hypothesis
  • Report: Root-cause writeup, suggested fix, next steps

Techniques. Parallel hypothesis generation; per-hypothesis adversarial refuter; survivor-takes-all.

Output. Report (no PR). Requires CLAUDE_CODE_REMOTE, grouped with the PR workflows even though it never commits.

deep-research

Description. Deep research harness: fan-out web searches, fetch sources, adversarially verify claims, synthesize a cited report.

When to use. When the user wants a deep, multi-source, fact-checked research report on any topic. BEFORE invoking, check if the question is specific enough to research directly. If underspecified (e.g., “what car to buy” without budget/use-case/region), ask 2-3 clarifying questions to narrow scope. Then pass the refined question as args, weaving the answers in.

Phases.

  • Scope: Decompose question (from args) into 5 search angles
  • Search: 5 parallel WebSearch agents, one per angle
  • Fetch: URL-dedup, fetch top 15 sources, extract falsifiable claims
  • Verify: 3-vote adversarial verification per claim (need 2/3 refutes to kill)
  • Synthesize: Merge semantic dupes, rank by confidence, cite sources

Techniques. Multi-angle decomposition; URL dedup with top-N cap; claim-level adversarial verification; confidence-ranked citation.
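
As a rough illustration of "URL dedup with top-N cap", the sketch below normalizes URLs, keeps the best-scoring entry per URL, and truncates to a limit. The normalization rules, the `score` field, and both function names are assumptions. The bundled workflow's own dedup and ranking logic is not described in this post.

```javascript
// Hypothetical normalization rules; the bundled workflow's own rules may differ.
function normalizeUrl(raw) {
  const url = new URL(raw); // the URL parser already lowercases the host
  url.hash = ''; // fragments point at the same page
  let out = url.toString();
  if (out.endsWith('/')) out = out.slice(0, -1);
  return out;
}

// Keep the highest-scoring entry per normalized URL, then cap at `limit`.
// `score` is a made-up relevance field for this example.
function dedupeTopN(sources, limit) {
  const best = new Map();
  for (const source of sources) {
    const key = normalizeUrl(source.url);
    const seen = best.get(key);
    if (!seen || source.score > seen.score) best.set(key, { ...source, url: key });
  }
  return [...best.values()].sort((a, b) => b.score - a.score).slice(0, limit);
}

const picked = dedupeTopN(
  [
    { url: 'https://Example.com/a/#intro', score: 0.4 },
    { url: 'https://example.com/a', score: 0.9 },
    { url: 'https://example.org/b', score: 0.7 },
    { url: 'https://example.net/c', score: 0.1 },
  ],
  2
);
```

Here the two example.com entries collapse into one, and the cap drops the lowest-scoring source.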

Output. Cited research report. Registers unconditionally.

The deep-research workflow's Verify phase in the /workflows TUI, fanning out 75 agents for per-claim adversarial verification

The pattern repeats across all six: fan out, verify adversarially, synthesize. The verification stages typically use 3 to 5 votes. autopilot’s Review phase adds “pigeonhole early-exit”: with a 5-vote verify, if two voters refute a finding the remaining three are skipped, because the verdict is already settled and further votes cannot change it. deep-research pushes verification down to the level of individual claims: every extracted claim gets its own 3-vote refutation pass before it makes the final report.
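
The early-exit idea can be sketched as a vote that stops as soon as the outcome is decided. This is a sequential toy version with a hypothetical `refutesToKill` parameter. The bundled workflows may run voters in parallel and use different thresholds, so treat the numbers as illustrative.

```javascript
// Sequential toy version; the bundled workflows may run voters in parallel
// and use different thresholds. `refutesToKill` is a hypothetical parameter.
async function verifyWithEarlyExit(voters, refutesToKill) {
  let refutes = 0;
  let asked = 0;
  for (const voter of voters) {
    const remaining = voters.length - asked;
    // Stop once the finding is killed, or once it can no longer be killed.
    if (refutes >= refutesToKill || refutes + remaining < refutesToKill) break;
    const vote = await voter();
    asked += 1;
    if (vote === 'refute') refutes += 1;
  }
  return { survives: refutes < refutesToKill, asked, skipped: voters.length - asked };
}

// Stub voters that always answer the same way.
const refuter = async () => 'refute';
const confirmer = async () => 'confirm';
```

With three voters and a threshold of two, as in deep-research's "need 2/3 refutes to kill", two refutations end the vote after two calls. Two confirmations also end it early, because a single remaining vote can no longer reach the threshold.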

Conclusion

Embedding a dynamic workflow engine directly into Claude Code is an interesting move from Anthropic. It pushes the orchestration layer out of the model and into code you can read, version, and reason about, which is exactly the right place for “what fans out, what verifies, what synthesizes” to live.

I evaluated some of the bundled workflows, and they do the job. The thing to watch is token consumption: the harness pattern is the whole point of workflows, but some of the current implementations are aggressive enough that they’d benefit from tuning, or from a knob that lets end users dial them down.

Now that it’s shipped, even in preview, I’m looking forward to seeing what people build with it. I also hope workflows end up bundle-able as part of Claude plugins, so a well-crafted harness can be distributed and installed as easily as a skill is today.
