Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Release Runbook

Interim manual process. This runbook covers how to ship a stable release today using release-stable-manual.yml. It remains the live process until release-plz lands and replaces it; that migration has not happened yet (see Where this is going).

If anything in here feels heavyweight, that is intentional friction, we do not yet have the automation discipline to remove it safely.

Last verified against the 0.8.0-beta cycle.


The process in seven steps

  1. Generate CHANGELOG-next.md using the changelog skill
  2. Open and merge a version bump PR
  3. Dry-run the release workflows locally with act
  4. Trigger the Release Stable workflow via manual dispatch
  5. Approve the two environment gates when prompted
  6. Verify the release exists and assets are downloadable
  7. Versioned documentation deployment

That is the entire process. Everything else (Docker, website redeploy, Scoop, AUR, Homebrew, Discord, tweet) runs automatically as downstream jobs. You do not need to do anything for those unless a job explicitly fails.


Step 1: Generate CHANGELOG-next.md

Run the changelog-generation skill to produce CHANGELOG-next.md. Its full procedure lives at .claude/skills/changelog-generation/SKILL.md.

The skill generates the changelog from the git log between the last stable tag and HEAD, resolves contributors via GitHub GraphQL, and writes the file. Commit the result directly to a short-lived branch and include it in the version bump PR (step 2), or open it as a separate preceding PR if the diff is large.

If CHANGELOG-next.md already exists from a previous aborted release cycle, review it for accuracy before reusing it.


Step 2: Bump and merge the version PR

Bump workspace.package.version in the workspace Cargo.toml, then sync all other version references:

sh

bash scripts/release/bump-version.sh X.Y.Z

This updates README badges, the Tauri config, and workflow description examples. Commit everything together:

chore: bump version to vX.Y.Z

Open a PR. Label it chore, size: XS. Get one maintainer review. Merge when CI is green.

Confirm the merge landed correctly:

sh

git fetch origin
git show origin/master:Cargo.toml | grep '^version'
# Must show: version = "X.Y.Z"

Step 3: Dry-run the release workflows locally with act

The Release Stable workflow is a GitHub Actions job graph that consumes your environment-gate approval window the moment you click Run workflow. If a workflow step is broken: a missing build artifact, a stale path, a codegen step that someone removed without updating CI, the failure surfaces after you have committed to a release window, with the version PR already merged and master at the new version. Recovery means landing an emergency fix branch, re-running CI, and shipping under time pressure on a tree that already advertises itself as a fully-released version.

The cheap insurance against this is to run the same job graph locally first, on the exact merged master commit, before opening the GitHub Actions form. act executes GitHub Actions workflows inside Docker containers using the same actions/* ecosystem GitHub does. It does not perfectly mirror the cloud runner; it cannot reach the artifact upload runtime, GitHub-issued OIDC tokens, environment secrets, or jobs that depend on a real release tag, but it does run the build and test steps that account for nearly every release-time CI failure we have ever hit.

This step is a 15–20 minute investment per release. It has caught real defects that the regular per-PR CI did not surface (because the failing workflow only runs on workflow_dispatch, not on push).

One-time setup

act runs the workflows. The cleanest install path is the GitHub CLI extension, because it inherits your gh authentication and exposes a real GITHUB_TOKEN to every workflow run:

  1. Install the GitHub CLI from https://cli.github.com (Linux, macOS, Windows). Authenticate once: gh auth login.

  2. Install the act extension:

    sh

    gh extension install nektos/gh-act
    
  3. Install Docker Engine or Docker Desktop from https://docs.docker.com/engine/install/. On Linux, add yourself to the docker group so you don’t need sudo. act also works with Podman and Colima; see the act runners documentation.

That’s the whole setup. The repository’s .actrc and scripts/dev/act-local.sh handle everything else (runner image, secrets file, artifact server, action SHA pre-fetching).

Per-release dry-run

Make sure your working tree matches the merged master tip from step 2:

sh

git fetch upstream
git checkout upstream/master

List what’s runnable across every workflow file:

sh

./scripts/dev/act-local.sh --list

Run a specific job, pick interactively, or run every dry-run-safe job:

sh

./scripts/dev/act-local.sh release-stable-manual:web   # one job
./scripts/dev/act-local.sh                              # interactive picker
./scripts/dev/act-local.sh --all                        # every dry-run-safe job

The first run pulls the runner image (~1.5 GB) and primes the Rust build cache via Swatinem/rust-cache; subsequent runs are much faster. The script auto-creates the gitignored .secrets file, pre-fetches every pinned action SHA into ~/.cache/act/ (act’s shallow clone can’t resolve arbitrary commits otherwise), threads GITHUB_TOKEN from your gh auth into the run via the parent process environment (the token value never lands in argv), and sets --artifact-server-path so actions/upload-artifact and actions/download-artifact work between jobs. All of that is plain act underneath; the script just removes the flag soup.

--all only runs jobs on a dry-run-safe allowlist

act does not honor GitHub’s environment-protection gates. With the maintainer’s real GITHUB_TOKEN threaded into the run, a successful local invocation of a job that writes to GitHub (a publish that calls gh release create, a docker job that pushes to GHCR, a docs-deploy that force-pushes gh-pages, a daily-audit that opens an issue, a tweet-release or discord-release that posts to a webhook) could perform the real-world side effect on first try.

--all therefore enforces a hardcoded allowlist of jobs proven safe to run locally; currently the artifact-only build steps in release-stable-manual.yml and cross-platform-build-manual.yml (validate, web, release-notes, build, build-desktop). Everything else is skipped with a logged reason:

==> skip release-stable-manual:publish (not on dry-run-safe allowlist)
==> skip release-stable-manual:docker (not on dry-run-safe allowlist)
==> skip release-stable-manual:redeploy-website (not on dry-run-safe allowlist)
==> skip docs-deploy:deploy (not on dry-run-safe allowlist)
==> skip daily-audit:advisories (not on dry-run-safe allowlist)
==> skip tweet-release:tweet (not on dry-run-safe allowlist)

The allowlist is fail-closed: a new workflow added to the repo is treated as potentially mutating until a maintainer reviews it and adds the safe job IDs to DRY_RUN_SAFE_JOBS in scripts/dev/act-local.sh. This matters because discover_jobs walks every .github/workflows/*.yml, not just the release workflows, a denylist would silently let a future write-surface workflow through.

Two escape hatches exist for the rare case where you have a reason to attempt a non-allowlisted job locally:

  • ./scripts/dev/act-local.sh release-stable-manual:publish: the explicit <wf>:<job> form runs what you ask for and prints a loud warning before invoking act if the target isn’t on the allowlist.
  • ./scripts/dev/act-local.sh --all --no-allowlist: disables the allowlist filter for an entire --all run (used only when you’ve already verified the workflow steps will not reach a mutation surface, e.g. on a fork with no real registry credentials and an empty .secrets file).

What’s expected to fail under act (and is fine)

act cannot simulate a few GitHub-only surfaces. These failures are not real defects:

  • Jobs that depend on a real release tag (publish creating a GitHub Release).
  • Environment-gated jobs (publish, docker): the approval UI doesn’t exist locally.
  • OIDC-based federated identity tokens.

Everything else, a tsc error, a missing file, a Rust compile failure, a cargo lockfile mismatch, is a real defect. Do not click Run workflow on the GitHub Actions form until those are fixed via a standard PR off master.


Step 4: Trigger the release

Go to:

https://github.com/zeroclaw-labs/zeroclaw/actions/workflows/release-stable-manual.yml

Click Run workflow. Fill in:

  • Branch: master
  • Stable version to release: X.Y.Z, no v prefix

Click Run workflow.

The first job (validate) checks that the version matches Cargo.toml and that no tag vX.Y.Z already exists. If it fails, fix the mismatch and re-trigger. Do not try to work around it.


Step 5: Approve the environment gates

Two jobs are gated by GitHub environment protection rules. When each becomes pending you will see a “Waiting for review” banner in the workflow run.

Approve both when they appear:

EnvironmentJobWhat it does
github-releasespublishCreates the GitHub Release and uploads assets
dockerdockerPushes images to GHCR

If you miss the approval window and a job times out, re-run only the failed job from the workflow run page; you do not need to restart from scratch.


Step 6: Verify the release

Once publish completes, confirm:

[ ] GitHub Release exists at /releases/tag/vX.Y.Z and is marked Latest
[ ] Release notes are non-empty
[ ] SHA256SUMS asset is present and non-empty
[ ] At least one binary archive is downloadable (spot-check linux x86_64)

CHANGELOG-next.md is intentionally left on master after the release: the publish job only reads it as the release body, it does not delete it. The next release cycle overwrites it, so no manual cleanup is required.

You do not need to manually verify Docker or the distribution channels unless a job in the workflow run shows red. Check the workflow run summary; if all jobs are green, you are done.


Step 7: Versioned documentation deployment

ZeroClaw docs use a versioned structure on the gh-pages branch. When a tag is pushed, the Deploy mdBook docs to Pages workflow automatically builds and deploys the documentation for that version. This runs automatically after the tag from step 4 lands; the bootstrap and version-floor details below are reference material for when you need to recreate gh-pages or change the supported-version window.

What happens automatically

  • Pushing a tag (e.g., v0.8.0) triggers a build that lands in /v0.8.0/.
  • If the tag is a GA release (no pre-release suffix), it is also copied to /stable/.
  • The _shared/ directory (containing UI CSS, JS, and favicons) is updated from the build so the theme cascades to all deployed versions.

Bootstrapping gh-pages

If gh-pages is ever deleted or needs to be fully recreated, seed the versions in this specific order:

  1. Oldest supported release: workflow_dispatch with tag v0.7.5
  2. Next releases: workflow_dispatch with tag v0.8.0-beta-1 etc.
  3. Current master: workflow_dispatch with tag master

Important

master must be deployed last during bootstrapping. It writes the definitive _shared/ chrome layer that all other versions use.

Manual redeploys and the Version Floor

To manually re-deploy a specific version:

  1. Go to ActionsDeploy mdBook docs to Pages
  2. Click Run workflow
  3. Enter the tag (e.g., v0.7.5 or master)

The DOCS_MIN_VERSION floor: To prevent accidentally deploying very old or unsupported versions, the workflow enforces a minimum version floor (currently v0.7.5).

  • Tags older than DOCS_MIN_VERSION (like v0.7.4) are rejected by the workflow.
  • cargo mdbook gen-versions (the xtask helper) ignores any directories on gh-pages below this floor, keeping them out of the version dropdown.

If you need to raise the floor to drop support for an older version:

  1. Update the DOCS_MIN_VERSION environment variable in .github/workflows/docs-deploy.yml.
  2. (Optional) Delete the old version’s directory from the gh-pages branch to save space.

If something goes wrong

validate failed: version mismatch: The version bump PR was not merged, or you typed the wrong version. Fix the mismatch and re-trigger.

An environment gate timed out: Re-run only the timed-out job. No need to restart the workflow.

A distribution channel job failed (Scoop, AUR, Homebrew): Each has a corresponding manually-triggerable sub-workflow. Re-run the specific one with dry_run: true first to confirm the fix, then dry_run: false. These are nice-to-have: a failed Scoop job does not invalidate the release itself.


Removed legacy workflows

Several auto-publishing workflows that previously lived in .github/workflows/ have been deleted because they bypassed review or published irreversibly. They are no longer present; if any reappears in a PR, treat it as a regression and block it:

WorkflowWhy it was removed
release-beta-on-push.ymlPublished automatically on every push to master
publish-crates-auto.ymlAuto-published to crates.io on any version change, irreversible
version-sync.ymlCommitted directly to master as a bot, bypassing review
checks-on-pr.ymlDuplicate CI: produced confusing conflicting status
pre-release-validate.ymlUnused generated checklist; this runbook replaces it

The full inventory of the workflows that remain (automatic and manual) lives in CI & Actions.


Where this is going

This runbook and release-stable-manual.yml are a bridge, not a destination.

The target end state:

  • release-plz manages version bumps and changelogs automatically
  • A single release.yml replaces the current patchwork of sub-workflows
  • SLSA provenance is built into the pipeline
  • The team cuts releases by merging a release PR, not by following a runbook

Until that lands, use this process. Every release you cut manually using this runbook is practice that informs what the automation needs to do.