Skip to content

Add mass environment preview for per-PR preview environments#236

Merged
coryodaniel merged 7 commits into
mainfrom
preview-cli
May 16, 2026
Merged

Add mass environment preview for per-PR preview environments#236
coryodaniel merged 7 commits into
mainfrom
preview-cli

Conversation

@coryodaniel
Copy link
Copy Markdown
Member

Summary

Adds mass environment preview <ID> -f preview.yaml — a single command that converges a preview environment for a PR / branch / feature ramp. Re-running the command resets the env back to whatever the YAML declares.

Under the hood it composes four idempotent V2 mutations:

  1. forkEnvironment from the base env in the config (copyEnvironmentDefaults / copySecrets / copyRemoteReferences macros pass through).
  2. setEnvironmentDefault per environmentDefaults entry.
  3. Per-instance overrides — copyInstance for params, updateInstance for version/release strategy, setInstanceSecret per secret, setRemoteReference per ref.
  4. deployEnvironment to fan a provision wave across all instances.

The companion V2 API changes (copySecrets/copyRemoteReferences on fork, deployEnvironment mutation) are on the platform side in massdriver-cloud/massdriver#3246.

Config schema

Follows V2 terminology — no "slug", no "artifact", no massdriver/ scoping prefix.

project: demo
baseEnvironment: production
copyEnvironmentDefaults: true

environmentDefaults:
  - resourceType: aws-iam-role
    resourceId: 161aeb95-...

instances:
  chatdb:
    version: "~2.0"
    releaseStrategy: stable
    params:
      ingress: { enabled: true }
    secrets:
      - { name: STRIPE_KEY, value: FOO }
    remoteReferences:
      - { resourceId: a1b2c3..., field: kubernetes_cluster }
  sessions:    # listed without overrides — inherits from the fork

Full schema doc + example: docs/helpdocs/environment/preview.md and preview.example.yaml.

Test plan

  • go build ./...
  • go test ./... — preview unit tests cover load, validation, happy-path call sequence, and fork-failure propagation
  • Smoke-test against a real org once the platform-side PR (massdriver-cloud/massdriver#3246) is merged

🤖 Generated with Claude Code

@coryodaniel coryodaniel requested a review from chrisghill as a code owner May 14, 2026 22:35
@chrisghill
Copy link
Copy Markdown
Member

There is a HUUUUUGE PR about to merge that puts all of this on the SDK. Can this wait?
#237

Comment thread cmd/environment.go
}

fmt.Printf("✅ Preview environment `%s` converged\n", env.ID)
fmt.Printf("🔗 %s\n", mdClient.URLs.Helper(ctx).EnvironmentURL(env.ID))
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should include an option to watch for deployments and tail prefixing each line with the instance.id for grepability since some may run in parallel

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 0c78eb8 — added --follow to both environment preview (here) and environment deploy (in #238, since that's where the deploy command lives). Output looks like:

[ecomm-pr42-db]  applying db schema
[ecomm-pr42-app] starting container
[ecomm-pr42-db]  migrations done

Each tail goroutine line-buffers and serializes through a mutex'd sink, so parallel deployments don't tear lines.

One implementation note worth flagging: I went with polling rather than subscribing to environmentEvents / instanceEvents as you suggested. The SDK's Absinthe/Phoenix-channels machinery is in internal/absinthe, so doing real subscriptions from the CLI would mean re-porting that layer just for this flag. Polling per-instance Deployments.List every ~2s has a few seconds of discovery latency per new deployment, which seemed acceptable next to per-instance deploy times measured in minutes. Happy to switch to the subscription path the moment the SDK exposes event subscriptions publicly — at that point it's a one-file swap inside internal/commands/environment/follow.go.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filed the SDK ask: massdriver-cloud/massdriver-sdk-go#21 — request to expose environmentEvents / instanceEvents as typed Subscribe methods on the existing services. Once that ships, the swap inside follow.go is one file.

@coryodaniel
Copy link
Copy Markdown
Member Author

@claude review

@claude
Copy link
Copy Markdown

claude Bot commented May 15, 2026

Claude finished @coryodaniel's task in 2m 38s —— View job


Code Review

Note on chrisghill's comment: PR #237 ("swap to using SDK for GQL interactions") has already merged (commit 2148f7b) and this branch is rebased on top of it — the SDK concern is already resolved.


Summary

Solid, well-structured implementation. The interface-based PreviewAPI, env-var expansion via os.ExpandEnv, and the --attributes CLI override pattern are all good design choices. The test coverage for happy-path and error propagation is thorough. A few concrete issues below.


Issues

1. releaseStrategy promised but not implemented

The PR description and the example YAML show releaseStrategy: stable, and the PR summary says step 3 handles "version/release strategy" via updateInstance. But InstanceOverride only has Version — there is no ReleaseStrategy field, and instances.UpdateInput is called with only {Version: override.Version}.

Either the field needs to be added or the PR description/docs need to be corrected. Given instances.UpdateInput almost certainly supports it (the existing mass instance update command uses the same input), this is likely an unintentional omission. Fix this →

2. ID format not validated

The docs and help text both state the ID must match ^[a-z0-9]{1,20}$, but the code only checks for empty string:

// preview.go:157
if opts.ID == "" {
    return nil, errors.New("preview environment ID is required")
}

A user passing pr-123 (with dashes) or UPPERCASE will get a confusing platform-level error instead of a clear CLI message. Fix this →

3. Fork idempotency claim needs verification

RunPreview calls api.Fork(...) unconditionally on every run. The PR claims "re-running the command is safe — every step is idempotent", but this depends entirely on the platform returning a successful response (or the existing env) when a fork with the same ID already exists. If Fork returns an error for a duplicate ID, the second run fails immediately before applying any overrides.

This should either be documented clearly ("the platform forkEnvironment mutation is upsert-style") or handled in the CLI with a fallback. The smoke-test requirement makes this the most critical unknown before merging.

4. Secret values in plain text YAML

preview.example.yaml has value: FOO and the secrets section doesn't highlight the env-var pattern for actual secrets. Users will naturally check in a preview.yaml with hardcoded secrets. The example and docs should strongly recommend value: "${STRIPE_KEY}" for any real secret value — the env-var expansion already supports it, just not emphasized. Fix this →


Minor issues

Non-deterministic instance processing order (preview.go:195): for localID, override := range config.Instances iterates over a map, so call order varies between runs. This is fine semantically (operations are independent), but it makes debugging harder and will cause occasional test flakiness if anyone ever asserts call order. Consider a sorted iteration or note this as intentional.

applyInstanceOverride over-receives (preview.go:214): The function takes the full *PreviewConfig but only touches config.Project and config.BaseEnvironment. Accepting just those two strings would reduce coupling and make the function easier to test in isolation.

SilenceUsage placement (cmd/environment.go:406): It's set after config load but before client creation. Other run functions in this file set it immediately. Move it to be the first statement in the function body for consistency.


Testing observations

  • No test for invalid ID format (would catch issue Adding examples #2 above)
  • No test for validatePreviewConfig rejecting a missing secret name
  • TestRunPreview_HappyPath doesn't assert which instance triggered the copyInstance call — with map iteration being non-deterministic, the test reliably passes today but could hide ordering bugs. Asserting api.copyInputs[0].source == "demo-production-chatdb" would be more precise.

On coryodaniel's suggestion at line 433

The --watch / deployment tail idea is a good UX addition. Suggest tracking it as a follow-up issue rather than blocking this PR — the deploy step today is "fire and forget", and adding watch would require polling or streaming support from the platform side anyway.


Checklist:

  • Gather context and understand the request
  • Read changed files
  • Review internal/commands/environment/preview.go
  • Review internal/commands/environment/preview_test.go
  • Review cmd/environment.go changes
  • Review docs and example YAML
  • Post comprehensive review feedback

coryodaniel and others added 4 commits May 15, 2026 16:14
Converges a preview environment from a YAML config in four idempotent
steps: forkEnvironment, setEnvironmentDefault, copyInstance per
instance (params/version/secrets), and deployEnvironment.

Config terminology follows V2:
- `project` (no "slug")
- `baseEnvironment`
- `environmentDefaults` use `resourceType` + `resourceId`
- `instances` (V2 term)

Supports:
- ABAC `attributes` (config and `-a` flag; flag overrides config).
- `${VAR}` / `$VAR` expansion across the YAML for CI metadata.
- Fork-level macros (copyEnvironmentDefaults / copySecrets /
  copyRemoteReferences).
- Per-instance overrides for version, params (via copyInstance with
  overrides for deep-merge), and secrets.

Wired through the SDK's environments / instances services. Per-instance
`remoteReferences:` overrides aren't supported yet — the SDK's
instances service doesn't expose setRemoteReference; the fork-level
copyRemoteReferences macro handles the bulk case until that lands.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Revive's var-naming rule rejects `environment.EnvironmentDefaultEntry`
for stuttering with the package name. `environment.DefaultEntry` reads
cleaner at the call site too.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Tails every deployment that fires during a preview rollout, prefixing
each line with the instance ID so the interleaved output from parallel
deployments stays grep-friendly:

    [ecomm-pr42-db]  applying db schema
    [ecomm-pr42-app] starting container
    [ecomm-pr42-db]  migrations done

`FollowEnvironment` polls per-instance deployment lists every 2s,
spawns a Deployments.TailLogs goroutine for each new deployment it
sees, and exits after a quiet window with all observed deployments
terminal. The platform deploys in dependency-ordered waves, so the
quiet window has to outlast the gap between waves — set to 30s.

Polling instead of subscribing: the SDK's Absinthe/Phoenix-channels
machinery is in an internal package, so writing real
`environmentEvents` / `instanceEvents` subscriptions from the CLI
would mean re-porting that layer. Polling has a few seconds of
discovery latency per deployment, acceptable next to per-instance
deploy times measured in minutes. The subscription path is cleaner
and should land once the SDK exposes event subscriptions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The SDK now exposes `Environments.StreamEvents` and
`Instances.StreamEvents` (massdriver-cloud/massdriver-sdk-go#22), so
FollowEnvironment no longer needs to spin per-instance Deployments.List
polling loops to discover new deployments.

The new flow:

- Subscribe to `environmentEvents(envID)` over WebSocket.
- On each `DeploymentEvent` for a deployment we haven't seen, fetch
  the deployment once (the event payload trims `instance.id` for
  bandwidth) and kick off a `Deployments.TailLogs` goroutine into a
  prefix writer keyed on the instance ID.
- Track which deployments are still active by transitioning them off
  the active set on terminal status events.
- Exit when no active deployments remain and the quiet window has
  elapsed.

Net effect from a user's POV: deployment discovery is now sub-second
instead of polling-interval bound, and the CLI makes one round-trip
per new deployment instead of one per (instance × poll interval).

Pinned to the events-streaming branch HEAD until v0.2.2 is tagged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- unconvert: drop string() cast around Deployment.Status (it's already string)
- reassign: tag the FollowQuietWindow overrides with the same nolint pattern deploy_test.go uses for DeploymentStatusSleep
- goimports: reformat the errorAPI stub block
Multi-instance tails were jittery because each line's prefix was sized
to its own instance id — `[fancy-claude-pg]` and `[fancy-claude-mysql]`
gave you two different column offsets for the actual log text.

List instances up front, take the longest id as the column width, and
right-pad every prefix to that width. New env-deploys create every
instance during fork before any deployment fires, so the up-front list
is the full set.

    [fancy-claude-pg]    applying schema
    [fancy-claude-mysql] starting mysqld
    [fancy-claude-pg]    migrations done
errorAPI is only invoked through StreamEnvironmentEvents which already
errors, so the other methods are dead code in practice — but `nilnil`
flags any `return nil, nil` from a non-error path. Return empty
`*types.Deployment{}` and empty slice instead.
@coryodaniel coryodaniel merged commit 7d69950 into main May 16, 2026
5 checks passed
@coryodaniel coryodaniel deleted the preview-cli branch May 16, 2026 00:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants