feat: push --dry-run preview mode #16
Merged
dhruva-reddy merged 1 commit into main on May 2, 2026
392855d to d9d9477
898200a to 0f35c9e
adhamvapi approved these changes May 1, 2026
6430703 to 2fc1864
d9d9477 to 714523f
dhruva-reddy added a commit that referenced this pull request May 2, 2026
## ELI5
**Problem.** The engine could *create* simulation suites and track
them in state, and AGENTS.md described `simulations/suites/` as a
first-class resource type. But there was no `npm run` command to
actually *execute* a suite. `npm run eval` exists but runs the
*legacy* `/evals` endpoint — a different thing — and the naming
overlap actively misled engineers into running the wrong command. To
fire a simulation suite from the CLI you had to write raw curl or go
to the dashboard UI (losing reproducibility).
**What this fix does.** Adds `npm run sim`. Two shapes:
```
npm run sim -- <org> --suite <name> --target <assistant-or-squad>
npm run sim -- <org> --simulations <n1>,<n2> --target <assistant>
```
Resolves local resource names → state-file UUIDs the same way
`npm run call` does, POSTs `/eval/simulation/run`, polls the run
status, prints a summary table (pass/fail per simulation, mean run
time, structured-output evals).
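The name resolution step described above can be sketched roughly as follows. This is a minimal illustration, not the code in `src/sim.ts`: the `StateFile` shape, the `resolveIds` helper, and the example names and UUIDs are all assumptions made for the sketch.

```typescript
// Hypothetical state-file shape: resource kind -> local name -> platform UUID.
// The real state file may be organised differently.
type StateFile = Record<string, Record<string, string>>;

// Resolve local names to UUIDs, failing loudly on names that were never pushed.
function resolveIds(state: StateFile, kind: string, names: string[]): string[] {
  const bucket = state[kind] ?? {};
  return names.map((name) => {
    const id = bucket[name];
    if (id === undefined) {
      throw new Error(`Unknown ${kind} "${name}": push it before running`);
    }
    return id;
  });
}

// Illustrative data only; names and UUIDs are made up.
const exampleState: StateFile = {
  simulations: {
    "greeting-check": "00000000-0000-4000-8000-000000000001",
    "refund-flow": "00000000-0000-4000-8000-000000000002",
  },
};

console.log(resolveIds(exampleState, "simulations", ["greeting-check"]));
```

Failing on an unknown name before any request is sent keeps a typo from turning into a confusing API error later in the run.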
**Outcome you'll notice.** Simulation suites become a normal part of
the gitops workflow: author the suite as YAML, push it via
`npm run push`, run it via `npm run sim`. No more dashboard
clicking. Note the AGENTS.md call-out clarifying the difference
between `npm run sim` (unified `/eval/simulation/*`) and
`npm run eval` (legacy `/evals`) — renaming `eval` to disambiguate
is a separate, backwards-incompatible follow-up.
---
Engine fully tracks simulation suites in state and AGENTS.md describes
simulations/suites/ as a first-class resource type, but there's no
npm run command to actually execute one. npm run eval runs the legacy
/evals endpoint, not the unified simulation runner. Customers go to
the dashboard UI to trigger runs (losing reproducibility) or write
per-customer shell wrappers.
- src/sim.ts (NEW): runSimulationSuite + runSimulationsByName helpers.
  Resolves local-name → UUID via state file; POSTs /eval/simulation/run;
  polls /eval/simulation/run/:id until completion; prints pass/fail
  summary per simulation with mean run time + structured-output evals.
  Reuses src/api.ts:vapiRequest for HTTP and the local-name → UUID
  resolution pattern from src/eval.ts.
- src/sim-cmd.ts (NEW): CLI entry. Args:
    npm run sim -- <org> --suite <name> --target <assistant-or-squad>
    npm run sim -- <org> --simulations <n1>,<n2> --target <assistant>
    npm run sim -- <org> --suite <name> --watch
- package.json: sim script.
- AGENTS.md: document npm run sim alongside npm run eval (call out the
  legacy /evals vs unified /eval/simulation/* distinction).
- tests/sim.test.ts: arg parsing, UUID resolution, status polling,
  summary table formatting.
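For reviewers who want a feel for the argument shapes listed above, here is a rough sketch of how they could be parsed. It is not the code in `src/sim-cmd.ts`: the `SimArgs` interface, the `parseSimArgs` name, and the rule that exactly one of `--suite` or `--simulations` must be given are assumptions for illustration.

```typescript
// Hypothetical parsed-argument shape for the three invocations listed above.
interface SimArgs {
  org: string;
  suite?: string;
  simulations: string[];
  target?: string;
  watch: boolean;
}

// argv is everything after `npm run sim --`.
function parseSimArgs(argv: string[]): SimArgs {
  const [org, ...rest] = argv;
  if (org === undefined || org.startsWith("--")) {
    throw new Error("Expected <org> as the first argument");
  }
  const args: SimArgs = { org, simulations: [], watch: false };
  for (let i = 0; i < rest.length; i++) {
    const flag = rest[i];
    if (flag === "--watch") {
      args.watch = true;
      continue;
    }
    if (flag !== "--suite" && flag !== "--simulations" && flag !== "--target") {
      throw new Error(`Unknown argument: ${flag}`);
    }
    // Value-taking flags consume the next token.
    const value = rest[++i];
    if (value === undefined || value.startsWith("--")) {
      throw new Error(`Missing value for ${flag}`);
    }
    if (flag === "--suite") args.suite = value;
    else if (flag === "--simulations") {
      args.simulations = value.split(",").filter((n) => n.length > 0);
    } else args.target = value;
  }
  // Assumption: the two shapes are mutually exclusive.
  if ((args.suite !== undefined) === (args.simulations.length > 0)) {
    throw new Error("Pass exactly one of --suite or --simulations");
  }
  return args;
}

console.log(parseSimArgs(["my-org", "--suite", "smoke", "--watch"]));
```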
Note: renaming npm run eval to disambiguate is a follow-up — that's a
backwards-incompatible script-name change. For now the AGENTS.md note
calls out the distinction.
Closes improvements.md #16.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
714523f to bf5161c
2fc1864 to c3c1c8a
## ELI5
**Problem.** `npm run push -- <env>` immediately starts hitting the live
dashboard. There was no way to ask "what would this push do?" before
firing it. So a fat-fingered command (wrong org, missing file path,
wide-scope push when you meant scoped) hit production immediately,
and recovery meant `pull` + manual revert. The only existing dry-run
concept gated *deletions*, not creates or updates.
**What this fix does.** Adds a `--dry-run` flag to `push`. Instead of
firing POST/PATCH/DELETE, the engine counts the intent and prints
`[dry-run] would <METHOD> <endpoint> <body-preview>` per resource.
The state file is never written (so synthetic IDs don't pollute it),
and the end-of-run summary shows
`Would create N, would update M, would delete K`. GETs still run
because drift detection (Stack G) and operator preview both need to
see current platform state.
**Outcome you'll notice.** Run `npm run push -- <env> --dry-run` to
preview any push. Especially useful for "did I scope this right?" and
"is the pre-push lint reporting drift I should address first?" before
the real push. Cheapest individual operator-safety win in the stack:
no schema changes, no engine architecture moves.
---
Operators today can't validate "is this push doing what I think it's
doing" before it lands on prod. push.ts has a dry-run concept only for
deletions; updates and creates fire immediately. Cheapest individual
operator-safety win (improvements.md #5).
- src/config.ts: parseFlags now accepts --dry-run alongside --force /
  --bootstrap. Exports DRY_RUN.
- src/api.ts: vapiRequest gates POST/PATCH on DRY_RUN: counts the
  intent, prints `[dry-run] would <METHOD> <endpoint>` with a 120-char
  body preview, and returns a synthetic id so caller code threads
  through. vapiDelete gets the same treatment. GETs always run (drift
  preview needs them).
- src/push.ts: banner ("🧪 DRY-RUN") at start, summary at end ("Would
  create N, would update M, would delete K"), saveState entirely skipped
  in dry-run so synthetic ids never leak into the state file.
- AGENTS.md: document --dry-run in Available Commands.
- tests/push-dry-run.test.ts: --dry-run is parse-accepted, banner prints,
  state file is NEVER created (verified end-to-end via spawn).
- improvements.md: #5 → RESOLVED.
Closes improvements.md #5.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
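The gating behaviour described for `vapiRequest` and `vapiDelete` can be pictured with a small sketch. This is not the code in `src/api.ts`: it folds both functions into one hypothetical `createDryRunGate` helper, and the synthetic id format and the `/example-resource` endpoint are invented for illustration.

```typescript
type Method = "GET" | "POST" | "PATCH" | "DELETE";

interface DryRunCounts {
  create: number;
  update: number;
  delete: number;
}

// Hypothetical helper: decides whether a request is suppressed in dry-run mode.
function createDryRunGate(dryRun: boolean) {
  const counts: DryRunCounts = { create: 0, update: 0, delete: 0 };

  // Returns a synthetic response when the call is suppressed, or null when
  // the caller should go ahead and send the real request.
  function gate(method: Method, endpoint: string, body?: unknown): { id: string } | null {
    if (!dryRun || method === "GET") return null; // GETs always run
    if (method === "POST") counts.create += 1;
    else if (method === "PATCH") counts.update += 1;
    else counts.delete += 1;
    // 120-char body preview; JSON.stringify yields undefined for a missing body.
    const preview = (JSON.stringify(body) ?? "").slice(0, 120);
    console.log(`[dry-run] would ${method} ${endpoint} ${preview}`.trimEnd());
    const seq = counts.create + counts.update + counts.delete;
    return { id: `dry-run-${seq}` }; // synthetic id so caller code threads through
  }

  return { counts, gate };
}

const demo = createDryRunGate(true);
demo.gate("POST", "/example-resource", { name: "demo" });
demo.gate("DELETE", "/example-resource/123");
console.log(demo.counts);
```

Returning a synthetic id rather than throwing lets the rest of the push walk every resource, which is what makes the final counts meaningful.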
bf5161c to 87fb394
## ELI5
**Problem.** `npm run push -- <env>` immediately starts hitting the live
dashboard. There was no way to ask "what would this push do?" before
firing it. So a fat-fingered command (wrong org, missing file path,
wide-scope push when you meant scoped) hit production immediately,
and recovery meant `pull` + manual revert. The only existing dry-run
concept gated *deletions*, not creates or updates.
**What this fix does.** Adds a `--dry-run` flag to `push`. Instead of
firing POST/PATCH/DELETE, the engine counts the intent and prints
`[dry-run] would <METHOD> <endpoint> <body-preview>` per resource.
The state file is never written (so synthetic IDs don't pollute it),
and the end-of-run summary shows
`Would create N, would update M, would delete K`. GETs still run
because drift detection (Stack G) and operator preview both need to
see current platform state.
**Outcome you'll notice.** Run `npm run push -- <env> --dry-run` to
preview any push. Especially useful for "did I scope this right?" and
"is the pre-push lint reporting drift I should address first?" before
the real push. Cheapest individual operator-safety win in the stack:
no schema changes, no engine architecture moves.
---
Operators today can't validate "is this push doing what I think it's
doing" before it lands on prod. push.ts has a dry-run concept only for
deletions; updates and creates fire immediately. Cheapest individual
operator-safety win (improvements.md #5).
- src/config.ts: parseFlags now accepts --dry-run alongside --force /
  --bootstrap. Exports DRY_RUN.
- src/api.ts: vapiRequest gates POST/PATCH on DRY_RUN: counts the
  intent, prints `[dry-run] would <METHOD> <endpoint>` with a 120-char
  body preview, and returns a synthetic id so caller code threads
  through. vapiDelete gets the same treatment. GETs always run (drift
  preview needs them).
- src/push.ts: banner ("🧪 DRY-RUN") at start, summary at end ("Would
  create N, would update M, would delete K"), saveState entirely skipped
  in dry-run so synthetic ids never leak into the state file.
- AGENTS.md: document --dry-run in Available Commands.
- tests/push-dry-run.test.ts: --dry-run is parse-accepted, banner prints,
  state file is NEVER created (verified end-to-end via spawn).
- improvements.md: #5 → RESOLVED.
Closes improvements.md #5.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
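As a rough picture of the flag handling and end-of-run behaviour described in this PR, the sketch below shows one way the pieces could fit together. It is not the code in `src/config.ts` or `src/push.ts`: `parsePushFlags`, `formatDryRunSummary`, and `finishPush` are hypothetical names, and the real `parseFlags` may work differently.

```typescript
interface PushFlags {
  force: boolean;
  bootstrap: boolean;
  dryRun: boolean;
}

// Hypothetical flag parser: accepts --dry-run alongside --force / --bootstrap.
function parsePushFlags(argv: string[]): PushFlags {
  return {
    force: argv.includes("--force"),
    bootstrap: argv.includes("--bootstrap"),
    dryRun: argv.includes("--dry-run"),
  };
}

interface IntentCounts {
  create: number;
  update: number;
  delete: number;
}

// Mirrors the summary wording quoted in the PR description.
function formatDryRunSummary(c: IntentCounts): string {
  return `Would create ${c.create}, would update ${c.update}, would delete ${c.delete}`;
}

// In dry-run the state file is never written; otherwise saveState runs as usual.
function finishPush(flags: PushFlags, counts: IntentCounts, saveState: () => void): string | null {
  if (flags.dryRun) {
    return formatDryRunSummary(counts);
  }
  saveState();
  return null;
}

const demoFlags = parsePushFlags(["my-env", "--dry-run"]);
console.log(finishPush(demoFlags, { create: 1, update: 2, delete: 0 }, () => {}));
```

Skipping `saveState` at a single exit point, rather than guarding each write, is one way to make the "state file is never created" property easy to test end to end.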