feat: drift detection on push (--overwrite to bypass)#20

Open
dhruva-reddy wants to merge 1 commit into dhruva-reddy/refactor/state-schema-content-hashes from dhruva-reddy/feat/push-drift-detection

ELI5

Problem. Today: you pull, your teammate edits the same assistant
on the dashboard during a live test, you push your unrelated branch,
and their dashboard edit disappears with no warning. Customer-success
reps update business hours via the dashboard; the next gitops push
silently reverts them. Even git revert + push rollbacks have the
same problem — they overwrite whatever's currently live, not just the
change being reverted. The engine had no way to detect this because
the state file only stored name→UUID, no record of the platform's
content at last pull.

What this fix does. Now that Stack F populates lastPulledHash,
drift detection becomes possible. Before each PATCH, the engine GETs
the current platform payload, hashes it, and compares to the
lastPulledHash in state.

  • Hashes match → continue silently.
  • Hashes differ + no flag → refuse the push, point at the
    drift, ask the operator to either pull-and-resolve or pass
    --overwrite to take ownership.
  • Hashes differ + --overwrite → log "overwriting drift" and
    proceed.
  • No baseline (legacy state, first push after Stack F) → log
    "drift unknown — proceeding" and don't block.

Also adds a specific helper for the Cartesia voice picker
footgun: if pronunciationDictId was set at last pull but isn't on
the platform now, surface that explicitly so the operator notices.

Outcome you'll notice. Concurrent dashboard edits no longer
disappear silently. If someone else touched a resource between your
pull and your push, you see the conflict at push time and have to
make an explicit call (overwrite, or pull and resolve). The engine
becomes a real safety rail rather than a blind PATCH machine.


Before each PATCH, GET the current platform payload, hash it, and
compare to the lastPulledHash recorded in state (Stack F). If the
hashes differ, the dashboard has drifted away from the version we last
pulled — refuse to push without --overwrite.

Behavior matrix:

  • No lastPulledHash (legacy state, first push after Stack F): log
    "drift unknown — proceeding" and continue. Don't block.
  • Hashes match: continue silently.
  • Hashes differ + no --overwrite: refuse the push, return null.
  • Hashes differ + --overwrite: log "overwriting drift" and continue.
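
The four rows of the matrix can be read as a single pure decision function. The following is a minimal sketch, not the code in src/drift.ts: `decideDrift`, `DriftDecision`, and the reason strings are hypothetical names, and the message wording only approximates the log lines quoted above.

```typescript
// Hypothetical shapes: the PR's DriftCheckResult carries a reason and a
// message, but its exact fields are not shown, so these names are assumed.
type DriftReason = "no-baseline" | "match" | "blocked" | "overwritten";

interface DriftDecision {
  ok: boolean;
  reason: DriftReason;
  message: string;
}

function decideDrift(
  lastPulledHash: string | undefined,
  currentHash: string,
  overwrite: boolean,
): DriftDecision {
  // Legacy state: nothing to compare against, so never block.
  if (lastPulledHash === undefined) {
    return { ok: true, reason: "no-baseline", message: "drift unknown, proceeding" };
  }
  // Platform still matches what was last pulled: continue silently.
  if (lastPulledHash === currentHash) {
    return { ok: true, reason: "match", message: "" };
  }
  // Drift detected, but the operator explicitly took ownership.
  if (overwrite) {
    return { ok: true, reason: "overwritten", message: "overwriting drift" };
  }
  // Drift detected and no flag: refuse the push.
  return {
    ok: false,
    reason: "blocked",
    message: "platform drifted since last pull; pull and resolve, or pass --overwrite",
  };
}
```

Keeping the decision separate from the GET and the hashing makes each row of the matrix testable without network access.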

Files:

  • src/drift.ts (NEW): checkDriftForUpdate(endpoint, state, overwrite).
    GETs platform, strips server-managed fields (id/orgId/createdAt/etc)
    to align hash basis with cleanResource()'s output, sha256 compares.
    Returns DriftCheckResult with reason and message for caller logging.
  • src/state-serialize.ts: checkPronunciationDictDrop helper for the
    Cartesia voice-picker case (improvements.md #7). Pure data, safe
    to import in tests.
  • src/config.ts: --overwrite flag.
  • src/push.ts: drift gate in upsertResourceWithStateRecovery before
    every PATCH. Skipped in dry-run (operator wants to see what would
    happen). Skipped if no baseline.
  • tests/drift.test.ts: hash-match → ok=true, hash-differ-no-overwrite →
    ok=false, hash-differ-overwrite → ok=true, no-baseline → ok=true.
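
To illustrate the hash basis described for src/drift.ts, here is a hedged sketch of stripping server-managed fields before a sha256 comparison. The function names are hypothetical, the field list is assumed (the PR names id, orgId, and createdAt and leaves the rest as "etc", so `updatedAt` is a guess), and the key sorting is an added assumption so that key order cannot change the hash; the real cleanResource() may normalize differently.

```typescript
import { createHash } from "node:crypto";

// Assumed list: only id/orgId/createdAt are named in the PR; updatedAt is a guess.
const SERVER_MANAGED_FIELDS = ["id", "orgId", "createdAt", "updatedAt"];

type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Recursively sort object keys so equivalent payloads serialize identically.
function canonicalize(value: Json): Json {
  if (Array.isArray(value)) return value.map(canonicalize);
  if (value !== null && typeof value === "object") {
    const sorted: { [key: string]: Json } = {};
    for (const key of Object.keys(value).sort()) {
      sorted[key] = canonicalize(value[key]);
    }
    return sorted;
  }
  return value;
}

// Drop top-level server-managed fields, then hash the canonical JSON.
function hashPlatformPayload(payload: { [key: string]: Json }): string {
  const stripped: { [key: string]: Json } = {};
  for (const key of Object.keys(payload)) {
    if (!SERVER_MANAGED_FIELDS.includes(key)) stripped[key] = payload[key];
  }
  return createHash("sha256")
    .update(JSON.stringify(canonicalize(stripped)))
    .digest("hex");
}
```

The hash only works as a drift signal if pull and push strip exactly the same fields; any mismatch between the two would show up as false drift on every push.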

Closes improvements.md #1, #7. Partial #2 (push side caught; pull side
same-file conflict still requires manual resolution).

🤖 Generated with Claude Code


dhruva-reddy commented May 1, 2026

@dhruva-reddy dhruva-reddy force-pushed the dhruva-reddy/feat/push-drift-detection branch from 2b4bc77 to 05675a0 Compare May 1, 2026 22:56
@dhruva-reddy dhruva-reddy force-pushed the dhruva-reddy/refactor/state-schema-content-hashes branch from c9cf252 to 10c101c Compare May 1, 2026 22:56
dhruva-reddy added a commit that referenced this pull request May 2, 2026
**Problem.** The Vapi API rejects bad configs at PATCH time with terse
400s ("property speed should not exist") — and by then the push has
already partially completed against other resources. We watched the
same five classes of mistake hit production over and over:

  1. Assistant names (or eval names) longer than 40 chars (silent cap).
  2. Structured-output ↔ assistant lockstep mismatch — one side declares
     the relationship, the other doesn't, dashboard ends up inconsistent.
  3. Prompts duplicated by paste-on-top dashboard edits (10kB prompt
     with two identical headers stacked, agent follows both).
  4. `maxTokens` set lower than the JSON-schema size of the attached
     tools' arguments — assistant looks fine on push, bricks on first
     tool-using call.
  5. Voice fields nested wrong for the provider (`voice.speed` on
     Cartesia, where it lives at `voice.generationConfig.speed`).

**What this fix does.** Five client-side validators, all running off
the same `LoadedResources` shape that `push.ts` would actually ship —
so the lint runs against exactly what would be pushed, no separate
parser to drift. Surfaces as warnings by default (one bad spec doesn't
block an otherwise-good push); promote to abort with `--strict`. Run
standalone via `npm run validate -- <org>`.

**Outcome you'll notice.** Most schema-class mistakes get caught
locally in seconds instead of mid-push 400s. Voice provider field
mismatch gets a specific message pointing at the right path. CI can
add `npm run push -- <env> --strict` as a gate before any deploy.

---

Catch the classes of errors that today only surface when the API returns
a 400 mid-push. The push pipeline runs validation in warn-only mode by
default; --strict promotes errors to a blocking abort before any API
call. Standalone runner via `npm run validate -- <org>`.
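
The warn-by-default versus `--strict` behavior can be sketched as a small gate that runs before any API call. This is an illustration under assumptions: `shouldAbortPush` and the `Finding` shape are hypothetical names, not the code in src/push.ts.

```typescript
// Hypothetical, trimmed-down finding shape for illustration only.
interface Finding {
  severity: "error" | "warning";
  rule: string;
  message: string;
}

// Logs every finding; returns true only when strict mode is on
// and at least one finding has error severity.
function shouldAbortPush(
  findings: Finding[],
  strict: boolean,
  log: (line: string) => void = console.warn,
): boolean {
  for (const finding of findings) {
    log(`[${finding.severity}] ${finding.rule}: ${finding.message}`);
  }
  const hasError = findings.some((finding) => finding.severity === "error");
  return strict && hasError;
}
```

In default mode the same findings are still printed, so an operator sees the problem even when the push proceeds.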

Validators implemented:

1. Name length cap (40 chars). Walks every assistant.name and every
   evaluations[].structuredOutput.name in scenarios. Closes #18.
2. SO ↔ assistant bidirectional lockstep. For every SO file's
   assistant_ids, checks the named assistant's structuredOutputIds
   mirrors it; reverse direction too. Closes #11.
3. Prompt duplication heuristics. Same H1 heading appearing twice,
   repeated CONTINUITY ON ENTRY / CLOSEOUT FLOW STRUCTURE blocks.
   Partial fix for #8 (paste-on-top dashboard duplications).
4. maxTokens floor for tool-using assistants. Computes
   floor ≈ 25 + sum(len(JSON.stringify(tool.function.parameters)))
   per attached tool. Warns under floor. Closes #19.
5. Per-provider voice schema. Cartesia rejects top-level speed /
   stability / similarityBoost / enableSsmlParsing (point at
   generationConfig.* / drop the field). 11labs rejects
   generationConfig (it's a Cartesia path). Closes #9 (engine half).
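
As a worked example of validator 4, the floor formula can be sketched as follows. This reads the formula as a single constant of 25 plus the summed length of each attached tool's serialized parameters schema; whether the 25 applies once or per tool is not stated, so that reading is an assumption, and `maxTokensFloor` and `isUnderFloor` are hypothetical names.

```typescript
// Minimal tool shape; only function.parameters matters for the floor.
interface ToolLike {
  function?: { parameters?: unknown };
}

// floor ≈ 25 + sum(len(JSON.stringify(tool.function.parameters)))
function maxTokensFloor(tools: ToolLike[]): number {
  let floor = 25;
  for (const tool of tools) {
    const params = tool.function?.parameters;
    if (params !== undefined) {
      floor += JSON.stringify(params).length;
    }
  }
  return floor;
}

// Only tool-using assistants are checked; an assistant with no tools never warns.
function isUnderFloor(maxTokens: number, tools: ToolLike[]): boolean {
  return tools.length > 0 && maxTokens < maxTokensFloor(tools);
}
```

The character count of the schema is a rough proxy for the tokens a tool call needs, which is why the rule warns rather than blocks by default.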

- src/validate.ts (NEW): validateResources(loadedResources) returning
  ValidationFinding[] with severity / type / resourceId / rule / message
  / fieldPath. Pure data; safe to test directly.
- src/validate-cmd.ts (NEW): CLI entry. Loads same resource shape as
  push.ts so the lint runs against exactly what would ship. Exit non-zero
  on any error finding.
- src/config.ts: --strict flag.
- src/push.ts: validators run in default-warn mode; --strict aborts.
- package.json: validate script.
- AGENTS.md: document npm run validate and --strict.
- tests/validate.test.ts: per-rule fixtures (golden + bad inputs)
  covering all five checks.
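
For a sense of what one rule and its output might look like, here is a hedged sketch of the per-provider voice check emitting findings with the fields listed for src/validate.ts. The field names come from the list above; everything else is assumed: `validateVoice` is a hypothetical name, the provider strings `"cartesia"` and `"11labs"` and the `"error"` severity are guesses at the real values, and the messages are illustrative.

```typescript
interface ValidationFinding {
  severity: "error" | "warning";
  type: string;
  resourceId: string;
  rule: string;
  message: string;
  fieldPath: string;
}

// Top-level voice fields the PR says Cartesia rejects.
const CARTESIA_REJECTED = ["speed", "stability", "similarityBoost", "enableSsmlParsing"];

function validateVoice(resourceId: string, voice: Record<string, unknown>): ValidationFinding[] {
  const findings: ValidationFinding[] = [];
  const provider = voice["provider"];
  if (provider === "cartesia") {
    for (const field of CARTESIA_REJECTED) {
      if (field in voice) {
        findings.push({
          severity: "error",
          type: "assistant",
          resourceId,
          rule: "voice-schema",
          message: `Cartesia rejects top-level ${field}; move it under voice.generationConfig or drop it`,
          fieldPath: `voice.${field}`,
        });
      }
    }
  }
  if (provider === "11labs" && "generationConfig" in voice) {
    findings.push({
      severity: "error",
      type: "assistant",
      resourceId,
      rule: "voice-schema",
      message: "generationConfig is a Cartesia path; 11labs rejects it",
      fieldPath: "voice.generationConfig",
    });
  }
  return findings;
}
```

Because the function takes plain data and returns plain data, a fixture-based test can call it directly without loading an org or touching the API.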

Closes improvements.md #11, #18, #19. Resolves engine half of #9.
Partial #8, #20 (heuristic only).

🤖 Generated with [Claude Code](https://claude.com/claude-code)
@dhruva-reddy dhruva-reddy force-pushed the dhruva-reddy/refactor/state-schema-content-hashes branch from 10c101c to 1efe5ec Compare May 2, 2026 01:22
@dhruva-reddy dhruva-reddy force-pushed the dhruva-reddy/feat/push-drift-detection branch from 05675a0 to c1d9632 Compare May 2, 2026 01:23
@dhruva-reddy dhruva-reddy force-pushed the dhruva-reddy/refactor/state-schema-content-hashes branch from 1efe5ec to 92c0312 Compare May 2, 2026 01:28
@dhruva-reddy dhruva-reddy force-pushed the dhruva-reddy/feat/push-drift-detection branch from c1d9632 to 09af67e Compare May 2, 2026 01:28
@dhruva-reddy dhruva-reddy force-pushed the dhruva-reddy/refactor/state-schema-content-hashes branch from 92c0312 to d4ac9d6 Compare May 2, 2026 01:32

## Update — 11labs `pronunciationDictionaryLocators` array also covered

`checkPronunciationDictDrop` now detects drops in both pronunciation-
dictionary shapes Vapi exposes:

- **11labs** (the documented shape):
  `voice.pronunciationDictionaryLocators[]` — array of
  `{ pronunciationDictionaryId, versionId }`. We warn on N → M shrinks
  (M < N) including N → 0 and array-going-missing.
- **Cartesia** (passthrough — not in Vapi docs but observed):
  `voice.pronunciationDictId` — single string id. Existing 1 → 0
  detection unchanged.

Reference: https://docs.vapi.ai/assistants/pronunciation-dictionaries

Six new test cases pin the 11labs behavior: array clear (1 → 0), shrink
(2 → 1), array-going-missing entirely, no-op when unchanged, no-op when
locators are added (additive growth shouldn't warn), and the defensive
hybrid case where a payload carries both shapes.
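
The two shapes and the shrink rule described above can be sketched as one comparison between the voice at last pull and the voice currently on the platform. This is a minimal sketch under assumptions: the real `checkPronunciationDictDrop` signature and return type are not shown in this PR, so `pronunciationDictDropWarnings`, `VoiceLike`, and the warning strings are hypothetical.

```typescript
interface VoiceLike {
  // Cartesia passthrough shape: single string id.
  pronunciationDictId?: string;
  // 11labs shape: array of locators.
  pronunciationDictionaryLocators?: {
    pronunciationDictionaryId: string;
    versionId: string;
  }[];
}

function pronunciationDictDropWarnings(atLastPull: VoiceLike, onPlatform: VoiceLike): string[] {
  const warnings: string[] = [];

  // Cartesia: warn on 1 -> 0.
  if (atLastPull.pronunciationDictId && !onPlatform.pronunciationDictId) {
    warnings.push("voice.pronunciationDictId was set at last pull but is missing on the platform");
  }

  // 11labs: warn on any N -> M shrink (M < N); a missing array counts as 0.
  const before = atLastPull.pronunciationDictionaryLocators?.length ?? 0;
  const after = onPlatform.pronunciationDictionaryLocators?.length ?? 0;
  if (after < before) {
    warnings.push(`voice.pronunciationDictionaryLocators shrank from ${before} to ${after}`);
  }

  return warnings;
}
```

Checking both shapes independently is what lets the hybrid case, a payload carrying both fields, produce a warning for each one that dropped.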
@dhruva-reddy dhruva-reddy force-pushed the dhruva-reddy/feat/push-drift-detection branch from 09af67e to 4b8d8b8 Compare May 2, 2026 01:33