43 changes: 43 additions & 0 deletions .github/workflows/cd.yml
@@ -58,6 +58,49 @@ jobs:
        if: always()
        run: docker compose down --volumes --remove-orphans

  golden-oracle:
    name: Golden oracle (master-only)
    runs-on: ubuntu-latest
    # Master-only, slow, pre-deploy behavioral regression gate. NOT part of `verify`
    # and NOT a PR-required check — it needs the docker-compose services and may run
    # for several minutes. It pins the current ingest→stats behavior so a
    # behavior-preserving refactor stays green and any drift turns red.
    if: github.event_name == 'push' && (github.ref == 'refs/heads/master' || github.ref == 'refs/heads/main')
    timeout-minutes: 30
    steps:
      - name: Checkout
        uses: actions/checkout@v6

      - name: Setup pnpm
        uses: pnpm/action-setup@v6
        with:
          run_install: false

      - name: Setup Node.js
        uses: actions/setup-node@v6
        with:
          node-version: 25
          cache: pnpm

      - name: Install dependencies
        run: pnpm install --frozen-lockfile

      - name: Start integration services
        run: |
          set -euo pipefail
          docker compose up -d postgres rabbitmq minio
          timeout 120 bash -c 'until docker compose exec -T postgres pg_isready -U solid -d solid_stats; do sleep 2; done'
          timeout 120 bash -c 'until docker compose exec -T rabbitmq rabbitmq-diagnostics -q ping; do sleep 2; done'
          timeout 120 bash -c 'until curl -fsS http://127.0.0.1:9000/minio/health/live; do sleep 2; done'
          docker compose run --rm minio-create-bucket

      - name: Run golden oracle
        run: pnpm run test:golden

      - name: Stop integration services
        if: always()
        run: docker compose down --volumes --remove-orphans

  contract-diff:
    name: Contract diff
    runs-on: ubuntu-latest
3 changes: 2 additions & 1 deletion .planning/STATE.md
@@ -28,7 +28,7 @@ See: .planning/PROJECT.md (updated 2026-05-31)
Phase: Parity / Phase 1 — Game-Type-Aware Statistics — COMPLETE (5/5 plans + review fixes)
Plan: 01-01..01-05 done; code-review BLOCK→fixed→APPROVE
Status: Phase implemented + reviewed; landing to master. Migration 0008 (game_type dimension + nullable rotation_id + NULLS NOT DISTINCT + is_show) & 0009 (stale NULL-type cleanup); set-based classification; per-type + all-time recalc; per-type legacy-export/parity-sql; audit path made game-type-correct. pnpm verify green, 100% cov, OpenAPI diff empty. Deferred: large-bucket perf pass (review findings 3/4/5 + parity-driver flag).
Last activity: 2026-06-15 - Landed quick tasks 260615-u06 (F9 excludePlayers) and 260615-v6m (F5 orphaned-published reconciler) to master via PRs #22/#23; pnpm verify green, 100% cov
Last activity: 2026-06-17 - Built golden e2e integration oracle (260617-v4e): full ingest→stats chain on real PG/RabbitMQ/S3, characterization snapshots + hand-computed bounty anchors + pinned invariants; master-only `test:golden` outside verify; pnpm verify green, test:golden 26 live tests, 100% cov

## Performance Metrics

@@ -135,6 +135,7 @@ Decisions are logged in PROJECT.md Key Decisions table. Recent decisions affecti
| 260614-r9k | Guard all-time recalc against NULL replay_timestamp (toISOString crash) | 2026-06-14 | b0275e0 | [260614-r9k-recalc-null-timestamp-guard](./quick/260614-r9k-recalc-null-timestamp-guard/) |
| 260615-u06 | F9 — apply the legacy excludePlayers exclusion to the player leaderboard | 2026-06-15 | b80a235 | [260615-u06-f9-excludeplayers-apply-the-legacy-exclu](./quick/260615-u06-f9-excludeplayers-apply-the-legacy-exclu/) |
| 260615-v6m | F5 — reconciler re-queues orphaned `published` parse_jobs (self-healing ingest) | 2026-06-15 | f4e0c1b | [260615-v6m-f5-reconciler-for-orphaned-published-par](./quick/260615-v6m-f5-reconciler-for-orphaned-published-par/) |
| 260617-v4e | Golden e2e integration oracle — pins ingest→stats pipeline behavior (real PG/RabbitMQ/S3) before the Phase 2 refactor; master-only `test:golden`, not in verify | 2026-06-17 | 7a93295 | [260617-v4e-golden-e2e-integration-oracle-for-ingest](./quick/260617-v4e-golden-e2e-integration-oracle-for-ingest/) |

## Deferred Items

@@ -0,0 +1,120 @@
# Quick Task 260617-v4e: Golden e2e integration oracle — Context

**Gathered:** 2026-06-17
**Status:** Ready for planning
**Full rationale:** see `DEEP-BRAINSTORM.md` in this directory (the locked decision pack from a deep
Socratic brainstorm). This CONTEXT is the lean digest — decisions here are LOCKED, do not re-litigate.

<domain>
## Task Boundary

Build golden end-to-end integration test(s) that pin the **current observable behavior** of the
`server-2` ingest→stats pipeline (plus the public read surface) as a **behavioral regression oracle**
BEFORE the upcoming **Phase 2 Track C refactor** (Oxfmt mass-reformat + full Oxlint + `tsc`→tsdown
2-entry + depcruise/knip/lefthook — explicitly behavior-preserving). The oracle catches integration-level
drift that the unit suite (mocked boundaries) and the frozen-contract/oasdiff gate (API shape only) miss.

Convention-bound test work — author it THROUGH `solidstats-server-ts-tests` (+ shared testing standards),
citing the rules relied on. Do NOT hand-roll.
</domain>

<decisions>
## Implementation Decisions (LOCKED)

### Scope — full chain + read API
One golden test drives the real production path per real artifact:
`IntervalTask.runOnce()` → `IngestPromotionService.promotePending()` (`src/modules/ingest/service.ts`)
→ durable `parse_jobs` row + RabbitMQ publish (`src/modules/ingest/publisher.ts`)
→ real broker delivery → `ParseCompletedMessage` consumer (`src/infra/queue/rabbitmq.ts`)
→ real S3 artifact load (`artifactLoader.loadParserArtifact({bucket,key})`, `src/modules/ingest/runtime.ts:94`)
→ `recordParserCompleted()` (`src/modules/ingest/repository/repository.ts:525`)
→ `ParserResultRecalculationService.recalculateParserResult()` (`src/modules/statistics/service/recalculation.ts`)
→ assert via `GET /stats/*` (`src/modules/public-stats/...`).

### Realism — real PG + real RabbitMQ + real S3 (no mocked boundary)
Mirror the existing harness: docker-compose services on fixed localhost ports (PG `15432`, Rabbit `5673`,
S3 `9000`, env-overridable), real schema via `runMigrations()`, `truncate … cascade` isolation,
**unique S3 keys + ephemeral queue per run**. A mock at a contract boundary hides the exact failures the
oracle exists to catch (brief anti-pattern #1). Drive promotion via `IntervalTask.runOnce()` (no real
timer — principle 9); await parse-completed via a **bounded DB-state poll** (the consumer exposes no
completion Promise; the test may run long).
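
A bounded DB-state poll of the kind described above could be sketched roughly as follows. `pollUntil`, its option names, and the injected `sleep` are hypothetical and are not taken from the existing harness; the `probe` callback would wrap whatever query reads the terminal `parse_jobs` state.

```typescript
// Hypothetical helper: names, options, and defaults are illustrative assumptions.
type PollOptions = {
  maxAttempts: number; // hard bound so a stuck consumer fails the test instead of hanging
  intervalMs: number; // delay between probes
  sleep: (ms: number) => Promise<void>; // injected so callers control the timer
};

async function pollUntil<T>(
  probe: () => Promise<T | undefined>,
  opts: PollOptions,
): Promise<T> {
  for (let attempt = 1; attempt <= opts.maxAttempts; attempt++) {
    // The probe returns undefined while the awaited state has not been reached.
    const value = await probe();
    if (value !== undefined) return value;
    // Do not sleep after the final attempt; fail immediately instead.
    if (attempt < opts.maxAttempts) await opts.sleep(opts.intervalMs);
  }
  throw new Error(`condition not met after ${opts.maxAttempts} attempts`);
}
```

Bounding by attempt count rather than by wall-clock time keeps the helper deterministic when the sleep function is replaced in a test; a real run would pass a timer-backed `sleep` and a generous `maxAttempts`.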

### Fixtures — hundreds of REAL artifacts, committed as a gzip archive, unpacked at test start
Real `ParserArtifact` JSONs (the shape `server-2` ingests — `src/modules/statistics/parser-artifact.ts`,
matches parser-2 `parse-artifact-v3.schema.json`). Stored as ONE committed gzip archive in-tree, unpacked
at test start, iterated with `test.each`. **Capture is gated** (agent lacks VPS access): a deterministic
**capture script pulls the real production artifacts from the VPS over SSH** (the actual objects prod
ingested) and packs the archive — human runs it once under `!`. Note: Happ VPN is always-on; SSH to the
project's own VPS needs the `ip rule` bypass or it hangs (global memory `happ-vpn-bypass-for-servers`). Local fallback
floor = the ~10–13 `replay-parser-2` golden inputs parsed via its CLI, committed so the oracle is never
empty. The test **guards on archive presence and skips cleanly** when absent (principle 8).

### Assertions — characterization snapshots + bounty anchor
Golden snapshots of the FULL observable surface (`parser_results` + all evidence fields, `parser_events`,
`player_stats`, `squad_stats`, `commander_side_stats`, `bounty_points`, terminal `parse_jobs`,
`ingest_staging_records` status/evidence, and `GET /stats/*` responses) with **deterministic
normalization** (UUID→stable natural key by checksum/nickname/replay, timestamps redacted, rows sorted),
PLUS hand-computed bounty assertions on 2–3 anchor cases (bounty values are business-critical — check
semantics, not only snapshot equality). Pin CURRENT behavior as-is; if a pinned behavior is known
tech-debt, comment it + point to backlog — do NOT "fix" inside the oracle (principle 7).
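
The deterministic normalization above (UUIDs mapped to natural keys, timestamps redacted, rows sorted) might look something like this minimal sketch. The `Row` shape, the `normalizeRows` name, and the placeholder strings are assumptions for illustration; the real normalizer would work against the production schema types.

```typescript
// Hypothetical row shape and helper name; the real tables carry many more columns.
type Row = Record<string, unknown>;

const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;
const ISO_TS_RE = /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?Z$/;

function normalizeRows(rows: Row[], naturalKeys: Map<string, string>): Row[] {
  const normalized = rows.map((row) => {
    const out: Row = {};
    // Emit columns in sorted order so the serialized form is stable.
    for (const key of Object.keys(row).sort()) {
      const value = row[key];
      if (typeof value === "string" && UUID_RE.test(value)) {
        // Replace run-specific UUIDs with a stable natural key supplied by the caller.
        out[key] = naturalKeys.get(value) ?? "<unmapped-uuid>";
      } else if (value instanceof Date || (typeof value === "string" && ISO_TS_RE.test(value))) {
        // Redact wall-clock timestamps, which differ on every run.
        out[key] = "<timestamp>";
      } else {
        out[key] = value;
      }
    }
    return out;
  });
  // Sort rows by their serialized form so database return order does not matter.
  return normalized.sort((a, b) => {
    const left = JSON.stringify(a);
    const right = JSON.stringify(b);
    return left < right ? -1 : left > right ? 1 : 0;
  });
}
```

Surfacing unmapped UUIDs as a visible placeholder, rather than dropping them, makes a missing natural-key mapping show up in the snapshot diff.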

### Invariants / idempotency to pin (from current code, as-is)
- Durable `parse_jobs` row exists **before** the RabbitMQ publish (never fire-and-forget).
- Re-promote same staging row → dedup/no-op: `status='promoted'` + `promotion_evidence.duplicate_replay_id`.
- Same `source_system`+`source_replay_id`, different bytes/checksum → `status='conflicted'` +
`conflict_details.reason='source_identity_changed_bytes'` (`service.ts:147`).
- Checksum-duplicate (no source match) → `status='promoted'` + duplicate evidence appended (`service.ts:166`).
- Re-deliver same `parse.completed` → terminal state recorded once.
- Auth/role gate (flow 4): a protected route rejects without role / accepts with role via the shared
`requireRole`/`requireAnyRole` pre-handlers (`src/modules/auth/routes/authorization.ts`).
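
The promotion invariants in the list above amount to a small decision table, which a test could encode as an expectation helper along these lines. This is a sketch of the expected outcomes only, not the logic in `service.ts`; the type names, field names, and `expectedOutcome` are hypothetical.

```typescript
// Hypothetical types: field names are illustrative, not the production schema.
type KnownReplay = {
  replayId: string;
  sourceSystem: string;
  sourceReplayId: string;
  checksum: string;
};
type IncomingReplay = Omit<KnownReplay, "replayId">;

type Outcome =
  | { status: "promoted"; duplicateReplayId?: string }
  | { status: "conflicted"; reason: "source_identity_changed_bytes" };

function expectedOutcome(known: KnownReplay[], incoming: IncomingReplay): Outcome {
  const sameSource = known.find(
    (r) => r.sourceSystem === incoming.sourceSystem && r.sourceReplayId === incoming.sourceReplayId,
  );
  if (sameSource) {
    // Same source identity: identical bytes are a dedup no-op, different bytes are a conflict.
    return sameSource.checksum === incoming.checksum
      ? { status: "promoted", duplicateReplayId: sameSource.replayId }
      : { status: "conflicted", reason: "source_identity_changed_bytes" };
  }
  // No source match: a checksum match alone still counts as a duplicate.
  const sameBytes = known.find((r) => r.checksum === incoming.checksum);
  return sameBytes
    ? { status: "promoted", duplicateReplayId: sameBytes.replayId }
    : { status: "promoted" };
}
```

Keeping the expectation as a pure function lets the oracle compare it against the `status` and evidence columns actually written to `ingest_staging_records`.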

### Gate placement — master-only, slow, separate from `verify`
Dedicated script (e.g. `test:golden`) + a **master-only pre-deploy CI job**. NOT in `verify` and NOT in
`test:coverage` → zero coverage obligation (principle 10); `verify` stays green at 100% without the
archive (principle 8). The test MAY run long — that is accepted and intended.

### Cross-app boundary (from replay-parser-2 decision pack — respect it)
The parser does NOT calculate bounty. The parser emits compact kill/stat facts; **`server-2` computes
final bounty from previous-rotation effectiveness + cross-replay state**. Consequence for fixtures: a
single-artifact run yields meaningful bounty ONLY if a **previous rotation with known effectiveness is
seeded**. The bounty anchor cases MUST set up the previous-rotation state. CORRECTION (RESEARCH §1):
server-2 does NOT verify artifact bytes on ingest — `loadParserArtifact` is plain `JSON.parse`, no schema
or checksum gate; `artifact_checksum`/`source_checksum` are stored as metadata only and need not match the
bytes (byte-verification is parser-2's job). A fixture needs only a well-formed `^[0-9a-f]{64}$` checksum.
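
Because only the checksum's shape matters for a fixture, a seed helper can generate placeholder values instead of hashing the artifact bytes. The sketch below assumes hypothetical helper names (`isWellFormedChecksum`, `fixtureChecksum`); the generated value is deliberately not a real digest.

```typescript
// Matches the well-formed checksum pattern quoted above: 64 lowercase hex characters.
const CHECKSUM_RE = /^[0-9a-f]{64}$/;

function isWellFormedChecksum(value: string): boolean {
  return CHECKSUM_RE.test(value);
}

// Hypothetical fixture helper: produces a deterministic, well-formed placeholder.
// It is NOT a hash of any bytes; it only satisfies the format check.
function fixtureChecksum(seed: number): string {
  if (!Number.isSafeInteger(seed) || seed < 0) {
    throw new Error("seed must be a non-negative safe integer");
  }
  return seed.toString(16).padStart(64, "0");
}
```

Distinct seeds give distinct checksums, which is enough to set up the duplicate and conflict cases without touching the artifact contents.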

### Out of scope (non-goals)
- request/moderation **business-logic workflow** (Phase 2 rewrites it → pinning = false reds). Only the
role-gate mechanism is in scope.
- NOT wired into fast `verify`/`test:coverage`; no coverage obligation.
- NO fresh-schema/bucket/db per test — repo convention is `truncate … cascade` (Step 0: repo overrides
the generic brief).
- NOT a parity/value-vs-legacy comparison (that is the cutover diff harness). Pins `server-2`'s OWN
current behavior.
</decisions>

<specifics>
## Specific Ideas

- Harness divergence already documented in `.planning/codebase/TESTING.md`: integration suite connects to
**docker-compose** services, NOT programmatic testcontainers. Follow that, not the brief's "testcontainers".
- Existing references to mirror for wiring: `src/test/integration/adapters.test.ts` (real PG+Rabbit+S3
health), `src/modules/ingest/repository/tests/postgres.test.ts` (real `IngestPromotionService` +
`PgIngestRepository` + Postgres, reuses seed helpers).
- Extract ONE shared fixture-loader/unpacker and ONE snapshot-normalizer; reuse the production schema/types
(never a hand-mirrored copy) — principle 9.
- `verify` for the plan's tasks must rely on typecheck/lint + unit + the golden test **skipping cleanly**
when Docker/the archive are absent (live run is CI/master-only). Docker is frequently unavailable in the
local dev env — the golden test and its `verify` step must tolerate that.
</specifics>

<canonical_refs>
## Canonical References

- `DEEP-BRAINSTORM.md` (this directory) — full decision pack, question ledger, risks, acceptance criteria.
- `/tmp/golden-integration-test-prompt.md` — the reusable source brief (server-2 section is ground truth;
read its "real call path", "durable-job invariant", "high-value golden flows", anti-patterns).
- `.planning/codebase/TESTING.md` — the repo's actual testing reality (harness, coverage gate).
- Skills: `solidstats-server-ts-tests` (harness, per-layer map, coverage), `solidstats-server-ts-conventions`,
`solidstats-shared-testing-standards`, `solidstats-shared-project-standards`.
- parser-2 `schemas/parse-artifact-v3.schema.json` — the cross-app artifact contract.
</canonical_refs>