feat(otasim): admin RBAC + otasim_ctl admin CLI #30

Merged
secup merged 1 commit into main from feat/otasim-ctl on May 18, 2026
@secup secup commented May 18, 2026

Summary

OTASim previously allowed any authenticated token to call every RPC
including the 6 destructive ones (`SetChannel`, `InjectEffect`,
`CancelEffect`, `CreateSession`, `StartCapture`, `StopCapture`).
Any joined operator could reconfigure the channel mid-QSO. Not safe
for the multi-operator friend-lab deployment.

This adds two-level RBAC and ships an admin CLI for live operations.

Token format (backward compatible — 3-field lines stay valid as operator):
```
alice_tok:ALPHA:Alpha station # implicit operator
bob_tok:BRAVO:Bravo station:operator # explicit operator
admin_tok:ADMIN:operator + admin:admin # admin role
```
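
A minimal sketch of how lines in this format could be parsed, written in Python for brevity rather than in the project's own language. The `Principal` type, the `parse_token_line` name, and the handling of a trailing `# ...` annotation are assumptions for illustration, not the actual parser:

```python
from dataclasses import dataclass

VALID_ROLES = {"operator", "admin"}


@dataclass(frozen=True)
class Principal:
    """Hypothetical parsed form of one allowlist line."""
    token: str
    callsign: str
    description: str
    admin: bool = False


def parse_token_line(line: str) -> Principal:
    """Parse token:CALLSIGN:description[:role] into a Principal."""
    # Assumption: a trailing "# ..." is an annotation, as in the sample above.
    body = line.split("#", 1)[0].strip()
    fields = body.split(":")
    if len(fields) == 3:
        role = "operator"  # 3-field lines stay valid and default to operator
    elif len(fields) == 4:
        role = fields[3].strip()
    else:
        raise ValueError(f"expected 3 or 4 fields, got {len(fields)}")
    if role not in VALID_ROLES:
        # Reject instead of falling back, so a typo cannot silently change a role.
        raise ValueError(f"unknown role: {role!r}")
    token, callsign, description = (f.strip() for f in fields[:3])
    return Principal(token, callsign, description, admin=(role == "admin"))
```

Rejecting unknown roles outright, rather than defaulting them to operator, matches the unknown-role rejection that `test_auth_allowlist` is described as covering.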

The server gates the 6 destructive RPCs behind `requireAdmin(principal)`;
operator tokens get `PERMISSION_DENIED` with an actionable error message.
Read-only and audio-path RPCs (RegisterStation / NegotiateAudio /
JoinSession / GetChannel / Health / etc.) remain open to any
authenticated principal.
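
The gate can be pictured roughly as follows. This is a language-neutral sketch in Python, not the server's code: the `Status` enum, the handler signatures, and the error wording are all hypothetical, and only the `requireAdmin` idea and the `PERMISSION_DENIED` outcome come from the description above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class Status(Enum):
    OK = "OK"
    PERMISSION_DENIED = "PERMISSION_DENIED"


@dataclass(frozen=True)
class AuthPrincipal:
    callsign: str
    admin: bool = False  # defaults to operator-role


def require_admin(principal: AuthPrincipal, rpc_name: str) -> Tuple[Status, str]:
    """Return OK for admins, PERMISSION_DENIED with a hint otherwise."""
    if principal.admin:
        return Status.OK, ""
    return (
        Status.PERMISSION_DENIED,
        f"{rpc_name} requires an admin-role token; "
        f"{principal.callsign} holds an operator-role token",
    )


def set_channel(principal: AuthPrincipal, model: str, snr_db: float) -> Tuple[Status, str]:
    """Destructive RPC: checked right after authentication."""
    status, message = require_admin(principal, "SetChannel")
    if status is not Status.OK:
        return status, message
    return Status.OK, f"channel set to {model} at {snr_db} dB"


def get_channel(principal: AuthPrincipal) -> Tuple[Status, str]:
    """Read-only RPC: open to any authenticated principal."""
    return Status.OK, "current channel config"
```

Keeping the check in one helper means each destructive handler needs only a single early return, and the denial message stays consistent across all six RPCs.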

New CLI `tools/otasim_ctl` (~270 LOC):

  • `health` — server health + message
  • `list-sessions` — active sessions
  • `get-channel` — current channel for --session
  • `set-channel --model M --snr DB [--seed N]` — change channel live

Token via `--token` or `OTASIM_TOKEN` env var. Replaces the previous
"restart daemon to change SNR" workflow.

Test plan

  • `cmake --build build -j4` clean
  • `ctest --test-dir build -R "AuthAllowlist|OtasimServe|UltraGuiOta|UltraTncSimAudio|SessionContext"` → 9/9 pass
  • `test_auth_allowlist` extended for role parsing + unknown-role rejection
  • `test_otasim_serve_smoke` extended: operator token denied on
    StartCapture (PERMISSION_DENIED), then admin token succeeds —
    proves the boundary works
  • Manual smoke: operator denied on set-channel, admin succeeds,
    live SNR change reflected in subsequent get-channel
  • CI: Linux / macOS / Windows full matrix

Followups

  • README + OPERATOR notes for admin token format and `otasim_ctl` usage
  • Owner-of-session model (session creator gets admin for their session
    only) — better multi-tenant story than flat operator/admin. Not in
    this round; the flat split is enough for a 4-5 friend lab.

🤖 Generated with Claude Code

OTASim previously had a flat auth model: any token in the allowlist
could call every RPC including the 6 destructive ones (SetChannel,
InjectEffect, CancelEffect, CreateSession, StartCapture, StopCapture).
That meant any joined operator could reconfigure the channel mid-QSO,
kill another operator's effect, or start/stop a recording without
permission. Not safe for the multi-operator friend-lab deployment the
design log scoped (~4-5 operators sharing one server).

This change adds a two-level RBAC:

1. Token-file format gains an optional 4th field for role:
     alice_tok:ALPHA:Alpha station                 # implicit operator
     bob_tok:BRAVO:Bravo station:operator          # explicit operator
     admin_tok:ADMIN:operator + admin:admin        # admin role
   Existing 3-field lines stay valid (operator-role by default), so
   prior token files keep working.

2. `AuthPrincipal` gains `bool admin`. Defaults false.

3. `OtaSimulatorService::requireAdmin(principal)` helper returns
   `PERMISSION_DENIED` with an actionable message when an operator-
   role token attempts an admin-only RPC. Wired into each of the 6
   destructive handlers immediately after `authenticate()`.

4. New `tools/otasim_ctl` admin CLI (~270 LOC). Subcommands:
     - health             server health + message
     - list-sessions      active sessions + station counts + channel
     - get-channel        current channel config for --session
     - set-channel        change model + snr (+ optional seed)
   Token via --token or OTASIM_TOKEN env var. Server defaults
   127.0.0.1:50051, session defaults "lobby". Replaces the previous
   workflow of restarting the daemon to change channel settings.
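
For illustration, the command-line surface described in item 4 could be modeled like this. It is a Python `argparse` sketch, not the real tool: the `--server` flag name, the placement of global options before the subcommand, and the precedence of `--token` over `OTASIM_TOKEN` are assumptions; the subcommands, `--session`, `--model`, `--snr`, `--seed`, and the defaults come from the text above.

```python
import argparse
from typing import Mapping


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="otasim_ctl")
    # Assumption: the server address is passed as --server host:port.
    parser.add_argument("--server", default="127.0.0.1:50051")
    parser.add_argument("--session", default="lobby")
    parser.add_argument("--token", default=None)

    sub = parser.add_subparsers(dest="command", required=True)
    sub.add_parser("health")
    sub.add_parser("list-sessions")
    sub.add_parser("get-channel")

    set_channel = sub.add_parser("set-channel")
    set_channel.add_argument("--model", required=True)
    set_channel.add_argument("--snr", type=float, required=True)
    set_channel.add_argument("--seed", type=int, default=None)
    return parser


def resolve_token(args: argparse.Namespace, env: Mapping[str, str]) -> str:
    """Pick the token from --token first, then OTASIM_TOKEN (assumed order)."""
    token = args.token or env.get("OTASIM_TOKEN")
    if not token:
        raise SystemExit("no token: pass --token or set OTASIM_TOKEN")
    return token
```

In this sketch, `build_parser().parse_args(["--token", "admin_tok", "set-channel", "--model", "awgn", "--snr", "12"])` yields a namespace with `command == "set-channel"` and the default session `"lobby"`; the model string `"awgn"` is a placeholder, since the accepted model names are not listed here.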

Tests:
- test_auth_allowlist: role parsing (implicit + explicit + admin),
  unknown-role rejection, so a typo can't silently grant admin.
- test_otasim_serve_smoke: asserts an operator token gets
  PERMISSION_DENIED on StartCapture, then uses the admin token
  to actually start/stop the capture. Demonstrates the boundary.

Verified end-to-end on localhost: operator token denied on
set-channel with the expected error, admin token succeeds, server
reflects the new channel config in subsequent get-channel calls.

Future enhancement (not in this round): owner-of-session — session
creator becomes admin for their own private session only. Better
multi-tenant story but more state. For a 4-5 friend lab, the flat
operator/admin split is enough.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@secup secup merged commit 1af83ea into main May 18, 2026
0 of 5 checks passed
@secup secup deleted the feat/otasim-ctl branch May 18, 2026 11:52
secup added a commit that referenced this pull request May 18, 2026
PR #30 added an admin-role gate on CreateSession + SetChannel (and the
other 4 destructive RPCs). The two existing OTASim integration tests
were updated for the new gate (test_otasim_serve_smoke uses an admin
token for capture RPCs, test_auth_allowlist exercises the role
parser) but test_grpc_service_smoke was missed — it calls both
CreateSession and SetChannel with the previously-flat alice_token,
which is now operator-role by default and gets PERMISSION_DENIED.

CI on post-merge main caught it (Linux + macOS + Coverage + Sanitizer
all failed on this one test).

Fix: alice_token gets the explicit :admin role so the smoke flow can
exercise the admin RPCs. bob_token stays operator-only to keep
mirroring a normal joined station.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
secup added a commit that referenced this pull request May 18, 2026
Two-GUI OTASim test at AWGN SNR=20 dB negotiated D8PSK R2/3
(correctly — the rate selector's clean-fading threshold of >=18 dB was
genuinely crossed by the honest idle-noise SNR estimator) and then
decode failed catastrophically: all 8 codewords FAIL with
|llr|_avg ~ 2.7, BRAVO retransmits, ARQ stalls, QSO dies.

Codex's controlled offline sweep (ofdm_snr_probe + decode_bench,
both extended to take --mod and --cw-count) isolated the failure
to the streaming + connected path, not the D8PSK demap/LDPC:

  | SNR (dB) | direct probe | connected pre-fix | connected post-fix |
  |----------|--------------|-------------------|--------------------|
  |    5     | 3/8 (fail)   |        0/4        |        0/4         |
  |    8     | 8/8 (pass)   |        1/4        |        1/4         |
  |   10     | 8/8 (pass)   |        0/4        |        0/4         |
  |   12     | 8/8 (pass)   |        0/4        |        4/4         |
  |   14+    | 8/8 (pass)   |       0-3/4       |        4/4         |

So D8PSK R2/3 PHY closes at AWGN SNR~8 dB (Shannon-limit territory),
but the connected streaming path was broken at every SNR. Root cause
was the multi-candidate light-sync recovery in streaming_ofdm_decode
at line ~1028: DQPSK-tuned retry window (+/-8 samples, partial-CW
acceptance) doesn't handle D8PSK's tighter timing tolerance and
admits low-confidence false locks as success.
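
As a rough sanity check on the "Shannon-limit territory" remark, the AWGN capacity bound can be evaluated for this mode. The sketch assumes 3 bits per D8PSK symbol, a rate-2/3 code (so 2 information bits per symbol), SNR measured per complex symbol, and no pilot or cyclic-prefix overhead; none of these conventions is confirmed by the commit message.

```python
import math


def shannon_min_snr_db(info_bits_per_symbol: float) -> float:
    """Minimum SNR (dB) at which AWGN capacity log2(1 + SNR) reaches the target."""
    return 10.0 * math.log10(2.0 ** info_bits_per_symbol - 1.0)


BITS_PER_D8PSK_SYMBOL = 3   # 8 phase states
CODE_RATE = 2.0 / 3.0       # R2/3

info_bits = BITS_PER_D8PSK_SYMBOL * CODE_RATE   # 2 information bits per symbol
limit_db = shannon_min_snr_db(info_bits)        # 10*log10(3), about 4.77 dB
gap_db = 8.0 - limit_db                         # about 3.2 dB above the bound

print(f"bound {limit_db:.2f} dB, gap at 8 dB: {gap_db:.2f} dB")
```

Under those assumptions the direct probe closing at about 8 dB sits roughly 3 dB above the unconstrained-input bound.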

This change (Codex round 1):

- D8PSK-only: widen retry window to {-32, -24, -16, -8, +8, +16,
  +24, +32}, prefer earlier candidates first (late light-sync
  locks show up as positive LTS phase slope).
- D8PSK-only: require full fixed-frame decode to accept a retry
  (partial CW success no longer counts), preventing false-positive
  recoveries.
- D8PSK-only: trigger recovery on partial-fixed-frame failures
  (>=2 codewords attempted, partial CW success), not just zero-CW.
- Boundary safety: skip negative deltas that would underflow the
  ring buffer at the start of a stream.
- Non-D8PSK behavior preserved verbatim (+/- 8 deltas, partial
  acceptance, same gating).
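
The policy in the list above can be summarized in a short sketch. This is Python pseudocode for the decision logic only, not the code in streaming_ofdm_decode: the function names, the string modulation tags, the order of the non-D8PSK deltas, and the reading of "underflow" as a candidate start before the oldest buffered sample are assumptions.

```python
from typing import List

D8PSK_DELTAS = (-32, -24, -16, -8, 8, 16, 24, 32)  # earlier candidates first
DEFAULT_DELTAS = (-8, 8)                           # order assumed


def retry_deltas(modulation: str, sync_pos: int) -> List[int]:
    """Candidate sample offsets to retry around the light-sync position."""
    deltas = D8PSK_DELTAS if modulation == "D8PSK" else DEFAULT_DELTAS
    # Boundary safety: drop offsets that would land before the start of the stream.
    return [d for d in deltas if sync_pos + d >= 0]


def should_trigger_recovery(modulation: str, cw_attempted: int, cw_ok: int) -> bool:
    """Decide whether a failed fixed-frame decode should start the retry loop."""
    if cw_ok == 0:
        return True  # zero-CW failure triggers recovery for every modulation
    if modulation == "D8PSK":
        # D8PSK only: a partially decoded frame also counts as a failure.
        return cw_attempted >= 2 and cw_ok < cw_attempted
    return False


def accept_retry(modulation: str, cw_total: int, cw_ok: int) -> bool:
    """Decide whether a retry candidate counts as a successful recovery."""
    if modulation == "D8PSK":
        return cw_ok == cw_total  # full fixed-frame decode required
    return cw_ok > 0              # partial acceptance, as before (assumed meaning)
```

For example, `retry_deltas("D8PSK", 10)` keeps only `[-8, 8, 16, 24, 32]`, because the three widest negative offsets would start before sample 0.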

Also in this change:

- tools/ofdm_snr_probe.cpp + tools/decode_bench.cpp: --mod and
  --cw-count flags so the controlled sweep is reproducible.
- tools/cli_simulator.cpp: spawned OTASim's tokens now carry the
  admin role. cli_simulator calls SetChannel to configure the
  spawned daemon's channel; PR #30's admin gate denied that with
  the previously-operator-only tokens, breaking CLISyntheticNotch.
  Test harness fully owns its sandbox; production servers should
  not hand out admin tokens this freely.

Test gate (user's unrestricted Mac):
  cmake --build build -j4
  ctest --test-dir build --output-on-failure -j4
  -> 83/83 PASS (after cli_simulator token fix; D8PSK fix doesn't
     regress any existing test on its own).

3-perspective check:
- PHY: D8PSK demapper + LDPC unchanged; only the front-end
  timing-recovery policy was tightened, because D8PSK's denser
  phase constellation tolerates less residual timing error.
- DSP: change is gated on (modulation == D8PSK), so DQPSK
  timing recovery is unchanged. Boundary check on negative
  deltas avoids ring-buffer underflow.
- Operator: live OTASim two-GUI handshake at SNR>=12 dB now
  completes via D8PSK R2/3 instead of timing out in ARQ.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>