Trusted-peer, membership-gated, crash-safe, write-authorized KB sync + attention bus (ADR-017/020/022/023/024) #69
Draft
cuttlefisch wants to merge 193 commits into
The 0.13.11 and 0.13.12 version bumps updated Cargo.toml but not the workspace member versions in Cargo.lock (earlier bumps had explicit 'sync Cargo.lock' chores; these two missed it). A plain cargo build regenerates these, dirtying the tree — sync them once so both dev machines start from a clean working tree. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add shared/mcp/src/keystore.rs: a permission-guarded trusted_keys file (default $XDG_DATA_HOME/mae/collab/trusted_keys, 0600) holding symmetric PSKs out of config.toml. Format: '[name] <secret>' per line, # comments. Both editor and daemon read it via mae-mcp so path + format live in one place. Extend PskAuth to be multi-key on the server side: it can trust a SET of named keys (a keystore) and select the one a client advertises via a new optional key_id in the auth hello. Backward compatible — unnamed clients use the server's default (first) key; serde ignores the absent/extra field so old and new peers interoperate. Proof verification now uses constant-time Mac::verify_slice instead of string compare. Foundation only; daemon + editor wiring follow. mae-mcp: 100 tests pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
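A minimal sketch of the keystore line format and key_id selection described above, not the actual keystore.rs. It assumes the `[name]` in '[name] <secret>' means an optional name separated from the secret by whitespace; `KeyEntry`, `parse_trusted_keys` and `select_key` are hypothetical names, and the permission (0600) check and constant-time proof verification are left out.

```rust
/// One parsed keystore entry. `name` is None for an unnamed key.
#[derive(Debug, PartialEq)]
pub struct KeyEntry {
    pub name: Option<String>,
    pub secret: String,
}

/// Parse keystore text: one key per line, lines starting with `#` are
/// comments, blank lines are skipped.
/// ASSUMPTION: a line is either `<secret>` or `<name> <secret>`.
pub fn parse_trusted_keys(text: &str) -> Vec<KeyEntry> {
    text.lines()
        .map(str::trim)
        .filter(|line| !line.is_empty() && !line.starts_with('#'))
        .filter_map(|line| {
            let mut parts = line.split_whitespace();
            let first = parts.next()?;
            Some(match parts.next() {
                // Two tokens: the first is the name, the second the secret.
                Some(secret) => KeyEntry {
                    name: Some(first.to_string()),
                    secret: secret.to_string(),
                },
                // One token: an unnamed key.
                None => KeyEntry {
                    name: None,
                    secret: first.to_string(),
                },
            })
        })
        .collect()
}

/// Pick the key a client advertised via `key_id`; a client that sends no
/// key_id gets the server's default (first) key.
pub fn select_key<'a>(keys: &'a [KeyEntry], key_id: Option<&str>) -> Option<&'a KeyEntry> {
    match key_id {
        Some(id) => keys.iter().find(|k| k.name.as_deref() == Some(id)),
        None => keys.first(),
    }
}
```

An unknown key_id yields `None` here, which a caller would treat as an auth failure; that choice is also an assumption of the sketch.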
Daemon: in psk mode, build the trusted set from the keystore (every entry is a peer credential) plus legacy psk/psk_command (one unnamed key), and construct a single shared multi-key PskAuth. Add 'mae-daemon keygen [name]' (random 0600 key, printed for copying to peers) and 'mae-daemon keys' (names + fingerprints, never secrets). check-config/doctor now report the keystore path + key count and warn on loose perms. Editor: resolve the client credential via resolve_client_credential() — precedence psk_command > psk > keystore primary key — and advertise the key's name as the wire key_id so the daemon selects it. Pure resolver is unit-tested; the keystore lookup no longer makes the empty-psk test flaky. Verified end-to-end: a client with only a keystore key connects to a psk daemon — 'PSK auth succeeded key=client-cli'. Closes the gap where a PSK could only come from config.toml (which is being retired). mae-mcp 100, collab_bridge 84, daemon config tests pass; clippy clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
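The client-side precedence (psk_command > psk > keystore primary key) can be sketched as a pure function. This is an illustration under assumptions, not the real resolve_client_credential(): the name `resolve_credential_sketch`, the `ClientCredential` struct, the parameter shapes, and treating an empty value as absent are all hypothetical.

```rust
/// Resolved client credential. `key_id` is the name advertised on the wire;
/// None for legacy unnamed keys (psk / psk_command).
#[derive(Debug, PartialEq)]
pub struct ClientCredential {
    pub secret: String,
    pub key_id: Option<String>,
}

/// Trim a candidate secret and treat an empty result as "not configured".
/// ASSUMPTION: empty values fall through to the next source.
fn non_empty(s: &str) -> Option<String> {
    let t = s.trim();
    if t.is_empty() {
        None
    } else {
        Some(t.to_string())
    }
}

/// Precedence: psk_command output, then psk, then the keystore primary key.
/// `keystore_primary` is a (name, secret) pair.
pub fn resolve_credential_sketch(
    psk_command_output: Option<&str>,
    psk: Option<&str>,
    keystore_primary: Option<(&str, &str)>,
) -> Option<ClientCredential> {
    if let Some(secret) = psk_command_output.and_then(non_empty) {
        return Some(ClientCredential { secret, key_id: None });
    }
    if let Some(secret) = psk.and_then(non_empty) {
        return Some(ClientCredential { secret, key_id: None });
    }
    // Only the keystore key carries a name, so only it advertises a key_id.
    keystore_primary.and_then(|(name, secret)| {
        non_empty(secret).map(|secret| ClientCredential {
            secret,
            key_id: Some(name.to_string()),
        })
    })
}
```

Keeping the resolver free of file and process I/O is what makes it unit-testable; the caller would run the command and read the keystore before calling it.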
Design for an asymmetric collab auth mode ('key') alongside none/psk:
Ed25519 keypairs, known_hosts (client pins daemon) + authorized_keys
(daemon trusts clients), mutual signed-challenge handshake, client TOFU
policy (prompt/accept-new/strict), daemon pending-approval + admin CLI
(identity/authorized/pending/authorize/revoke). Enables trust-on-first-use
and per-peer revocation without shared-secret rotation. Symmetric keystore
(this branch) remains as 'psk' mode.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ADR-017 phase 1) shared/mcp/src/identity.rs: Ed25519 Identity (load_or_generate, 0600 private key), PublicKey (base64 wire form, SSH-style 'mae-ed25519 <b64> <label>' lines, SHA256: fingerprints), KnownHosts (client pins daemon keys), AuthorizedKeys (daemon trusts client keys, add/authorize/revoke), and a HostKeyVerifier abstraction with a known_hosts-backed FileHostKeyVerifier implementing the accept-new / strict / prompt TOFU policies (pins on first use, aborts on a changed host key). shared/mcp/src/auth.rs: KeyAuth AuthProvider — a mutual signed-challenge handshake binding both pubkeys + nonces into a domain-separated transcript. Server verifies the client signature and checks authorized_keys; client verifies the server signature and applies the host-key policy before proving its own key. Adds ed25519-dalek + base64 deps. Crypto core only; daemon + editor wiring + TOFU UI follow. mae-mcp: 113 tests pass (13 new for identity/keyauth), clippy clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
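The three TOFU policies reduce to a small decision over the pinned fingerprint. The sketch below is an in-memory stand-in under assumptions, not FileHostKeyVerifier: `KnownHosts` here is a plain map, and `Policy`, `Verdict` and `verify` are hypothetical names. File persistence and signature checks are omitted.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Policy {
    Prompt,
    AcceptNew,
    Strict,
}

#[derive(Debug, PartialEq)]
pub enum Verdict {
    Trusted,
    NeedsPrompt,
    Rejected,
}

/// In-memory stand-in for a known_hosts file: host -> pinned fingerprint.
#[derive(Default)]
pub struct KnownHosts {
    pins: HashMap<String, String>,
}

impl KnownHosts {
    pub fn pin(&mut self, host: &str, fingerprint: &str) {
        self.pins.insert(host.to_string(), fingerprint.to_string());
    }

    /// A matching pin is trusted and a changed pin is rejected under every
    /// policy. The policy only decides what happens to an unknown host.
    pub fn verify(&mut self, host: &str, fingerprint: &str, policy: Policy) -> Verdict {
        match self.pins.get(host) {
            Some(pinned) if pinned.as_str() == fingerprint => return Verdict::Trusted,
            Some(_) => return Verdict::Rejected, // changed host key: abort
            None => {}
        }
        match policy {
            Policy::AcceptNew => {
                self.pin(host, fingerprint); // pin on first use
                Verdict::Trusted
            }
            Policy::Strict => Verdict::Rejected,
            Policy::Prompt => Verdict::NeedsPrompt,
        }
    }
}
```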
Add CollabAuth enum (None/Psk/Key); in 'key' mode the daemon loads/generates its Ed25519 identity and an authorized_keys trust store, and runs KeyAuth::server per connection. check_collab accepts 'key' and flags an empty authorized_keys.
New admin CLI:
  mae-daemon identity            show the daemon pubkey line + fingerprint
  mae-daemon authorized          list trusted client keys (label + fingerprint)
  mae-daemon authorize <pubkey>  add a client pubkey line to authorized_keys
  mae-daemon revoke <label>      remove client key(s) by label
check-config/doctor report the identity fingerprint + authorized key count. Verified: identity → authorize → check-config OK. clippy -D warnings clean.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…phase 0)
Add shared/mcp/src/tls.rs: mutual TLS where each peer presents a self-signed
X.509 cert whose SPKI is its existing Ed25519 Identity key. TLS 1.3 gives
confidentiality + proof-of-possession; peer trust moves into custom verifiers:
the daemon checks the client cert's pubkey against AuthorizedKeys, the editor
TOFU-pins the daemon cert's pubkey via HostKeyVerifier. This unifies encryption
+ mutual auth + pinning on the identities we already manage, superseding the
JSON KeyAuth handshake on the TLS path.
- ring crypto backend with an explicit CryptoProvider (avoids clashing with the
editor's reqwest aws-lc-rs default). Daemon gains rustls for the first time and
builds cleanly (ring only, no aws-lc-rs/cmake conflict with cozo).
- ed25519_pubkey_from_cert (x509-parser, OID 1.3.101.112) is the trust-critical
extraction — round-trip tested against our own cert.
- PeerIdentity {label,fingerprint,pubkey} added to identity.rs (authoritative
identity for strict binding); Identity::pkcs8_der; Debug on HostKeyVerifier.
mae-mcp: 119 tests (6 new incl. full in-process mTLS handshake: authorized
client succeeds + identity recovered, unauthorized rejected, untrusted host
rejected). clippy clean; both workspaces build.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…phase 1)
Daemon: add CollabAuth::KeyTls (built for mode=key + tls=true, default).
The accept loop wraps the whole TcpStream with the rustls TlsAcceptor (not
pre-split), recovers the verified PeerIdentity via peer_identity_from_tls, then
splits the TlsStream and runs the session. Plaintext psk/legacy-key/none paths
unchanged. AuthConfig gains tls: bool (default true); check-config shows it.
Session plumbing: ClientSession gains peer_identity + with_identity() +
authenticated_label(); collab_handler refactored so handle_client (anon) and
the new handle_client_authenticated(peer) share run_session(). handle_client_with_auth
(psk/legacy-key) now synthesizes a PeerIdentity from the auth label and routes
through it — the authenticated label finally reaches the session instead of being
dropped. mae-mcp re-exports tokio_rustls::{TlsAcceptor,TlsConnector}.
Regression-safe: 36 daemon collab_e2e tests pass; clippy -D warnings clean.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ase 2a)
Add three Scheme-configurable options (OptionRegistry + get/set + validation):
- collab_auth_mode (none|psk|key): selects the handshake; key = Ed25519 trusted-peer identity over mTLS.
- collab_host_key_policy (prompt|accept-new|strict): TOFU policy for an unknown daemon identity.
- collab_tls (default true): mTLS vs plaintext JSON KeyAuth fallback.
CollabState gains the fields (defaults psk/prompt/true). config.toml wiring intentionally omitted (config.toml is retiring; set via init.scm / :set).
Add 'mae --collab-identity': prints this editor's Ed25519 peer identity (generating it on first use) + the exact 'mae-daemon authorize' line, so an admin can authorize the peer. Label = hostname. Transport wiring to actually use key mode follows in 2b.
mae-core option tests +2; clippy clean.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…hase 2b)
Refactor the editor connection so it can speak mTLS. run_collab_task is now
generic over the stream: a ClientTransport{Plain,KeyJson,KeyTls} enum (resolved
once from collab_auth_mode/tls/host_key_policy) drives a single
establish_connection() helper; read/write halves are type-erased (Box<dyn
AsyncBufRead/AsyncWrite>) so TCP and TLS share one loop. spawn_reader_task is
generic; the three connect sites (Connect/StartServer/reconnect) route through
establish_connection, skipping the PSK handshake on the TLS path. In key mode
the editor loads its Ed25519 identity + a known_hosts FileHostKeyVerifier
(TOFU policy) and connects via tls::client_config; KeyJson is the tls=false
fallback. mae-mcp re-exports ServerName.
E2E: scripts/collab-mtls-e2e.sh (make test-collab-mtls-e2e) spins up a real
key+tls daemon, authorizes the editor identity, and runs a real editor over
mTLS — connect, share a buffer, daemon confirms the share. Verified 7/7 green;
daemon authenticates the peer 'framework' by cert (strict binding visible).
collab_bridge 84 tests pass; clippy clean.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The authenticated peer identity (from mTLS, or the JSON-handshake label) is now
authoritative for attribution. Thread auth_label through run_session into the
doc handlers (handle_doc_*_inner; thin #[cfg(test)] wrappers keep the 28 existing
handler tests untouched) and enforce:
- kb/share: a key/TLS-authenticated peer that claims a creator other than its
verified identity is REJECTED ('creator mismatch'); the authenticated label is
the authoritative creator. Anonymous (psk/none) sessions keep self-claimed
values (backward compatible).
- sync/awareness: broadcast user_name (cursor label) overridden with the
authenticated label — cursor labels can't be spoofed.
- docs/save_committed: saved_by overridden with the authenticated label.
Closes the spoofable-creator gap. 3 new unit tests (spoofed rejected, matching
allowed, anonymous preserved); daemon 76 lib + 36 e2e tests pass; clippy clean.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
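A sketch of the strict-binding rule from the commit above, under assumptions: `authoritative_creator` and `effective_user_name` are hypothetical helpers, and anonymous (psk/none) sessions are modeled as `auth_label == None`. The real handlers (handle_doc_*_inner) are not reproduced here.

```rust
/// Decide the creator recorded for kb/share.
/// - Authenticated peer, matching claim: the verified label is used.
/// - Authenticated peer, different claim: rejected with 'creator mismatch'.
/// - Anonymous session: the self-claimed value is kept.
pub fn authoritative_creator(
    auth_label: Option<&str>,
    claimed: &str,
) -> Result<String, &'static str> {
    match auth_label {
        Some(label) if label == claimed => Ok(label.to_string()),
        Some(_) => Err("creator mismatch"),
        None => Ok(claimed.to_string()),
    }
}

/// For awareness (cursor label) and saved_by, the authenticated label
/// overrides the client-supplied value instead of rejecting it.
pub fn effective_user_name(auth_label: Option<&str>, claimed: &str) -> String {
    auth_label.unwrap_or(claimed).to_string()
}
```

The asymmetry is deliberate in this sketch: a spoofed creator is an error because it changes ownership, while a spoofed cursor label is simply overwritten.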
Least-privilege access to shared KBs among trusted peers. An authenticated
(key/TLS) peer may join/update a KB only if it is the creator or in the KB's
KbCollectionDoc.members(); anonymous (psk/none) sessions keep connection-level
trust (backward compatible).
- kb_membership_check gates kb/join and kb/node_update.
- New owner-only methods kb/add_member / kb/remove_member {kb_id, member}:
verify the caller is the collection creator, apply add/remove via the
collection CRDT, persist + broadcast the update.
- Residual limitation (documented): a member could still smuggle membership
edits through a raw kbc: sync/update; server-side CRDT field ACLs are future
work. The sanctioned path is the owner-only methods.
4 unit tests: creator joins / non-member denied; owner add→join
→update, remove→denied; only-owner-manages; anonymous-not-gated. daemon 80 lib
+ 36 e2e tests pass; clippy clean.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
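The membership gate and the owner-only methods can be sketched as follows. This is a plain-struct illustration under assumptions, not the CRDT-backed KbCollectionDoc: `KbCollection`, `membership_check`, `add_member` and `remove_member` are hypothetical, persistence and broadcast are omitted, and the management functions only model an authenticated caller.

```rust
use std::collections::BTreeSet;

/// Plain stand-in for the collection document's creator + members.
pub struct KbCollection {
    pub creator: String,
    pub members: BTreeSet<String>,
}

impl KbCollection {
    /// The creator is seeded as a member.
    pub fn new(creator: &str) -> Self {
        let mut members = BTreeSet::new();
        members.insert(creator.to_string());
        KbCollection { creator: creator.to_string(), members }
    }
}

/// Gate for kb/join and kb/node_update. Anonymous sessions (None) keep
/// connection-level trust; authenticated peers must be creator or member.
pub fn membership_check(coll: &KbCollection, auth_label: Option<&str>) -> Result<(), String> {
    match auth_label {
        None => Ok(()),
        Some(l) if l == coll.creator || coll.members.contains(l) => Ok(()),
        Some(l) => Err(format!("{} is not a member of this KB", l)),
    }
}

fn require_owner(coll: &KbCollection, caller: &str) -> Result<(), String> {
    if caller == coll.creator {
        Ok(())
    } else {
        Err("only the owner can manage members".to_string())
    }
}

/// Owner-only: add a member.
pub fn add_member(coll: &mut KbCollection, caller: &str, member: &str) -> Result<(), String> {
    require_owner(coll, caller)?;
    coll.members.insert(member.to_string());
    Ok(())
}

/// Owner-only: remove a member.
pub fn remove_member(coll: &mut KbCollection, caller: &str, member: &str) -> Result<(), String> {
    require_owner(coll, caller)?;
    coll.members.remove(member);
    Ok(())
}
```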
Wire :kb-member-add / :kb-member-remove <kb-id> <member> end to end: CollabIntent::KbAddMember/KbRemoveMember (dispatch_collab parses args from the ex-command line) → CollabCommand::KbMember → run_collab_task sends the kb/add_member / kb/remove_member RPC (PendingResponseKind::KbMember) → response becomes a status line, or a CollabEvent::Error on denial (e.g. 'only the owner can manage members'). Disconnected handler reports not-connected. 3 dispatch unit tests (args→intent, both add/remove, missing-args→no-intent); editor 90 collab + core collab tests pass; clippy clean. The daemon enforcement this drives is covered by the 4 membership unit tests in phase 4. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…or membership e2e Bug caught by the two-editor e2e: strict binding overrode the kb/share creator VARIABLE but not the collection doc's internal creator()/members() (set by the client from its user_name). So the owner-check in kb/add_member failed (coll.creator() != authenticated label), the add was silently rejected, and a newly-added member was still denied. Fix: KbCollectionDoc::set_creator re-stamps the creator + seeds it as a member; the daemon calls it on kb/share for authenticated sessions, binding the shared collection to the verified peer identity. scripts/collab-membership-e2e.sh (make test-collab-membership-e2e): two real editors over mTLS — alice shares, bob denied (not a member), alice adds bob, bob joins. Oracle = daemon log. VERIFIED PASS. + set_creator unit test. mae-sync 144, daemon 80 tests; clippy clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
When collab_host_key_policy=prompt and the editor meets an unknown daemon identity, a PromptingHostKeyVerifier emits CollabEvent::HostKeyPrompt and BLOCKS the connection task on a std reply channel; the main (UI) thread shows a 'Trust Daemon Key? <fingerprint> [y/N]' MiniDialog (MiniDialogContext::PeerKeyAccept), and the y/n answer is routed back (Editor.pending_host_key_reply) to pin (accept) or abort (reject). The collab task runs on a separate thread from the winit/TUI loop, so the block is safe; a 120s timeout rejects if unanswered. A previously pinned key that matches is accepted silently; a CHANGED key aborts (MITM). accept-new/strict keep the non-interactive file verifier (headless default). 4 unit tests cover the channel round-trip + pinning. mae 88 collab + mae-core 15 dialog tests pass; clippy clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
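The blocking round-trip between the connection task and the UI thread can be sketched with a std channel. This is an illustration only: `prompt_channel` and `await_trust_decision` are hypothetical names, the answer is reduced to a bool, and treating a dropped sender as a rejection is an assumption of the sketch.

```rust
use std::sync::mpsc::{self, Receiver, RecvTimeoutError, Sender};
use std::time::Duration;

/// Create the reply channel: the UI thread keeps the Sender, the
/// connection task blocks on the Receiver.
pub fn prompt_channel() -> (Sender<bool>, Receiver<bool>) {
    mpsc::channel()
}

/// Block until the user answers or the timeout elapses.
/// `true` means accept (pin the key); `false` means reject (abort).
/// Both an unanswered prompt and a dropped sender count as a rejection.
pub fn await_trust_decision(reply_rx: &Receiver<bool>, timeout: Duration) -> bool {
    match reply_rx.recv_timeout(timeout) {
        Ok(accepted) => accepted,
        Err(RecvTimeoutError::Timeout) => false,
        Err(RecvTimeoutError::Disconnected) => false,
    }
}
```

With the 120s timeout from the commit, the call would be `await_trust_decision(&rx, Duration::from_secs(120))`. Failing closed on timeout keeps a headless or unattended editor from pinning a key nobody looked at.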
Fix confirmed inaccuracies that misled users:
- daemon CLI: removed nonexistent --unix-socket/--db/--wal-threshold; documented the real flags (--bind/--config/--data-dir/--check-config) + the keygen/keys/identity/authorized/authorize/revoke subcommands.
- env var MAE_COLLAB_ADDR → MAE_COLLAB_SERVER.
- editor options: point at init.scm (config.toml retiring), correct defaults (collab-server-address 127.0.0.1:9473, collab-user-name not -username, backoff 2), add collab-auth-mode/host-key-policy/tls.
- WAL recovery path collab.db → collab/state.db.
Add §10 Trusted-Peer Mode: Ed25519 mTLS setup end to end (daemon identity → authorize peer → editor key mode + TOFU → per-KB membership commands), and update §8 Security for the three auth modes (none/psk/key) + mTLS shipped. ADR-017 → Accepted.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The e2e job now also pulls the mae-daemon release artifact (needs: [check, daemon]) and runs scripts/collab-mtls-e2e.sh + scripts/collab-membership-e2e.sh against the real release binaries — exercising the full trusted-peer stack (Ed25519 mTLS handshake, TOFU, strict identity binding, per-KB membership) headlessly. Adds iproute2 (the scripts use ss for port readiness). Verified both pass with release binaries locally. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…live) Consolidated step-by-step plan: Tier 0 automated (unit + e2e + CI commands), Tier 1 single-host CLI smoke, Tier 2 the two-machine live run (daemon+editor on D, editor on E) covering identity exchange/authorize, TOFU connect, buffer convergence + authenticated cursor labels, KB membership (deny→add→allow→remove), and security/negative checks (unauthorized peer, changed host key, tcpdump confidentiality). Results checklist + troubleshooting. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Prepend a Setup section: Rust >=1.95 (MSRV), iproute2 for the e2e scripts, optional GUI build deps; build both workspaces (make build-tui + build-daemon); get binaries on PATH (install targets or copy). Plus a key-setup table clarifying that the automated e2e scripts generate+authorize their own keys, while the manual tiers need you to exchange identities + mae-daemon authorize (Tier 2 Step 3), and where identities live + how to reset them. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
One-command, idempotent key-mode setup: `mae setup-collab [--server <addr>]` generates the peer identity (if absent), persists collab-auth-mode=key + server + auto-connect to init.scm (via the existing save_option_to_init), and prints the exact `mae-daemon authorize` line. Re-running updates in place (no duplicates).
SSH integration (opt-in key reuse for an SSH-like purpose):
- Editor: `mae setup-collab --ssh-key ~/.ssh/id_ed25519` imports an unencrypted OpenSSH Ed25519 PRIVATE key as the collab identity (Identity::import_ssh_private_key via the ssh-key crate; from_seed/save helpers).
- Daemon: `mae-daemon authorize --from-ssh-pub <file> <label>` imports the SSH PUBLIC key (PublicKey::from_ssh_line: manual SSH wire parse, no dep).
- Verified consistent end-to-end: the editor's imported MAE fingerprint EQUALS the daemon's authorized fingerprint, so the editor presents exactly the key the daemon trusts. Errors clearly on encrypted/non-ed25519 keys.
Note: reusing one key across SSH + MAE couples their compromise; a dedicated MAE identity (default) keeps them separate. Documented in COLLABORATION.md §10.
3 new mae-mcp tests (ssh pubkey roundtrip, ssh private import matches pubkey, + existing). mae-mcp 121, daemon 80; clippy clean. Docs + testing plan updated.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…against 0.0.0.0
The two-machine testing plan used the default collab port 9473, which collides with an already-running personal daemon (binds 127.0.0.1:9473; a test daemon on 0.0.0.0:9473 overlaps loopback). Switch Tier 2 to a non-default port (9480) with an explicit "check it's free first" note, and document bind-vs-connect (0.0.0.0 is a bind address, never a connect target).
- scripts: collab-mtls-e2e.sh / collab-membership-e2e.sh now auto-select the first free port (scan upward from 9476/9477 via `ss`) unless MAE_E2E_PORT is set explicitly, so a running daemon or a concurrent test run never triggers "address already in use". Loopback-bound, so they never touched 9473 anyway; this just makes them robust against any busy port.
- mae setup-collab: reject `--server 0.0.0.0:…` with a clear message: that's the daemon's bind address, not a reachable connect target.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
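The bind-vs-connect check can be sketched with the standard library alone. `validate_connect_target` is a hypothetical name and the error text is illustrative; the sketch also rejects the IPv6 unspecified address (`[::]`), which the commit does not mention and is an assumption here.

```rust
use std::net::SocketAddr;

/// Reject an unspecified address (0.0.0.0 or [::]) as a connect target.
/// Hostnames do not parse as SocketAddr, so only literal IPs are checked;
/// anything else is passed through for the connect step to resolve.
pub fn validate_connect_target(server: &str) -> Result<(), String> {
    if let Ok(addr) = server.parse::<SocketAddr>() {
        if addr.ip().is_unspecified() {
            return Err(format!(
                "{} is a bind address, not a reachable connect target",
                server
            ));
        }
    }
    Ok(())
}
```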
The collab e2e harness was Linux-only and silently mis-ran on macOS,
forcing stop-and-go debugging across our two dev machines. Three issues,
one root theme — platform-divergent path/tool resolution:
1. Daemon dir resolution ignored XDG on macOS.
`daemon/src/config.rs` resolved config + data dirs via bare
`dirs::config_dir()` / `dirs::data_dir()`, which follow Apple
conventions on macOS (`~/Library/Application Support`) and ignore
`XDG_CONFIG_HOME` / `XDG_DATA_HOME`. The e2e scripts isolate each peer
via those env vars, so on macOS the daemon never found its generated
`daemon.toml`, fell back to all defaults (default bind :9473, default
`$TMPDIR/mae-daemon.sock`), and collided with the developer's personal
daemon — "daemon failed to listen". Meanwhile the *identity*/*keystore*
code (mae-mcp) already resolves XDG-first, so identities landed in the
isolated dir while config/data landed in the real Library dir (split
brain). Fix: resolve config + data dirs XDG-first on all platforms
(env when set, else `dirs::*`), matching `mae-mcp::identity` /
`keystore`. Pure extension — macOS users without XDG set are unchanged.
2. Port-readiness probe used `ss` (Linux iproute2), absent on macOS, so
the daemon-listening check always failed even when it was up. Add a
portable `port_listening` helper: prefer `ss` (Linux/CI unchanged),
then `lsof`, then `netstat`.
3. The editor run was wrapped in `timeout`, absent on stock macOS. Use a
`${TIMEOUT_BIN:+...}` prefix resolving `timeout` → `gtimeout` →
omitted (bash 3.2-safe, `set -u`-safe).
Codify the lesson as CLAUDE.md principle #13 (cross-platform parity):
XDG-first dirs everywhere, portable shell tooling, CI on both OSes — a
fix that only works on one machine is not a fix.
Verified on macOS (was failing, now passing):
- scripts/collab-mtls-e2e.sh ............ 7/7, mTLS peer authenticated
- scripts/collab-membership-e2e.sh ...... 7/7 + 7/7, deny→add→allow
- cargo test -p mae-mcp ................. 121 passed
- cd daemon && cargo test / clippy ...... 9 passed / clean
- cargo test -p mae --bins collab ....... 94 passed
Linux behavior unchanged (ss/timeout still preferred; XDG already worked).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
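The XDG-first rule (env when set, else the platform default) can be sketched as a pure function so it is testable without touching the process environment. `xdg_first` is a hypothetical helper, not the code in daemon/src/config.rs; treating an empty variable as unset and joining an app subdirectory are assumptions of the sketch.

```rust
use std::path::PathBuf;

/// Resolve a base directory XDG-first.
/// `env_value` is the value of XDG_CONFIG_HOME / XDG_DATA_HOME (if any);
/// `platform_default` is what a call such as dirs::config_dir() returned.
/// ASSUMPTION: an empty variable counts as unset.
pub fn xdg_first(
    env_value: Option<String>,
    platform_default: Option<PathBuf>,
    app: &str,
) -> Option<PathBuf> {
    env_value
        .filter(|v| !v.is_empty())
        .map(PathBuf::from)
        .or(platform_default) // falls back only when the env var is absent
        .map(|base| base.join(app))
}
```

A caller would pass `std::env::var("XDG_CONFIG_HOME").ok()` as the first argument. With the variable unset, the result is the platform default, which matches the "macOS users without XDG set are unchanged" claim.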
Add a shared scratchpad to the testing plan so D (driver) and E (mac) can start the Tier 2 live run the moment D is up — no round-trips. Captures the concrete session state instead of the reference topology: - E (bob, mac, 192.168.1.132) is READY: built from a8ac842, personal daemon stopped (9473 clear), identity generated — fingerprint + pre-formatted `mae-daemon authorize ... bob` line for D to paste. - D's row is a fill-in (IP, fingerprint, status) the driver commits back. - Test port 9480 (avoids the personal-daemon :9473 collision). - mDNS returned nothing on this LAN → connect by explicit host:port. - D's unblock checklist (pull a8ac842, bind 0.0.0.0:9480 key-mode, authorize bob, publish fingerprint, open firewall). Each machine edits its own row, commits, pushes. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…th machines (#66)
The live two-machine run surfaced issue #66: the interactive TOFU `prompt` policy (what `setup-collab` writes by default) is unwired and freezes the editor: it hard-freezes the TUI and silently fails the GUI. It bites EVERY editor, including D's own "alice" (which connects to D's daemon too), so it's a coordination hazard, not just a local quirk.
Update the testing plan so the other machine doesn't trip on it:
- Prominent #66 callout: every editor must set collab_host_key_policy = "accept-new" in init.scm (non-blocking, auto-pins) until #66 is fixed; verify the daemon fingerprint OUT-OF-BAND against the pinned known_hosts entry instead of via the (broken) prompt.
- Board: min build bumped to b947a52; added an `accept-new set` column; D's checklist now says rebuild BOTH binaries (branch moved past the first harness build) and configure accept-new before launching alice.
- Step 4 rewritten for accept-new + the out-of-band pin check; the interactive prompt path is marked deferred to #66.
- Results checklist: T0 marked green (macOS), row 4 split into accept-new (now) and prompt-TOFU (deferred).
No code change; config/docs only. Tier 0 already validated on macOS at b947a52.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Track machine-E observations during the two-machine ADR-017 validation so we surface + fix issues, and D sees our findings. Logged so far:
- resolved: cross-platform Tier 0 fix (a8ac842)
- filed: #66 (TUI TOFU prompt deadlock)
- open/HIGH: alice rope panic crash (D-side, suspect shared/sync rope bridge)
- open: bob local edits to a joined buffer not visible on read-back (2x; cause TBD)
- open: connection flapping (peer closed w/o TLS close_notify), correlated w/ alice crash
Convergence so far: alice->bob receive confirmed; round-trip not yet validated.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…test step Per the review-the-process feedback: each entry now carries its tier/step (T0, T2.4, T2.5, …), Action → Expected → Actual → Status → Repro, so issues are pinpointed to the code path under stress and are reproducible. - Run 1 chronological table (10 rows) mapping each success/failure to a step. - Issue details I-1 (alice rope panic @ T2.5, task #18), I-2 (bob edit not visible @ T2.5), I-7 (connection flapping @ T2.4/5), #66 (TOFU @ T2.4). - Convergence scorecard by direction+step; next-run-from-scratch checklist. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ef bob Aligns naming with collab-test-notes-bob.md. Logs the run-1 progress (cross-machine mTLS auth ✅, alice→bob receive ✅) and the I-1 alice rope panic at T2.5: Rope::char(138) OOB on bob's remote edit of an em-dash line. Scopes it to the editor-side apply-remote path in crates/core (text.rs bridge + local cursor adjust are already clamped). D owns the backtrace + fix. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ackward (I-1)
The two-machine collab run crashed alice's GUI with a ropey panic
("index past end of Rope: char index 138, Rope char length 34"). Backtrace:
Rope::char <- word::word_start_backward <- mouse_ops::handle_mouse_click_inner
Not a CRDT bug (headless convergence never crashed) — a mouse bug. Clicking the
right pane of a vertical split registers as a double-click word-select, and the
screen column (~138) far overruns the short line. The double-click path passed an
unclamped text_col to char_offset_at (the single-click path already clamps), and
word_start_backward guarded pos==0 but not pos>len_chars (word_end_forward
already guards), so rope.char(137) on a 34-char rope panicked.
Fixes:
- word::word_start_backward clamps pos.min(len_chars()) (defense in depth).
- mouse_ops double-click path clamps text_col to the clicked line length before
char_offset_at (also guards the link-follow branch).
Tests: word_motions_clamp_out_of_bounds_pos, word_start_backward_out_of_bounds_on_empty_rope,
mouse_double_click_past_line_end_does_not_panic. Full mae-core suite 2237/2237.
Follow-up (I-3): the fallback handle_mouse_click uses raw (non-window-relative)
coords in a split — now safe (clamped) but cursor lands at line end; make it
window-relative later.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
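The two clamps can be sketched without the ropey crate. Here a `&[char]` slice stands in for the Rope, and word boundaries are simplified to whitespace vs non-whitespace; both are assumptions of the sketch, and the real word::word_start_backward may classify characters differently.

```rust
/// Move backward from `pos` to the start of the previous word.
/// The first line is the fix: clamp `pos` to the text length so an
/// out-of-range position (e.g. a screen column) can never index past the end.
pub fn word_start_backward(text: &[char], pos: usize) -> usize {
    let mut i = pos.min(text.len());
    // Skip whitespace immediately before the position.
    while i > 0 && text[i - 1].is_whitespace() {
        i -= 1;
    }
    // Then skip the word itself.
    while i > 0 && !text[i - 1].is_whitespace() {
        i -= 1;
    }
    i
}

/// Clamp a clicked text column to the line length before converting it to
/// a character offset (the double-click path fix).
pub fn clamp_text_col(text_col: usize, line_len: usize) -> usize {
    text_col.min(line_len)
}
```

Clamping in both places is redundant by design: the caller-side clamp fixes the bug, and the callee-side clamp keeps any future unclamped caller from panicking.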
…resolved After fix a57455f, clean run from scratch (T2.4→T2.5): - alice→bob and bob→alice convergence both confirmed over mTLS, two machines. - I-1 (rope panic) FIXED + verified live — root cause was double-click word-select in a split pane passing an OOB offset to word_start_backward, NOT the CRDT path (multibyte was a red herring). No crash in Run 2. - I-2 (bob edit "not visible") RESOLVED as a driving artifact: MCP active buffer was *AI:claude*; switch-to-buffer must be its own verified step. - I-7 (flapping) RESOLVED — it was a symptom of alice crashing (I-1), gone now. Next: simultaneous-edit, then T2.6 KB membership, T2.7 security checks. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Records the post-fix live run: bidirectional CRDT convergence confirmed (bob's line + alice's seed + alice's typed line all merged; 52 session-7 + 1 session-8 updates), and the I-1 fix verified live (double-click @ col 138 in a split no longer crashes). Reattributes bob's I-2 to an MCP eval_scheme artifact (buffer- insert via eval skips the event-loop post-edit collab flush; real keystrokes sync fine). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… connect path Bob's 9d run surfaced B-21: a runtime `:set collab-host-key-policy prompt` (or `(set-option! …)`) updated the option (get_option reflected "prompt") but the connect still auto-pinned under the init.scm value (accept-new) — so the TOFU modal never appeared. Root cause: `resolve_client_transport` builds the host-key verifier ONCE in `setup_collab_channels` (startup) and caches it in `CollabSpawn.transport`; every `:collab-connect` reuses that cached verifier, so a runtime policy change never reached it. Same class as the auto-connect env gap fixed in 91a5201. Fix (editor-side): the verifier now reads a LIVE policy cell at verify-time. - CollabState gains `host_key_policy_live: Arc<Mutex<String>>`, a cross-thread mirror of `host_key_policy`; set_option keeps it in sync; resolve_client_transport seeds it from the current value at setup. - The editor now ALWAYS uses the prompting verifier (the only one that *can* prompt), made policy-dynamic: it reads the live cell each verify and dispatches accept-new → pin, strict → reject, prompt → ask. So a runtime switch to/from prompt takes effect on the NEXT connect with no relaunch. Regression: `host_key_policy_change_honored_at_verify_time_b21` — one verifier instance pins silently under accept-new, then (live cell flipped to prompt) ASKS on a new host instead of auto-pinning. 4 existing verifier tests updated for the new field. mae-core 2274 + mae collab_bridge 95 green; clippy -D warnings clean. Unblocks 9d: bob can now `:set collab-host-key-policy prompt` at runtime + connect → the R4 TOFU modal (GUI under prompt — the #66 deadlock path) — no init.scm relaunch. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
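The B-21 fix pattern, a verifier that reads a shared policy cell at verify time instead of caching the policy at construction, can be sketched like this. `LiveVerifier`, `Action`, `policy_cell` and `set_policy` are hypothetical names; rejecting on an unrecognized policy string is an assumption of the sketch.

```rust
use std::sync::{Arc, Mutex};

#[derive(Debug, PartialEq)]
pub enum Action {
    Pin,
    Reject,
    Ask,
}

/// Create the cross-thread policy cell, seeded with the current value.
pub fn policy_cell(initial: &str) -> Arc<Mutex<String>> {
    Arc::new(Mutex::new(initial.to_string()))
}

/// What set_option would do to keep the live mirror in sync.
pub fn set_policy(cell: &Arc<Mutex<String>>, value: &str) {
    let mut guard = cell.lock().unwrap_or_else(|e| e.into_inner());
    *guard = value.to_string();
}

/// Built once at startup, but holds only a handle to the cell.
pub struct LiveVerifier {
    policy: Arc<Mutex<String>>,
}

impl LiveVerifier {
    pub fn new(policy: Arc<Mutex<String>>) -> Self {
        LiveVerifier { policy }
    }

    /// Decide what to do with an unknown host, reading the policy NOW.
    pub fn on_unknown_host(&self) -> Action {
        let policy = self
            .policy
            .lock()
            .unwrap_or_else(|e| e.into_inner())
            .clone();
        match policy.as_str() {
            "accept-new" => Action::Pin,
            "prompt" => Action::Ask,
            // "strict", and (ASSUMPTION) any unknown value, fail closed.
            _ => Action::Reject,
        }
    }
}
```

One verifier instance therefore changes behavior between connects without being rebuilt, which is the property the regression test name describes.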
…(rebuild + runtime :set) Path B chosen. Notes bob: this fix is editor-side so he MUST rebuild/reinstall/relaunch (unlike B-20). Then run 9d via runtime (set-option! collab_host_key_policy "prompt") — now honored — clear the pin, connect → expect the R4 TOFU modal (GUI under prompt, the #66 deadlock path). n-then-y, OOB fingerprint SHA256:07aW…7Ls. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…22 GUI TOFU modal render/focus bug B-21 closed: runtime set-option collab_host_key_policy=prompt now honored — connect BLOCKED on the prompt (no auto-pin) and raised a bus notification with the correct fingerprint 07aW…7Ls (OOB match). Reject path correct: notify_resolve(dismiss) -> handshake aborted, status off, known_hosts NOT pinned. B-22 (new, GUI): the R4 TOFU modal is invisible AND unresponsive — (1) no repaint on raise (GUI only redraws on keypress, ~2-key lag); (2) no input-focus capture (keys leak to the underlying buffer — Esc triggered Claude commands with the AI buffer focused). GUI sibling of #66; R4 fixed the plumbing but not the GUI render/focus path. Round-2 accept UN-TESTABLE: bus notification exposed actions:[] (only dismiss=reject), no MCP accept lever, and the modal y/Enter can't be delivered through the broken GUI. Fix dirs: BlockingReply raise must request redraw/damage; modal must grab input focus; add explicit bus actions (Accept&pin / Reject) for headless/Notifications parity. bob restored via accept-new -> auto-pin (07aW…7Ls) -> connected + reconcile-joined; temp backups removed. 9d verdict: B-21 + fingerprint + reject ✅; accept-via-UI blocked by B-22. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…e + focus)
Bob's 9d run on the B-21 build proved the prompt PLUMBING works (correct fingerprint, reject logic, no-pin-on-reject) but the GUI modal surface was broken: it never drew (GUI froze until a keypress, ~2 keys behind) and keys leaked to the underlying buffer. Two independent defects, both fixed here:
B-22a, runtime starvation (no repaint on raise): the GUI bridge ran on a `new_current_thread` tokio runtime hosting BOTH the collab connection task and the `bridge_task` proxy forwarder (+ AI/LSP/DAP/MCP). The host-key verifier is called synchronously by rustls mid-handshake and blocks (up to 120s) on `reply_rx.recv_timeout` waiting for the prompt answer, starving that one worker so the `HostKeyPrompt` event never reached the GUI and `mark_full_redraw` never ran (the GUI twin of the #66 TUI deadlock). Fix: give the bridge runtime a worker pool (`new_multi_thread().worker_threads(4)`, + the `rt-multi-thread` tokio feature) so the forwarder keeps running while a connect blocks on the prompt. This also fixes the same starvation for MCP-driven flows.
B-22b, modal didn't capture input: `handle_key` only routed to the mini-dialog via the command-palette path, so an async-raised modal (notify() Modal arm sets `mini_dialog` but no palette mode) was unanswerable in Normal/Insert/AI mode; and the GUI's AI-input-lock branch stole Esc/Ctrl-C before `handle_key` ran (Esc hit AI-cancel, not the dialog). Fix: `handle_key` now routes to the dialog whenever `mini_dialog.is_some()` (all modes), and the GUI keyboard dispatch checks the modal before the AI-input-lock/shell branches.
Together: the host-key TOFU modal now paints immediately and answers to y/Enter (accept+pin) / n/Esc (reject) regardless of focus. GUI compiles, mae key_handling tests green, clippy --features gui -D warnings clean.
Deferred (B-22c, follow-up): the trust notification exposes no bus actions, so it's answerable only by the modal keypress — add Accept/Reject actions so notify_resolve + the *Notifications* row can answer it (headless/agent parity). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…; B-22c deferred Notes bob: rebuild (editor-side fix), then re-run 9d via runtime :set — the TOFU modal now paints immediately and captures input even with the *AI* buffer focused. Accept-by-keypress (y) now reachable. B-22c (MCP/bus accept action) deferred as a small optional follow-up for headless parity. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…er) STILL BROKEN On build 5337cb5: runtime prompt honored (B-21), correct fp surfaced, reject aborts w/o pin. B-22b confirmed fixed — modal now captures input even with *AI* buffer focused. B-22a STILL broken — bob-user: "frozen again for a second, modal capturing input but not rendering." Multi-thread runtime SHORTENED the freeze (was ~120s verifier-timeout starvation, now ~1s) but the modal still never PAINTS; user was blind, pressed keys without seeing. Consequence: accept->pin only "worked" via blind keypress (net: bob connected+pinned to the correct OOB key 07aW…7Ls), but can't be cleanly validated — a user cannot SEE the fingerprint before trusting, defeating TOFU. 9d accept path NOT validated as a usable flow. Proven: B-21, correct-fp, reject-no-pin, B-22b focus. Open: B-22a render (dialog paint not triggered when prompt raised off the handshake thread; runtime fix shortened freeze but didn't wire the MiniDialog redraw). Recommend: finish render fix and/or land B-22c (bus accept action) so accept is verifiable+answerable without depending on GUI paint. bob restored: connected, pinned, policy=accept-new. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…indow (paint gated on resolution) bob-user: UI unfreezes once the (unrendered) modal is gone. So the non-render is scoped to the interval the host-key prompt is outstanding; GUI recovers fully on resolution. Diagnosis: the GUI render loop isn't pumping a repaint while the synchronous rustls verifier blocks waiting for the answer — multi-thread runtime shortened the block but the paint path is still gated on prompt resolution, so the MiniDialog overlay never gets a frame during the window it needs to be visible (input is serviced enough to capture the key, but no full redraw runs). Fix: get a redraw to run WHILE the prompt is pending (raise prompt + request paint on GUI thread; let the verifier await async rather than blocking a thread the paint pass depends on). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…g is repaint-scheduling Answer to bob-user's question (principle #7): the host-key TOFU modal reuses the verified MiniDialog — collab_bridge.rs:823-841 raises a blocking action_required notification + mark_full_redraw; it becomes MiniDialogContext::Notification -> MiniDialogKind::Confirm ("Action Required"), answered in apply_mini_dialog. Not ad-hoc. So the render bug isn't "didn't reuse MiniDialog" nor a missing dirty flag: the wiring looks correct on paper — user_event(CollabEvent) sets self.dirty=true (main.rs:2051), about_to_wait (2475) gates renderer.request_redraw() on self.dirty (2688-2705), and handle_collab_event calls mark_full_redraw. Yet live it paints only on keypress + recovers when the modal is gone. Candidate roots for alice (GUI owner) to instrument: (1) HostKeyPrompt CollabEvent not delivered to user_event until a later input event (residual forwarder/proxy-wakeup starvation while the rustls verifier blocks; best fits the symptom); (2) about_to_wait WaitUntil wakeup never fires while blocked; (3) overlay draw skipped for the Notification confirm context. Disambiguate with one log line on user_event(HostKeyPrompt) entry: appears only post-keypress => #1; immediate but no frame => #2/#3. Orthogonal unblock: land B-22c (bus Accept/Reject actions) so 9d accept is verifiable via notify_resolve regardless of GUI paint. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…prompt
The trust notification was answerable only by the modal y/n keypress; `notify_resolve`
could merely dismiss (which does NOT send the reply — so an MCP "reject" actually hung
until the 120s verifier timeout). Add a `NotifCommand::Reply(bool)` that genuinely
answers a BlockingReply: it sends on the parked reply channel, tears down the modal if
it's this notification, and resolves. The host-key prompt now carries explicit
"Accept & pin" (action 0) / "Reject" (action 1) bus actions.
Effect: the prompt is answerable over MCP (`notify_resolve {id, action:0|1}`) and via
the *Notifications* row — headless/agent parity, and a working answer path independent
of the GUI modal paint (B-22a, still open). Both routes send on the same channel; first
answer wins (pending_notif_reply is taken once).
Regression: `reply_action_answers_blocking_notification_over_bus` — Accept action sends
true, closes the modal, resolves, decrements outstanding. mae-core 2275 green; clippy
--features gui -D warnings clean.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
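The "first answer wins" behaviour this commit describes (both routes send on the same channel, and `pending_notif_reply` is taken once) can be sketched roughly as below. This is a minimal illustration, not the editor's code: `PendingReply`, its `reply_tx` field, and `answer` are hypothetical names, and a plain `std::sync::mpsc` channel is assumed for the parked reply.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

/// Hypothetical stand-in for the parked reply slot of one blocking notification.
struct PendingReply {
    reply_tx: Option<Sender<bool>>,
}

impl PendingReply {
    /// Create the slot plus the receiver the blocked verifier would wait on.
    fn new() -> (Self, Receiver<bool>) {
        let (tx, rx) = channel();
        (Self { reply_tx: Some(tx) }, rx)
    }

    /// Deliver an answer (true = accept, false = reject).
    /// Returns true only for the first caller; `take()` empties the slot,
    /// so a later keypress or bus action finds nothing to send on.
    fn answer(&mut self, accept: bool) -> bool {
        match self.reply_tx.take() {
            Some(tx) => tx.send(accept).is_ok(),
            None => false,
        }
    }
}
```

Because the sender is consumed on the first answer, the modal keypress and a `notify_resolve` action cannot both reach the verifier, whichever arrives second is ignored.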
…one; B-22a tracked Daemon log confirms 9d: fast reject (no pin) + accept→pin auth/join via bob's captured keypresses (B-22b). B-22c (bus Accept/Reject actions, 7fe4f93) lets the prompt be answered over MCP/notify_resolve regardless of GUI paint. B-22a (modal doesn't paint while verifier blocks) remains as a tracked GUI-paint polish bug with bob's disambiguating log experiment as the next step. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…r→delivery→paint
Temporary diagnostic (target "b22a") to pinpoint why the GUI TOFU modal doesn't paint while the verifier blocks. Six timestamped checkpoints:
- 1/2 bridge_task: HostKeyPrompt taken off collab_rx + proxy.send_event returned Ok
- 3 handle_collab_event: prompt RECEIVED on the main thread (proxy→user_event)
- 4/5 about_to_wait: dirty-with-modal-pending + request_redraw() issued
- 6 RedrawRequested: a real frame painted with the modal up
Reading the sequence vs when the prompt is raised (and when a keypress arrives) disambiguates bob's candidates: 1/2 but no 3 until a keypress ⇒ winit proxy wakeup not firing while blocked; no 1/2 until keypress ⇒ residual forwarder starvation; 3+5 but no 6 ⇒ redraw requested but not serviced; 6 fires but modal invisible ⇒ render pass skips the overlay.
Enable with MAE_LOG=b22a=info (or RUST_LOG). Reverted once the root is fixed.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…umentation c7a4bc4) Bob: rebuild, launch with MAE_LOG=b22a=info 2>logfile, run set-prompt→clear-pin→connect, then WAIT 10s without touching kbd/mouse (the bridge sends IdleTick every 100ms via the same proxy, so it should paint on its own if the wakeup works), then press n + paste the b22a lines. The 1/2→3→4/5→6 checkpoint sequence pinpoints delivery-wakeup vs redraw-scheduling vs render-pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…AY skipped (render-path bug) Ran alice's b22a instrumented experiment, no kbd/mouse touched. First clean connect cycle: all 6 checkpoints fire within ~24ms of connect with NO keypress (input_dirty=false) and repeat every ~150ms: 1/2 forwarder->proxy, 3 received +6.7ms, 4/5 request_redraw, 6 PAINTING a frame with modal pending +23ms. So proxy DOES wake winit and a frame IS painted unprompted — delivery/wakeup (my earlier hypothesis #1) REFUTED; request_redraw fine too. => alice's last matrix case: paint runs but modal invisible => the render pass paints the frame but SKIPS the MiniDialog overlay for the Notification-confirm context. "Freeze" was perceptual (frames paint every ~150ms but the modal isn't in them). Likely root: render-side twin of B-22b — the GUI draws the mini-dialog overlay only in command-palette/command mode, not whenever mini_dialog.is_some(); the async Notification-confirm modal sets mini_dialog but not palette mode -> overlay skipped. Fix: draw overlay whenever mini_dialog.is_some() (any mode). Handed to alice w/ exact log + fix dir. Also: B-22c confirmed (trust notifs carry Accept&pin/Reject actions). 9d still functional PASS. Process note: agent set-option!->immediate connect races the apply-drain (verify get_option first). bob restored: connected, re-pinned 07aW…7Ls, accept-new. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…both backends) Bob's b22a instrumentation was decisive: all six checkpoints fire within ~24ms of the prompt with NO keypress (delivery + redraw + paint all healthy), yet the modal was invisible — the render pass paints frames but SKIPS the mini-dialog overlay. Root cause: the overlay was drawn only inside the `command_palette.is_some()` branch (via `render_command_palette`, which draws `mini_dialog` internally), so an async-raised modal that set `mini_dialog` without `command_palette` (the host-key TOFU prompt) never drew. This is the render-side twin of the B-22b input bug. Fix: both render chains now check `mini_dialog.is_some()` FIRST (top-priority modal), matching the input dispatch (B-22b). A sweep found the TUI renderer (crates/renderer/src/lib.rs) had the identical bug — fixed here too, not just the GUI. Follow-up (next commit): unify the overlay PRIORITY into a single `Editor::active_overlay()` so the GUI + TUI render chains can't diverge again (the root architectural cause — the priority order was duplicated per backend). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…t diverge (B-22a root cause) The B-22a sweep confirmed the architectural root cause: the fullscreen-overlay PRIORITY ORDER was hard-coded as an independent if/else chain in EACH backend, and they drifted — the GUI drew the blocking mini-dialog at top priority while the TUI only drew it nested under the command palette, so an async-raised modal (host-key TOFU prompt) painted no dialog in the TUI. Same class as the B-22b input bug. Fix: a single source of truth — `render_common::overlay::active_overlay(&Editor) -> ActiveOverlay` — defines the canonical priority (MiniDialog > FilePicker > FileBrowser > CommandPalette > WhichKey > Splash > None), unit-tested. Both the GUI (crates/gui/src/lib.rs) and TUI (crates/renderer/src/lib.rs) render chains now DERIVE their dispatch from it (`overlay == ActiveOverlay::X`) instead of duplicating the checks, so they stay in lock-step and a future overlay/reorder changes one place. A blocking modal is always highest priority, matching the input dispatch. Behavior-preserving (the per-branch render bodies are unchanged; GUI splash was a `pub use` of the same render_common::splash::should_show_splash). mae-core overlay priority test + clippy --features gui -D warnings clean on mae-core/gui/renderer. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
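The single priority source described above could look roughly like the sketch below. The priority order is the one stated in the commit message; everything else is assumed for illustration: `OverlayState` is a hypothetical, reduced stand-in for the `Editor` fields the real `active_overlay(&Editor)` consults.

```rust
/// Which fullscreen overlay should be drawn, in canonical priority order.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ActiveOverlay {
    MiniDialog,
    FilePicker,
    FileBrowser,
    CommandPalette,
    WhichKey,
    Splash,
    None,
}

/// Hypothetical, reduced view of editor state: one flag per overlay.
#[derive(Default)]
struct OverlayState {
    mini_dialog: bool,
    file_picker: bool,
    file_browser: bool,
    command_palette: bool,
    which_key: bool,
    splash: bool,
}

/// One place that decides the winner; both render chains would match on it.
fn active_overlay(s: &OverlayState) -> ActiveOverlay {
    if s.mini_dialog {
        ActiveOverlay::MiniDialog // a blocking modal always wins
    } else if s.file_picker {
        ActiveOverlay::FilePicker
    } else if s.file_browser {
        ActiveOverlay::FileBrowser
    } else if s.command_palette {
        ActiveOverlay::CommandPalette
    } else if s.which_key {
        ActiveOverlay::WhichKey
    } else if s.splash {
        ActiveOverlay::Splash
    } else {
        ActiveOverlay::None
    }
}
```

With the order defined once, a backend only compares `overlay == ActiveOverlay::X`, so an async-raised dialog that sets `mini_dialog` without a palette still resolves to `MiniDialog` in both the GUI and the TUI.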
…l overlay unification Bob's b22a experiment decisive: frame paints but overlay skipped. Fixed GUI+TUI to draw mini_dialog top-priority; sweep found the TUI had the identical bug. Architectural fix 65c2281: single render_common::overlay::active_overlay() priority source consumed by both render chains so they can't diverge. Bob: rebuild → modal should paint; then I rip out the instrumentation. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…'t size to contents, fp truncated) Rebuilt on f526aef (unified active_overlay resolver). Re-ran prompt (get_option-verified before connect -> no apply-race -> single prompt). B-22a CONFIRMED FIXED: modal painted on its own (b22a 1-6 ~3.5ms, input_dirty=false) and is VISIBLE now; bob-user accepted the key; accept->pin ->connect end to end (connected, joined, known_hosts re-pinned correct key Ck5Um…=07aW…7Ls, OOB-verified). B-22 trilogy functionally complete. NEW B-23 (security-relevant UX): the modal doesn't adapt to content size — fingerprint text cut off, user couldn't see the ENTIRE key. TOFU requires reading the full fingerprint OOB before trusting; truncation undermines that (pinned key was correct by independent check this run, but UX can't guarantee it). Fix dir: size the dialog box to content (grow to fit within screen) and/or wrap the fingerprint full-width; wrap not clip. Likely in MiniDialog/overlay render geometry (render_common::overlay / backend dialog draw). 9d PASS (accept->pin now with a VISIBLE modal; reject + B-21 + correct-fp previously proven). Remaining polish: B-23 sizing. bob restored: connected, pinned, accept-new. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…iniDialog sizing, shared backends Code-located the truncation: both backends hard-code the dialog box and don't measure title/body. GUI popup_render.rs:928-929 and TUI popup_render.rs:628-629 use the SAME duplicated formula (width=50.min(cols-4), height=4+fields.len()); the Confirm/Notification body (the ~70-char SHA256 fingerprint) isn't measured in width or height -> clipped. render_common::overlay unified PRIORITY (active_overlay) but not GEOMETRY. Recommended (principle #8, geometry twin of the priority unification): add render_common::overlay::mini_dialog_layout(dialog, max_cols, max_rows) -> DialogLayout that computes width/height from wrapped title+body+fields+actions clamped to screen, WRAPS long content (fingerprint) instead of clipping; both GUI+TUI render_mini_dialog consume it (drop the local 50/4+fields constants); unit-test the layout. Covers ALL MiniDialog kinds so nothing truncates again. Security: host-key TOFU requires the FULL fingerprint be visible before accept (OOB compare); adaptive sizing guarantees it. 9d still PASS (accept->pin with visible modal); B-23 is the readability/sizing fix. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…l fingerprint readable Bob's B-23: the host-key TOFU prompt's modal renders (B-22a) but its box is hard-coded width=50 / height=4+fields.len() and content-blind, so the ~55-char SHA256 host-key fingerprint overflowed and was CLIPPED. confirm() jams the whole multi-line question into a single field label, drawn as one truncated `label: value` row. Both backends duplicated the same formula — the geometry twin of the overlay-priority duplication. Security-relevant: the full fingerprint MUST be readable for the out-of-band compare. Fix (the geometry twin of active_overlay): a single shared render_common::dialog::mini_dialog_layout(dialog, max_cols, max_rows) -> DialogLayout that measures title/body/fields, grows the box to fit, WRAPS long content (word-wrap + hard-break for space-less tokens like a fingerprint), and clamps to the screen. Both GUI and TUI render_mini_dialog now consume it (drop the local 50/4+fields constants), so they can't diverge and EVERY dialog kind sizes to its content. Tests: fingerprint fully visible (was clipped) + box grows past 50; narrow-screen hard-wrap keeps every char; wrap_hard token-break; input dialogs keep field rows + hint. clippy --features gui -D warnings clean on mae-core/gui/renderer. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
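The wrapping rule named above (word-wrap, plus a hard break for space-less tokens such as a fingerprint) can be sketched as follows. This is an assumption-laden illustration rather than the shipped `mini_dialog_layout`: `wrap_text` is a hypothetical helper, and it counts one column per `char`, which ignores wide glyphs.

```rust
/// Wrap `text` to at most `width` columns per line.
/// Words are kept whole when they fit; a token longer than `width`
/// is hard-broken so that no character is dropped.
fn wrap_text(text: &str, width: usize) -> Vec<String> {
    assert!(width > 0, "width must be at least one column");
    let mut lines: Vec<String> = Vec::new();
    let mut current = String::new();
    let mut current_len = 0usize;

    for word in text.split_whitespace() {
        let chars: Vec<char> = word.chars().collect();
        let needed = if current_len == 0 {
            chars.len()
        } else {
            current_len + 1 + chars.len()
        };
        if needed <= width {
            // The word fits on the current line.
            if current_len > 0 {
                current.push(' ');
            }
            current.push_str(word);
            current_len = needed;
            continue;
        }
        // It does not fit: close the current line first.
        if current_len > 0 {
            lines.push(std::mem::take(&mut current));
            current_len = 0;
        }
        // Hard-break the token; the last chunk stays open for following words.
        let mut chunks = chars.chunks(width).peekable();
        while let Some(chunk) = chunks.next() {
            let piece: String = chunk.iter().collect();
            if chunks.peek().is_some() {
                lines.push(piece);
            } else {
                current_len = chunk.len();
                current = piece;
            }
        }
    }
    if current_len > 0 {
        lines.push(current);
    }
    lines
}
```

A layout function could then take the longest wrapped line as the box width and the line count as its height, clamped to the screen, which is the content-adaptive sizing the commit describes.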
) + rebuild-to-confirm Geometry twin of active_overlay: render_common::dialog::mini_dialog_layout consumed by both backends; full host-key fingerprint now visible + wrapped. Bob: rebuild → confirm the entire SHA256 shows (no clip) + wraps on a narrow window. Next: convert b22a instrumentation to clean collab-target debug tracing + drop the per-frame render probes. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…l arc CLOSED Rebuilt a66449f (shared render_common::dialog::mini_dialog_layout). Both paths through the fully-rendered modal: B-23 — bob-user saw the FULL dialog contents (entire SHA256 fingerprint readable, no clipping) -> OOB compare trustworthy. Reject (n) -> ApplicationVerificationFailure, aborted, no pin. Accept (y) -> collab connected + KB join complete; known_hosts re-pinned correct key Ck5Um…=07aW…7Ls. 9d/TOFU/R4 = FULL PASS via a modal that renders (B-22a) + captures input (B-22b) + sizes to content (B-23) + is bus-answerable (B-22c); reject-no-pin and accept-pin both proven with the full fingerprint visible. Security arc validated live B-19->B-23 (epoch fence, continuation fence, runtime policy, modal render/focus/bus/sizing). ADR-024 bus + ADR-018/023 membership-gated write access validated end to end on two machines. Step-9 complete. Remaining: instrumentation cleanup (b22a -> permanent collab tracing + drop render probes) + the collab/config-UX polish theme. bob: connected, pinned, accept-new. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…t-key lifecycle tracing
The b22a diagnostic (c7a4bc4) did its job: it proved that frames painted but the mini-dialog overlay was skipped (fixed in b09becd/65c22813). Per plan, convert the lasting-value parts to permanent tracing and drop the throwaway scaffolding:
- REMOVED the per-frame render probes (about_to_wait + RedrawRequested): hot-path, and the question they answered is settled + now guarded by the active_overlay/dialog unit tests.
- REMOVED the bridge_task forward probes + the one-off "b22a" target.
- KEPT, as clean `debug!(target: "collab")`, the host-key TOFU lifecycle: prompt raised (handle_collab_event) and the trust decision in the verifier (pinned / rejected / timed out).
So `MAE_LOG=collab=debug` tells the whole trust-handshake story without any render-loop spam. clippy --features gui -D warnings clean; no behavior change.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… bugs (Wave 1)
Closes testing gaps and non-UX correctness issues surfaced by the live two-machine CRDT validation, ahead of the UX pass. Each item ships with a RED-before/GREEN-after guard (CLAUDE.md #9).
A1 daemon: fence no-cascade oracle. Assert the canonical node stays byte-identical across a fenced push (not just the error string).
A2a editor: notify_ops resolution unit test. R2 fence notification → 3 actions; Keep-mine records pending_reauthor + enqueues KbAdoptNode; Accept-remote adopts without reauthor.
B1 fix I-3 split-window click coords: both GUI fallback and TUI passed ABSOLUTE screen cells to the window-local click handler, so clicks in a non-primary split landed at the wrong column. Translate once in the shared handle_mouse_click_inner via the focused pane's layout origin (#8: one source of truth, fixes both backends) + pure window_relative() helper.
B2 config-key invariant guard (every snake_case option exposes its kebab alias; every collab_* has alias + config_key) + extract is_epoch_fence_rejection() so the editor↔daemon "rebase required" contract is centralized and tested; clearer user-facing fence wording (#7).
B3 verify joined-KB instance surfaces: federated get/search attribute the node to its instance + it appears in *KB Instances* (regression guard).
B4 B-5 malformed-row robustness: a short-arity stored row makes the whole load query fail at bind time before the row-skip loop; degrade load_all to an empty Ok (logged at ERROR) instead of Err that aborted kb_join and tripped the main-thread stall watchdog (#1). Off-thread KB I/O deferred.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
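For the B1 fix, the translation from absolute screen cells to pane-local cells might look like the sketch below. Only the helper's name, `window_relative`, comes from the commit; the `PaneRect` type, the signature, and the out-of-bounds `None` are assumptions made for illustration.

```rust
/// Hypothetical pane rectangle, in absolute screen cells.
#[derive(Clone, Copy)]
struct PaneRect {
    col: u16,
    row: u16,
    width: u16,
    height: u16,
}

/// Translate an absolute screen cell into pane-local coordinates.
/// Returns None when the cell lies outside the pane.
fn window_relative(pane: PaneRect, abs_col: u16, abs_row: u16) -> Option<(u16, u16)> {
    // checked_sub rejects cells left of / above the pane origin.
    let local_col = abs_col.checked_sub(pane.col)?;
    let local_row = abs_row.checked_sub(pane.row)?;
    if local_col < pane.width && local_row < pane.height {
        Some((local_col, local_row))
    } else {
        None
    }
}
```

Keeping the helper pure (no editor state, no I/O) is what lets one shared click handler serve both backends and be unit-tested directly.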
…y (Wave 1: C2, C3)
C2 — verify + regression-test that connect-critical config is read live from the
single OptionRegistry source, with no read-site cache (the apply-drain race the
live test hit needs no manual get-option wait):
- server_address: read live at connect dispatch (set_option writes it
synchronously); test in dispatch/collab.rs.
- resolve_client_transport reads auth_mode/psk/tls live; test in collab_bridge.
The transport is still built once at task setup and cached — the security-
critical runtime field (host-key policy) is already kept live via
host_key_policy_live; a full per-connect transport rebuild on a runtime
auth_mode/tls change is a documented, deferred follow-up.
C3 — embed the git SHA (build.rs → MAE_BUILD_SHA, "-dirty"/"unknown" fallbacks,
cross-platform per #13) in editor + daemon. Reported in the startup log,
--version, and the daemon $/debug response; collab-doctor now prints the daemon
build and warns on an editor↔daemon mismatch — the "are both machines on the
same commit?" check the live two-machine test ran by hand. Smoke + mismatch
tests on both sides.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…gence (Wave 2: A2b, A5) A2b — drive handle_collab_event's KbNodeAdopted (the kb/node_fetch reply) through both fence-resolution paths: keep-mine re-authors the captured edit over the authoritative state and consumes pending_reauthor; accept-remote takes the authoritative value and discards local. Closes the bridge half of the R1 adopt-and-re-author round-trip the manual Step-9 run exercised by hand. A5 — real-daemon convergence: two peers concurrently edit DISJOINT fields of the same KB node from the same base; the daemon merges both into its authoritative per-node doc and two fresh joiners read back BYTE-IDENTICAL state carrying both edits — the CRDT guarantee (#11) end-to-end over TCP + base64, not just an in-process KnowledgeBase merge. MAE_TCP_E2E-gated (CI e2e job; the no-auth daemon skips the epoch fence, so the joiner write is accepted). Plus a manual T1–T7 cross-reference doc-comment mapping each in-process kb_sync_n_peer_e2e test to its live two-machine step. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ave 2: C1)
The daemon broadcasts a KB's collection doc (kbc:) on every membership/role
change, but the editor ignored it ("remote update for unknown buffer") and only
relearned its authorization epoch on a full re-join — forcing the manual
reconnect the live two-machine test kept performing by hand.
C1 keeps a local CRDT replica of each joined KB's collection doc
(CollabState::kb_collection_state), seeded from the join snapshot. A live kbc:
RemoteUpdate is now intercepted before the buffer lookup, applied to the replica
(#11), and epoch_of(local_fingerprint) re-derived: kb_epochs updates in place so
the next node edit authors under the rotated, current-epoch client_id — no
reconnect. The user is notified and a `kb-epoch-changed` hook fires
(runtime-redefinable, #7). Replica + epoch are dropped on KbLeft.
Security (#10): the daemon stays the sole authority — it re-derives each member's
epoch from its OWN authoritative collection when fencing, so the relearn is pure
client convenience. A tampered/stale replica can only mislead this client about
its own epoch, never self-elevate; a client that ignores the relearn and authors
under a stale epoch is still fenced. Tests cover the live relearn, that another
member's change cannot bump this peer's epoch, and the unjoined-KB no-op. The
daemon viewer_era_* / stale_epoch_continuation_* fence tests stay GREEN — the
no-weakening gate.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…B-6) B-6 (primary KB data dir is XDG-first, not dirs::data_dir()/~Library) was fixed in cf673b7; verify confirms the primary.cozo store path derives from editor.mae_data_dir() (XDG_DATA_HOME → ~/.local/share/mae) with the same XDG-first fallback, and the only residual dirs::data_dir() uses are deliberate read-only module *search* paths. Add a regression test (#13) asserting mae_data_dir() honors XDG_DATA_HOME and falls back to ~/.local/share/mae — never the macOS platform-native dir — so a future change can't silently reintroduce dirs::data_dir and re-split the KB store from the ADR-019 registry markers (restart-survival). A cross-location ~/Library→XDG migration for pre-fix macOS dev builds is intentionally NOT added (highest-risk, marginal early-alpha benefit; the fix already landed without orphaning concerns for XDG-isolated installs). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…orized-peer e2e (Wave 3: A4)
Fills the security-negative gaps left by the existing mTLS unit tests (mtls_unauthorized_client_rejected / mtls_client_rejects_untrusted_host):
- FileHostKeyVerifier TOFU integrity: a previously-pinned daemon host key that CHANGES (MITM / key substitution) is rejected AND the trusted pin is NOT overwritten, so an attacker can't silently re-pin; the genuine key still verifies afterward. Plus a strict-policy-rejects-unknown-host test. Runnable unit tests in shared/mcp/src/identity.rs.
- collab-mtls-e2e.sh: added an unauthorized-peer negative scenario. A second editor whose identity is NOT in the daemon's authorized_keys attempts to connect; the daemon's authenticated-peer count must not increase (robust to the exact rustls rejection string). The e2e counterpart to the unit-level unauthorized-client rejection.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
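The TOFU integrity property tested here (a changed key is rejected and the existing pin is never overwritten) can be illustrated with an in-memory model. This is a sketch under stated assumptions, not `FileHostKeyVerifier`: `PinStore`, `Policy`, and `Verdict` are hypothetical names, the store is a `HashMap` rather than a `known_hosts` file, and the interactive prompt policy is left out.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq, Eq)]
enum Verdict {
    Trusted,          // matches the existing pin
    PinnedNew,        // unknown host, pinned on first use
    RejectedUnknown,  // unknown host under the strict policy
    RejectedMismatch, // pin exists and the presented key differs
}

#[derive(Clone, Copy)]
enum Policy {
    AcceptNew,
    Strict,
}

/// Hypothetical in-memory stand-in for a known_hosts file: host -> fingerprint.
#[derive(Default)]
struct PinStore {
    pins: HashMap<String, String>,
}

impl PinStore {
    fn verify(&mut self, host: &str, fingerprint: &str, policy: Policy) -> Verdict {
        let pinned = self.pins.get(host).cloned();
        match pinned {
            Some(p) if p.as_str() == fingerprint => Verdict::Trusted,
            // A different key for a pinned host is refused and the pin is left untouched.
            Some(_) => Verdict::RejectedMismatch,
            None => match policy {
                Policy::AcceptNew => {
                    self.pins.insert(host.to_string(), fingerprint.to_string());
                    Verdict::PinnedNew
                }
                Policy::Strict => Verdict::RejectedUnknown,
            },
        }
    }
}
```

The important branch is `Some(_)`: it has no write path at all, so a substituted key cannot silently replace the genuine pin, and the genuine key still verifies afterwards.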
… 3: A3)
Rather than ship an unverifiable ~150-line two-editor scheme fence-resolve e2e (its deterministic trigger is now in tension with C1's honest-path epoch relearn, and there is no validated scheme recipe for editing a *shared* KB node to force a fenced kb/node_update, so it can't be authored correct-by-construction without a runnable two-machine environment), document the closure precisely:
- Tier-0 "Automated coverage map": each manually-run flow (Step 8 fence safety, Step 9 resolution UX, rebase-required contract, epoch relearn, two-peer convergence, unauthorized peer, MITM no-overwrite, TOFU prompt) → the exact unit/e2e test that now guards it.
- Step 8 / Step 9 NOTE callouts: the fence *safety* and resolution *logic* are now unit-automated (A1/A2a/A2b) and the manual reconnect-to-relearn is automatic in-product (C1); the live two-machine run remains for badge/pixel + cross-editor convergence, with the offline edit as the deterministic fence trigger.
This serves the "clear success criteria + coverage" goal: the residual fence end-to-end is explicitly the Tier-2 manual run, with every constituent piece unit-covered.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Brings MAE's collaborative editing from "text buffers over a trusted LAN" to
trusted-peer, membership-gated, crash-safe, write-authorized replicated knowledge
bases — with a first-class attention/notification surface for the resolution UX,
validated across two real machines. The substance is four arcs:
1. Trusted-peer collaboration security core (ADR-017)
- the client cert ∈ authorized keys, the editor TOFU-pins the daemon (`known_hosts`). `shared/mcp/src/tls.rs` (rustls/ring), `ClientTransport::{Plain,KeyJson,KeyTls}`.
- `kb/join` + `kb/node_update` gated on creator-or-member; owner-only kb/add_member/remove_member/approve. Strict identity binding: an authenticated peer's label/`saved_by` is its verified identity, not self-claimed.
- `collab-mtls-e2e.sh` + `collab-membership-e2e.sh` (both in CI).
2. Crash-safe convergent KB sync (ADR-020 → ADR-022)
Replicated KB nodes as per-node yrs CRDTs through the daemon hub. Live two-machine
testing drove this from broken to green and surfaced a chain of bugs no test caught
because the tests used stand-in values / hand-rolled serialization the production path
never produced:
- `kb/node_update` emitted without an `id` → daemon dropped it as a notification. Fixed by a single shared wire builder (`mae_sync::wire`) used by editor + daemon + tests. · B-14/B-15 divergent same-id lineages / ignored field edits · B-16 hardcoded `client_id=1` collision · B-17 `derive_kb_client_id` returned a full u64 but yrs `ClientID` is 53-bit · B-18 node tags (a yrs `YArray`) did not CRDT-sync (only title/body did); added `KbNodeDoc::set_tags` + wired through emit.
- (re)join does a bidirectional state-vector reconcile (`KnowledgeBase::reconcile_remote_node`) instead of a blind full-snapshot adopt.
- A durable-but-unsynced edit is re-derived from the durable `crdt_doc` on reconnect, independent of the pending-queue row surviving a crash. Never replaces an existing node.
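The B-17 fix amounts to keeping the derived id inside the 53-bit range. A minimal sketch of that idea follows, with loud assumptions: the body is hypothetical, `DefaultHasher` stands in for whatever hash the real `derive_kb_client_id` uses (its output is not guaranteed stable across Rust releases, so a real implementation would need a stable hash), and the `epoch` input is the one the write-authorization arc of this PR introduces.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Largest value that fits in 53 bits, the range the PR notes for yrs client ids.
const CLIENT_ID_MASK: u64 = (1u64 << 53) - 1;

/// Hypothetical derivation: hash the peer fingerprint together with the
/// authorization epoch, then mask the result down to 53 bits.
fn derive_kb_client_id(fingerprint: &str, epoch: u64) -> u64 {
    let mut hasher = DefaultHasher::new();
    fingerprint.hash(&mut hasher);
    epoch.hash(&mut hasher);
    hasher.finish() & CLIENT_ID_MASK
}
```

Feeding the epoch into the hash is what makes the id rotate when a member's role changes, while the mask is the part that addresses B-17.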
3. Write-authorization: epoch-fenced rebase (ADR-023, B-19 + B-20) — security
Reasoning through the live T7 role test surfaced B-19: the daemon gated writes on the
member's current role but merged opaque, client-authored CRDT updates with no
per-op attribution. So a viewer's locally-applied-but-denied edits stayed local-ahead and
would silently cascade to everyone once they were later granted editor — deferred
privilege escalation. MAE is open-source ⇒ the client is assumed hostile, so client-side
revert is theatre; enforcement is daemon-side.
- ⇒ unforgeable), bumped when an existing member's role changes; the KB `client_id` is epoch-rotated (`derive_kb_client_id(fp, epoch)`); the daemon decodes each update and rejects any op authored under a stale-epoch client_id (`rebase required`). A continuously-authorized editor's epoch is stable ⇒ full CRDT merge + offline preserved (no T4/T5 regression).
- `yrs::Update::state_vector()`, which omits a contiguous-clock continuation of a client already in the canonical base. A member demoted→re-promoted (whose editor kept authoring
under a still-canonical client) could append a post-demotion edit that slipped the fence —
a real bypass of the B-19 guarantee on the demote→re-promote path. Fixed by attributing ops
via apply-and-diff against the authoritative node state (catches continuations), unioned
with the legacy signal so divergent lineages stay caught. Daemon + unit regressions, both red
pre-fix; validated live (9c): the stale continuation now fences, no cascade.
- grant-stamp; only a causal-hash DAG defeats that, deferred) and over re-stamping (LWW) / hosted-edit (no offline). Adversarial exploit-path review in `docs/adr/023-*.md`.
4. Attention/notification bus + the resolution UX (ADR-024), and its hardening
The B-19 fence needs a user-facing resolution path (a fenced editor must learn their edit
was rejected and adopt/re-author), and the only surfaces were a clobberable status line and a
buried `*Messages*` log. ADR-024 adds a real attention bus + the host-key TOFU modal it generalizes:
- `crates/core/src/notifications.rs`): severity→surface routing (OptionRegistry-backed, Scheme-accessible), dedup-by-key, a non-clobberable mode-line attention badge, and a magit-style `*Notifications*` buffer.
- `kb/node_fetch` RPC + async adopt-and-re-author (R1, fixes the "fenced editor is stuck" gap); a fenced edit raises an ActionRequired notification with Accept-remote / Keep-mine / Stash actions (R2); MCP `notifications_list` `notify_resolve {id, action}` for headless/agent parity (R3); no silent overwrite of divergent local work on (re)join (R5).
- generic `BlockingReply` modal, answerable by keypress or bus action.
- `:set collab-host-key-policy` wasn't honored (the verifier was built once at task setup and cached). Now reads a live policy cell, honored on the next connect.
starved by the synchronous host-key wait → multi-thread pool; and the render pass
skipped the overlay), didn't capture input (routed only in command-palette mode + an
AI-input-lock stole Esc), and wasn't answerable over the bus (added NotifCommand::ReplyAccept/Reject actions).
must be fully readable for the out-of-band trust compare). Fixed with content-adaptive
sizing + wrapping.
dialog-geometry logic duplicated per backend (GUI vs TUI) that had drifted. Both are now
single shared computations in render_common (overlay::active_overlay() and dialog::mini_dialog_layout()), each unit-tested, so that whole "the two backends diverge" class of bug is structurally closed.
required = true manifest flag) so cross-cutting modules like notifications (whose buffers can be raised by background events) auto-enable regardless of the (mae!) block, Doom's core/ analog.
Live validation (two machines, MCP-driven): GREEN
kill -9 stress/concurrent-edit/WAL-recovery/role enforcement): all PASS, both directions corroborated.
viewer_era_edits_do_not_cascade_on_grant e2e (red without the fence).
the B-20 continuation now fences (no cascade, canonical unchanged throughout, proven from the
daemon WAL); resolution coverage complete (Keep-mine + Accept-remote).
the full fingerprint visible — through a modal that renders (B-22a), captures input (B-22b),
sizes to content (B-23), and is bus-answerable (B-22c). FULL PASS.
Test rigor
crates/core/tests/kb_sync_n_peer_e2e.rs, N∈{2,3,5}) driving the real CRDT path with production-derived client_ids; caught B-17 on its first run.
kb_node_tags_round_trip (B-18) + kb_node_update_survives_daemon_restart (T6) production-protocol e2e. New unit suites for the overlay-priority resolver, the dialog-layout (fingerprint-not-clipped + narrow-screen wrap),
the host-key live-policy cell, and the bus reply action. Methodology in
docs/collab-kb-sync-testing-lessons.md.
init.scm) found during the live run.
ADRs / docs
ADR-017 (trusted-peer auth), ADR-020 (replicated KB CRDT), ADR-021 (membership/policy compliance
direction), ADR-022 (crash-safe convergent sync), ADR-023 (secure write-access — epoch-fenced
rebase), ADR-024 (notification/attention bus). Two-machine procedures + the full live log in
docs/collab-testing-plan.md (Step 8 = B-19, Step 9 = ADR-024/B-20→B-23) and docs/collab-test-notes-bob.md.
Still to land before merge / tracked follow-ups (non-security)
crdt_doc flush-on-write (durability hardening) · daemon SQLite WAL power-loss durability.
set_status/*Messages* callers onto the bus.
(pre-rotation attack); monotonic epoch across remove/re-add; ADR-021 append-only audit log.
Update — testing-gap closure + non-UX fixes + event-driven triggers (pre-UX pass)
Closes the automation gaps and non-UX issues surfaced by the live two-machine run, before the
planned KB-sharing UX review. Every item ships with a RED-before/GREEN-after guard (CLAUDE.md #9).
Automated the manual tests (Arc A): daemon fence no-cascade oracle (canonical node stays
byte-identical across a fenced push); editor notify-resolution unit test (3 actions; Keep-mine
records pending_reauthor, Accept-remote doesn't); collab_bridge KbNodeAdopted round-trip (keep-mine re-authors over authoritative / accept-remote discards); real-daemon two-peer concurrent
convergence (byte-identical merge over TCP); MITM changed-host-key rejection without overwriting
the pin + unauthorized-peer scenario in collab-mtls-e2e.sh.
Non-UX fixes (Arc B): split-window mouse-click coordinates fixed in the shared handle_mouse_click_inner (both GUI fallback and TUI passed absolute screen coords) via a pure window_relative layout-origin translation; CozoKbStore::load_all degrades a query-bind failure to Ok(empty) instead of an Err that aborted kb_join and tripped the 10s main-thread stall watchdog (B-5). B-2/B-3/B-6 verified already-correct + locked with regression tests (config-key
kebab-alias invariant, joined-instance surfacing, primary-KB-store XDG-first contract).
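The layout-origin translation behind the mouse-click fix can be sketched as a pure function. This is an illustrative reconstruction: the Rect type and this window_relative signature are assumptions, and the function in the PR may take different arguments.

```rust
/// Window rectangle in absolute screen cells (hypothetical type).
#[derive(Clone, Copy)]
struct Rect {
    x: u16,
    y: u16,
    width: u16,
    height: u16,
}

/// Translate an absolute screen coordinate into a coordinate relative to the
/// window's origin. Returns None when the click lies outside the window.
fn window_relative(win: Rect, abs_x: u16, abs_y: u16) -> Option<(u16, u16)> {
    // The comparison short-circuits before the subtraction, so the
    // subtraction cannot underflow.
    let inside_x = abs_x >= win.x && abs_x - win.x < win.width;
    let inside_y = abs_y >= win.y && abs_y - win.y < win.height;
    if inside_x && inside_y {
        Some((abs_x - win.x, abs_y - win.y))
    } else {
        None
    }
}
```

Keeping the translation pure (no access to editor state) is what lets both backends share it and lets a unit test pin the split-window case directly.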
Event-driven triggers (Arc C):
kbc:membership broadcast (previously ignored as an "unknown buffer"), so a promote/demote takes
effect with no manual reconnect. A local CRDT replica of the collection doc
(CollabState.kb_collection_state) is applied as a delta and epoch_of(fingerprint) re-derived. The daemon remains the sole authority (re-derives each member's epoch from its own collection when
fencing), so a tampered replica can only mislead the client about its own epoch — never
self-elevate. No-weakening gate: the daemon viewer_era_* / stale_epoch_continuation_* fence tests stay GREEN.
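Why a tampered replica cannot self-elevate can be shown with a toy model. This is a sketch under stated assumptions, not the daemon's fence: Collection, Push and daemon_accepts are hypothetical, epochs are plain integers keyed by fingerprint, and the acceptance rule (claimed epoch must equal the daemon's own value) is a simplification of whatever ADR-023 actually specifies.

```rust
use std::collections::HashMap;

/// Toy collection doc (assumption): member fingerprint -> current epoch.
#[derive(Default, Clone)]
struct Collection {
    epochs: HashMap<String, u64>,
}

impl Collection {
    fn epoch_of(&self, fingerprint: &str) -> Option<u64> {
        self.epochs.get(fingerprint).copied()
    }
}

/// A push as the daemon sees it: who sent it and the epoch the client
/// believed it held, learned from the client's local replica.
struct Push {
    fingerprint: String,
    claimed_epoch: u64,
}

/// The daemon consults only its own collection. The client's replica is never
/// an input, so editing the replica changes nothing except what the client
/// itself claims.
fn daemon_accepts(authoritative: &Collection, push: &Push) -> bool {
    authoritative.epoch_of(&push.fingerprint) == Some(push.claimed_epoch)
}
```

In this model an honest client that relearned its epoch from the broadcast is accepted, while a client whose replica was altered claims an epoch the daemon does not recognise and is fenced.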
the git build SHA (build.rs → MAE_BUILD_SHA) in the editor + daemon startup log, --version, and $/debug, and collab-doctor warns on an editor↔daemon build mismatch.
A3 (live two-editor fence-resolve e2e), documented, not fabricated: C1 removes the
deterministic online fence trigger (honest clients now relearn and aren't fenced), and there is no
validated scheme recipe for editing a shared KB node to force a fenced update — so rather than ship
an unverifiable e2e, docs/collab-testing-plan.md gains an automated-coverage map (each manual flow → its guarding test) and flags the residual full-sequence run as Tier-2 manual (deterministic trigger =
the offline edit). Its constituent pieces are all unit-covered.
Gates: cargo fmt + clippy -D warnings clean (both workspaces); mae-core 2292, mae-kb 212, mae-mcp 127, mae bins 283, daemon 152, n-peer e2e 12; all green.
🤖 Generated with Claude Code