Trusted-peer, membership-gated, crash-safe, write-authorized KB sync + attention bus (ADR-017/020/022/023/024) by cuttlefisch · Pull Request #69 · cuttlefisch/mae

cuttlefisch · 2026-06-22T18:47:39Z

Ready for review. The full T1–T7 matrix, Step 8 (B-19 epoch fence), and
Step 9 (ADR-024 notification bus + the B-20→B-23 modal/security arc) are GREEN
across two real machines, MCP-driven. The client is assumed hostile; all write
enforcement is daemon-side. Remaining before merge is housekeeping only (crdt_doc
flush-on-write + tracked non-security follow-ups), listed at the bottom.

Brings MAE's collaborative editing from "text buffers over a trusted LAN" to
trusted-peer, membership-gated, crash-safe, write-authorized replicated knowledge
bases — with a first-class attention/notification surface for the resolution UX,
validated across two real machines. The substance is four arcs:

1. Trusted-peer collaboration security core (ADR-017)

mTLS-as-identity: each peer is an Ed25519 self-signed cert; the daemon checks
the client cert ∈ authorized keys, the editor TOFU-pins the daemon (known_hosts).
shared/mcp/src/tls.rs (rustls/ring), ClientTransport::{Plain,KeyJson,KeyTls}.
Per-KB membership (ADR-018): kb/join + kb/node_update gated on creator-or-member;
owner-only kb/add_member/remove_member/approve. Strict identity binding — an
authenticated peer's label/saved_by is its verified identity, not self-claimed.
Interactive TOFU first-connect UX; PSK + key modes. Validated by
collab-mtls-e2e.sh + collab-membership-e2e.sh (both in CI).
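
The membership gate described above can be sketched roughly as follows. This is a minimal illustration, not the daemon's code: `KbAccess` and both method names are hypothetical, and the only rules taken from the PR are "creator-or-member may join/update" and "owner-only member management", both keyed on the verified label.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified view of one shared KB's access state.
pub struct KbAccess {
    pub creator: String,
    pub members: HashSet<String>,
}

impl KbAccess {
    /// kb/join and kb/node_update: allowed for the creator or any member.
    /// `verified_label` is the identity proven by the handshake, never a
    /// value the client claims about itself.
    pub fn may_join_or_update(&self, verified_label: &str) -> bool {
        self.creator == verified_label || self.members.contains(verified_label)
    }

    /// kb/add_member and kb/remove_member: owner only.
    pub fn may_manage_members(&self, verified_label: &str) -> bool {
        self.creator == verified_label
    }
}
```

Keeping both checks as pure functions of the verified label is what makes the deny, add, allow sequence easy to unit-test without a live connection.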

2. Crash-safe convergent KB sync (ADR-020 → ADR-022)

Replicated KB nodes as per-node yrs CRDTs through the daemon hub. Live two-machine
testing drove this from broken to green and surfaced a chain of bugs no test caught
because the tests used stand-in values / hand-rolled serialization the production path
never produced:

- B-8: kb/node_update emitted without an id → daemon dropped it as a notification.
  Fixed by a single shared wire builder (mae_sync::wire) used by editor + daemon + tests.
- B-12: owner re-share clobbered daemon-side membership.
- B-13: joiner never live-subscribed.
- B-14/B-15: divergent same-id lineages / ignored field edits.
- B-16: hardcoded client_id=1 collision.
- B-17: derive_kb_client_id returned a full u64 but yrs ClientID is 53-bit.
- B-18: node tags (a yrs YArray) did not CRDT-sync (only title/body did); added
  KbNodeDoc::set_tags + wired through emit.

ADR-022 — the crash-safety mechanism: (re)join does a bidirectional state-vector
reconcile (KnowledgeBase::reconcile_remote_node) instead of a blind full-snapshot adopt.
A durable-but-unsynced edit is re-derived from the durable crdt_doc on reconnect —
independent of the pending-queue row surviving a crash. Never replaces an existing node.
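
To make the state-vector reconcile concrete, here is a small sketch using plain maps in place of yrs state vectors. It is an assumption-laden illustration of the idea only: `StateVector` and `missing_on_remote` are hypothetical names, and the real path goes through `KnowledgeBase::reconcile_remote_node`.

```rust
use std::collections::HashMap;

/// Stand-in for a state vector: per client id, how many ops have been seen.
pub type StateVector = HashMap<u64, u32>;

/// For every client where `ours` is ahead of `theirs`, return the half-open
/// clock range `(from, to)` that the other side is missing.
pub fn missing_on_remote(ours: &StateVector, theirs: &StateVector) -> HashMap<u64, (u32, u32)> {
    let mut out = HashMap::new();
    for (&client, &our_clock) in ours {
        let their_clock = theirs.get(&client).copied().unwrap_or(0);
        if our_clock > their_clock {
            out.insert(client, (their_clock, our_clock));
        }
    }
    out
}
```

Calling it in both directions gives the bidirectional part: `missing_on_remote(&local, &remote)` is what to push, `missing_on_remote(&remote, &local)` is what to pull. Because the local vector is recomputed from the durable document, a durable-but-unsynced edit shows up in the push set even if no pending-queue row survived.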

3. Write-authorization: epoch-fenced rebase (ADR-023, B-19 + B-20) — security

Reasoning through the live T7 role test surfaced B-19: the daemon gated writes on the
member's current role but merged opaque, client-authored CRDT updates with no
per-op attribution. So a viewer's locally-applied-but-denied edits stayed local-ahead and
would silently cascade to everyone once they were later granted editor — deferred
privilege escalation. MAE is open-source ⇒ the client is assumed hostile, so client-side
revert is theatre; enforcement is daemon-side.

Mechanism: a per-member authorization epoch on the collection doc (daemon-authored
⇒ unforgeable), bumped when an existing member's role changes; the KB client_id is
epoch-rotated (derive_kb_client_id(fp, epoch)); the daemon decodes each update and
rejects any op authored under a stale-epoch client_id (rebase required). A continuously-
authorized editor's epoch is stable ⇒ full CRDT merge + offline preserved (no T4/T5 regression).
B-20 (found live in Step 9c, fixed): the fence attributed new ops via
yrs::Update::state_vector(), which omits a contiguous-clock continuation of a client
already in the canonical base. A member demoted→re-promoted (whose editor kept authoring
under a still-canonical client) could append a post-demotion edit that slipped the fence —
a real bypass of the B-19 guarantee on the demote→re-promote path. Fixed by attributing ops
via apply-and-diff against the authoritative node state (catches continuations), unioned
with the legacy signal so divergent lineages stay caught. Daemon + unit regressions, both red
pre-fix; validated live (9c): the stale continuation now fences, no cascade.
Server-authoritative, chosen over capability-signed ops (a malicious client backdates the
grant-stamp — only a causal-hash DAG defeats that, deferred) and over re-stamping (LWW) /
hosted-edit (no offline). Adversarial exploit-path review in docs/adr/023-*.md.
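
A rough sketch of the fence, under stated assumptions: the hash (FNV-1a) is purely illustrative since the real derivation is not shown here, `derive_client_id`, `authors_of_new_ops`, and `fence_ok` are hypothetical names, and plain clock maps stand in for node state. It combines the 53-bit mask from B-17, epoch rotation, and apply-and-diff attribution from the B-20 fix.

```rust
use std::collections::HashMap;

/// yrs client ids must fit in 53 bits (B-17).
pub const CLIENT_ID_MASK: u64 = (1u64 << 53) - 1;

/// Illustrative FNV-1a hash; an assumption, not the project's derivation.
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf29ce484222325;
    for &b in bytes {
        h ^= b as u64;
        h = h.wrapping_mul(0x100000001b3);
    }
    h
}

/// Hypothetical analogue of derive_kb_client_id(fp, epoch): a new epoch
/// yields a new client id for the same fingerprint.
pub fn derive_client_id(fingerprint: &str, epoch: u64) -> u64 {
    let mut input = fingerprint.as_bytes().to_vec();
    input.extend_from_slice(&epoch.to_be_bytes());
    fnv1a(&input) & CLIENT_ID_MASK
}

/// Apply-and-diff attribution: compare per-client clocks before and after
/// applying the update to a scratch copy. A contiguous continuation by an
/// already-known client still shows up as an advanced clock.
pub fn authors_of_new_ops(before: &HashMap<u64, u32>, after: &HashMap<u64, u32>) -> Vec<u64> {
    let mut ids = Vec::new();
    for (&id, &clock) in after {
        if clock > before.get(&id).copied().unwrap_or(0) {
            ids.push(id);
        }
    }
    ids.sort_unstable();
    ids
}

/// Simplified fence: every client id that authored new ops must be the id
/// derived from the sender's current epoch.
pub fn fence_ok(fingerprint: &str, current_epoch: u64, authoring_ids: &[u64]) -> bool {
    let expected = derive_client_id(fingerprint, current_epoch);
    authoring_ids.iter().all(|&id| id == expected)
}
```

The sketch is stricter than necessary (it rejects any unexpected author, not only stale-epoch ones), which keeps it short. The point it illustrates is that attribution comes from diffing authoritative state rather than from anything the client reports.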

4. Attention/notification bus + the resolution UX (ADR-024) — and its hardening

The B-19 fence needs a user-facing resolution path (a fenced editor must learn their edit
was rejected and adopt/re-author), and the only surfaces were a clobberable status line and a
buried *Messages* log. ADR-024 adds a real attention bus + the host-key TOFU modal it
generalizes:

NotificationCenter (crates/core/src/notifications.rs): severity→surface routing
(OptionRegistry-backed, Scheme-accessible), dedup-by-key, a non-clobberable mode-line
attention badge, and a magit-style *Notifications* buffer.
Collab resolution round-trip: kb/node_fetch RPC + async adopt-and-re-author
(R1, fixes the "fenced editor is stuck" gap); a fenced edit raises an ActionRequired
notification with Accept-remote / Keep-mine / Stash actions (R2); MCP notifications_list +
notify_resolve {id, action} for headless/agent parity (R3); no silent overwrite of divergent
local work on (re)join (R5).
Generalized blocking modal (R4): the bespoke host-key prompt becomes one consumer of a
generic BlockingReply modal — answerable by keypress or bus action.
Live-hardening found by driving the TOFU modal on two machines (Step 9d):
- B-21 — runtime :set collab-host-key-policy wasn't honored (the verifier was built once
  at task setup and cached). Now reads a live policy cell, honored on the next connect.
- B-22a/b/c — the GUI TOFU modal didn't render (a single-threaded bridge runtime was
  starved by the synchronous host-key wait → multi-thread pool; and the render pass
  skipped the overlay), didn't capture input (routed only in command-palette mode + an
  AI-input-lock stole Esc), and wasn't answerable over the bus (added NotifCommand::Reply
  Accept/Reject actions).
- B-23 — the modal didn't size to content, clipping the host-key fingerprint (which
  must be fully readable for the out-of-band trust compare). Fixed with content-adaptive
  sizing + wrapping.
Architectural through-line: B-22a and B-23 were the same shape — overlay-priority and
dialog-geometry logic duplicated per backend (GUI vs TUI) that had drifted. Both are now
single shared computations in render_common — overlay::active_overlay() and
dialog::mini_dialog_layout() — each unit-tested, so that whole "the two backends diverge"
class of bug is structurally closed.
Also: a required/core module tier (required = true manifest flag) so cross-cutting
modules like notifications (whose buffers can be raised by background events) auto-enable
regardless of the (mae!) block — Doom's core/ analog.
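
As an illustration of dedup-by-key and the non-clobberable badge, a minimal sketch follows. It assumes a much simpler shape than the real NotificationCenter: `Center`, `post`, `resolve`, and `badge_count` are hypothetical names, only `ActionRequired` is taken from the PR, and treating the badge as a count of unresolved `ActionRequired` entries is an assumption.

```rust
use std::collections::HashMap;

/// `ActionRequired` is named in the PR; the other levels are illustrative.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Severity {
    Info,
    Warning,
    ActionRequired,
}

/// Hypothetical notification store keyed by a dedup key.
#[derive(Debug, Default)]
pub struct Center {
    entries: HashMap<String, (Severity, String)>,
}

impl Center {
    /// Post under a dedup key. Returns true when the key is new; a repeat
    /// replaces the stored entry instead of stacking a duplicate.
    pub fn post(&mut self, key: &str, severity: Severity, message: &str) -> bool {
        self.entries
            .insert(key.to_string(), (severity, message.to_string()))
            .is_none()
    }

    /// Resolving removes the entry; later posts under other keys never do.
    pub fn resolve(&mut self, key: &str) -> bool {
        self.entries.remove(key).is_some()
    }

    /// Number of unresolved entries that should keep the badge lit.
    pub fn badge_count(&self) -> usize {
        self.entries
            .values()
            .filter(|(s, _)| *s == Severity::ActionRequired)
            .count()
    }
}
```

Unlike a status line, where the next message overwrites the last, the badge here only clears through an explicit `resolve`.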

Live validation (two machines, MCP-driven) — GREEN

T1–T7 (membership/restart/offline-merge/kill -9 stress/concurrent-edit/WAL-recovery/role
enforcement) — all PASS, both directions corroborated.
Step 8 (B-19 epoch fence): viewer-era edits denied → promote → re-push fenced; daemon
viewer_era_edits_do_not_cascade_on_grant e2e (red without the fence).
Step 9 (ADR-024 + B-20): 9a/9b fence-notification + Keep-mine converge; 9c
the B-20 continuation now fences (no cascade, canonical unchanged throughout — proven from the
daemon WAL); resolution coverage complete (Keep-mine + Accept-remote).
Step 9d (TOFU/R4): reject = fast abort + no pin, accept = auth + join + pin, with
the full fingerprint visible — through a modal that renders (B-22a), captures input (B-22b),
sizes to content (B-23), and is bus-answerable (B-22c). FULL PASS.

Test rigor

N-peer editor-logic harness (crates/core/tests/kb_sync_n_peer_e2e.rs, N∈{2,3,5}) driving the
real CRDT path with production-derived client_ids — caught B-17 on its first run.
Real-daemon SV-reconcile + role + B-19/B-20 e2e; kb_node_tags_round_trip (B-18) +
kb_node_update_survives_daemon_restart (T6) production-protocol e2e. New unit suites for the
overlay-priority resolver, the dialog-layout (fingerprint-not-clipped + narrow-screen wrap),
the host-key live-policy cell, and the bus reply action. Methodology in
docs/collab-kb-sync-testing-lessons.md.
Fixed a config-precedence bug (env/CLI now override init.scm) found during the live run.

ADRs / docs

ADR-017 (trusted-peer auth), ADR-020 (replicated KB CRDT), ADR-021 (membership/policy compliance
direction), ADR-022 (crash-safe convergent sync), ADR-023 (secure write-access — epoch-fenced
rebase), ADR-024 (notification/attention bus). Two-machine procedures + the full live log in
docs/collab-testing-plan.md (Step 8 = B-19, Step 9 = ADR-024/B-20→B-23) and
docs/collab-test-notes-bob.md.

Still to land before merge / tracked follow-ups (non-security)

crdt_doc flush-on-write (durability hardening) · daemon SQLite WAL power-loss durability.
Broadcaster→consumer sweep: migrate the remaining clobberable set_status/*Messages* callers
onto the bus.
Deferred security hardening (documented, not blocking): unpredictable daemon-issued epoch token
(pre-rotation attack); monotonic epoch across remove/re-add; ADR-021 append-only audit log.

Update — testing-gap closure + non-UX fixes + event-driven triggers (pre-UX pass)

Closes the automation gaps and non-UX issues surfaced by the live two-machine run, before the
planned KB-sharing UX review. Every item ships with a RED-before/GREEN-after guard (CLAUDE.md #9).

Automated the manual tests (Arc A): daemon fence no-cascade oracle (canonical node stays
byte-identical across a fenced push); editor notify-resolution unit test (3 actions; Keep-mine
records pending_reauthor, Accept-remote doesn't); collab_bridge KbNodeAdopted round-trip
(keep-mine re-authors over authoritative / accept-remote discards); real-daemon two-peer concurrent
convergence (byte-identical merge over TCP); MITM changed-host-key rejection without overwriting
the pin + unauthorized-peer scenario in collab-mtls-e2e.sh.

Non-UX fixes (Arc B): split-window mouse-click coordinates fixed in the shared
handle_mouse_click_inner (both GUI fallback and TUI passed absolute screen coords) via a pure
window_relative layout-origin translation; CozoKbStore::load_all degrades a query-bind failure
to Ok(empty) instead of an Err that aborted kb_join and tripped the 10s main-thread stall
watchdog (B-5). B-2/B-3/B-6 verified already-correct + locked with regression tests (config-key
kebab-alias invariant, joined-instance surfacing, primary-KB-store XDG-first contract).
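
For the split-window click fix, the translation from absolute screen coordinates to window-relative ones might look like the sketch below. The name `window_relative` appears above, but the signature and the `WindowRect` type here are assumptions made for illustration.

```rust
/// Hypothetical window rectangle in absolute screen cells.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct WindowRect {
    pub x: u16,
    pub y: u16,
    pub width: u16,
    pub height: u16,
}

/// Translate an absolute (col, row) click into coordinates relative to the
/// window's layout origin, or None when the click lands outside the window.
pub fn window_relative(win: WindowRect, col: u16, row: u16) -> Option<(u16, u16)> {
    let dx = col.checked_sub(win.x)?;
    let dy = row.checked_sub(win.y)?;
    (dx < win.width && dy < win.height).then_some((dx, dy))
}
```

Because it is pure, the same function can serve both the GUI fallback and the TUI path and be tested without a renderer.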

Event-driven triggers (Arc C):

C1 (security-gated): the editor now relearns its KB authorization epoch from a live kbc:
membership broadcast (previously ignored as an "unknown buffer"), so a promote/demote takes
effect with no manual reconnect. A local CRDT replica of the collection doc
(CollabState.kb_collection_state) is applied as a delta and epoch_of(fingerprint) re-derived.
The daemon remains the sole authority (re-derives each member's epoch from its own collection when
fencing), so a tampered replica can only mislead the client about its own epoch — never
self-elevate. No-weakening gate: the daemon viewer_era_* / stale_epoch_continuation_* fence
tests stay GREEN.
C2 connect-critical config (server address, auth mode/PSK) verified read-live; C3 embeds
the git build SHA (build.rs → MAE_BUILD_SHA) in the editor + daemon startup log, --version,
and $/debug, and collab-doctor warns on an editor↔daemon build mismatch.

A3 (live two-editor fence-resolve e2e) — documented, not fabricated: C1 removes the
deterministic online fence trigger (honest clients now relearn and aren't fenced), and there is no
validated scheme recipe for editing a shared KB node to force a fenced update — so rather than ship
an unverifiable e2e, docs/collab-testing-plan.md gains an automated-coverage map (each manual flow →
its guarding test) and flags the residual full-sequence run as Tier-2 manual (deterministic trigger =
the offline edit). Its constituent pieces are all unit-covered.

Gates: cargo fmt + clippy -D warnings clean (both workspaces); mae-core 2292, mae-kb 212,
mae-mcp 127, mae bins 283, daemon 152, n-peer e2e 12 — all green.

🤖 Generated with Claude Code

The 0.13.11 and 0.13.12 version bumps updated Cargo.toml but not the workspace member versions in Cargo.lock (earlier bumps had explicit 'sync Cargo.lock' chores; these two missed it). A plain cargo build regenerates these, dirtying the tree — sync them once so both dev machines start from a clean working tree. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Add shared/mcp/src/keystore.rs: a permission-guarded trusted_keys file (default $XDG_DATA_HOME/mae/collab/trusted_keys, 0600) holding symmetric PSKs out of config.toml. Format: '[name] <secret>' per line, # comments. Both editor and daemon read it via mae-mcp so path + format live in one place. Extend PskAuth to be multi-key on the server side: it can trust a SET of named keys (a keystore) and select the one a client advertises via a new optional key_id in the auth hello. Backward compatible — unnamed clients use the server's default (first) key; serde ignores the absent/extra field so old and new peers interoperate. Proof verification now uses constant-time Mac::verify_slice instead of string compare. Foundation only; daemon + editor wiring follow. mae-mcp: 100 tests pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
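
A minimal sketch of the keystore format and key selection, with one loud assumption: `[name] <secret>` is read here as "optional name, then secret", so a line with a single token is an unnamed key. `KeyEntry`, `parse_keystore`, and `select` are hypothetical names, and the permission check and HMAC verification are out of scope.

```rust
/// One parsed keystore line. `name` is None for an unnamed key (assumption).
#[derive(Debug, PartialEq, Eq)]
pub struct KeyEntry {
    pub name: Option<String>,
    pub secret: String,
}

/// Parse keystore text: one key per line, `#` starts a comment line,
/// blank lines are ignored.
pub fn parse_keystore(text: &str) -> Vec<KeyEntry> {
    text.lines()
        .map(str::trim)
        .filter(|l| !l.is_empty() && !l.starts_with('#'))
        .filter_map(|l| {
            let mut parts = l.split_whitespace();
            match (parts.next(), parts.next()) {
                (Some(name), Some(secret)) => Some(KeyEntry {
                    name: Some(name.to_string()),
                    secret: secret.to_string(),
                }),
                (Some(secret), None) => Some(KeyEntry {
                    name: None,
                    secret: secret.to_string(),
                }),
                _ => None,
            }
        })
        .collect()
}

/// Server-side selection: a client that advertises a key_id gets that key;
/// a client that advertises none gets the default (first) key.
pub fn select<'a>(keys: &'a [KeyEntry], key_id: Option<&str>) -> Option<&'a KeyEntry> {
    match key_id {
        Some(id) => keys.iter().find(|k| k.name.as_deref() == Some(id)),
        None => keys.first(),
    }
}
```

An unknown `key_id` returns `None` rather than falling back to the default key, so a typo in a key name fails closed.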

Daemon: in psk mode, build the trusted set from the keystore (every entry is a peer credential) plus legacy psk/psk_command (one unnamed key), and construct a single shared multi-key PskAuth. Add 'mae-daemon keygen [name]' (random 0600 key, printed for copying to peers) and 'mae-daemon keys' (names + fingerprints, never secrets). check-config/doctor now report the keystore path + key count and warn on loose perms. Editor: resolve the client credential via resolve_client_credential() — precedence psk_command > psk > keystore primary key — and advertise the key's name as the wire key_id so the daemon selects it. Pure resolver is unit-tested; the keystore lookup no longer makes the empty-psk test flaky. Verified end-to-end: a client with only a keystore key connects to a psk daemon — 'PSK auth succeeded key=client-cli'. Closes the gap where a PSK could only come from config.toml (which is being retired). mae-mcp 100, collab_bridge 84, daemon config tests pass; clippy clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
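
The credential precedence (psk_command > psk > keystore primary key) can be expressed as a pure function, roughly as below. This is a sketch: the inputs are already-resolved strings (the command's captured output, not the command itself), and treating empty or whitespace-only values as unset is an assumption. `CredentialSource` and `resolve` are hypothetical names.

```rust
/// Where the winning credential came from.
#[derive(Debug, PartialEq, Eq)]
pub enum CredentialSource {
    PskCommand,
    Psk,
    Keystore,
}

/// Treat missing, empty, or whitespace-only values as unset (assumption).
fn non_empty(v: Option<&str>) -> Option<String> {
    v.map(str::trim).filter(|s| !s.is_empty()).map(str::to_string)
}

/// Precedence: psk_command output, then psk, then the keystore primary key.
pub fn resolve(
    psk_command_output: Option<&str>,
    psk: Option<&str>,
    keystore_primary: Option<&str>,
) -> Option<(CredentialSource, String)> {
    non_empty(psk_command_output)
        .map(|s| (CredentialSource::PskCommand, s))
        .or_else(|| non_empty(psk).map(|s| (CredentialSource::Psk, s)))
        .or_else(|| non_empty(keystore_primary).map(|s| (CredentialSource::Keystore, s)))
}
```

Passing the keystore value in as an argument, instead of reading the file inside the resolver, is one way to keep such a test independent of whatever keystore exists on the machine running it.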

Design for an asymmetric collab auth mode ('key') alongside none/psk: Ed25519 keypairs, known_hosts (client pins daemon) + authorized_keys (daemon trusts clients), mutual signed-challenge handshake, client TOFU policy (prompt/accept-new/strict), daemon pending-approval + admin CLI (identity/authorized/pending/authorize/revoke). Enables trust-on-first-use and per-peer revocation without shared-secret rotation. Symmetric keystore (this branch) remains as 'psk' mode. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ADR-017 phase 1) shared/mcp/src/identity.rs: Ed25519 Identity (load_or_generate, 0600 private key), PublicKey (base64 wire form, SSH-style 'mae-ed25519 <b64> <label>' lines, SHA256: fingerprints), KnownHosts (client pins daemon keys), AuthorizedKeys (daemon trusts client keys, add/authorize/revoke), and a HostKeyVerifier abstraction with a known_hosts-backed FileHostKeyVerifier implementing the accept-new / strict / prompt TOFU policies (pins on first use, aborts on a changed host key). shared/mcp/src/auth.rs: KeyAuth AuthProvider — a mutual signed-challenge handshake binding both pubkeys + nonces into a domain-separated transcript. Server verifies the client signature and checks authorized_keys; client verifies the server signature and applies the host-key policy before proving its own key. Adds ed25519-dalek + base64 deps. Crypto core only; daemon + editor wiring + TOFU UI follow. mae-mcp: 113 tests pass (13 new for identity/keyauth), clippy clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
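
The three TOFU policies reduce to a small decision table. The sketch below is an illustration with hypothetical names (`Policy`, `Decision`, `decide`); the rule that `strict` rejects an unknown host is an assumption, while "matching pin is accepted" and "changed key aborts" come from the commit text.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Policy {
    Prompt,
    AcceptNew,
    Strict,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Decision {
    Accept,
    PinAndAccept,
    AskUser,
    Reject,
}

/// Decide what to do with the daemon key presented on connect.
/// `pinned` is the known_hosts entry for this host, if any.
pub fn decide(policy: Policy, pinned: Option<&str>, presented: &str) -> Decision {
    match pinned {
        // Previously pinned and unchanged: accept silently.
        Some(p) if p == presented => Decision::Accept,
        // Pinned but different: abort under every policy (possible MITM).
        Some(_) => Decision::Reject,
        // First contact: the policy decides.
        None => match policy {
            Policy::AcceptNew => Decision::PinAndAccept,
            Policy::Prompt => Decision::AskUser,
            Policy::Strict => Decision::Reject, // assumption
        },
    }
}
```

Note that the policy is consulted only on first contact; no policy value can turn a changed key into an accept.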

Add CollabAuth enum (None/Psk/Key); in 'key' mode the daemon loads/generates its Ed25519 identity and an authorized_keys trust store, and runs KeyAuth::server per connection. check_collab accepts 'key' and flags an empty authorized_keys. New admin CLI: mae-daemon identity show the daemon pubkey line + fingerprint mae-daemon authorized list trusted client keys (label + fingerprint) mae-daemon authorize <pubkey> add a client pubkey line to authorized_keys mae-daemon revoke <label> remove client key(s) by label check-config/doctor report the identity fingerprint + authorized key count. Verified: identity → authorize → check-config OK. clippy -D warnings clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…phase 0) Add shared/mcp/src/tls.rs: mutual TLS where each peer presents a self-signed X.509 cert whose SPKI is its existing Ed25519 Identity key. TLS 1.3 gives confidentiality + proof-of-possession; peer trust moves into custom verifiers: the daemon checks the client cert's pubkey against AuthorizedKeys, the editor TOFU-pins the daemon cert's pubkey via HostKeyVerifier. This unifies encryption + mutual auth + pinning on the identities we already manage, superseding the JSON KeyAuth handshake on the TLS path. - ring crypto backend with an explicit CryptoProvider (avoids clashing with the editor's reqwest aws-lc-rs default). Daemon gains rustls for the first time and builds cleanly (ring only, no aws-lc-rs/cmake conflict with cozo). - ed25519_pubkey_from_cert (x509-parser, OID 1.3.101.112) is the trust-critical extraction — round-trip tested against our own cert. - PeerIdentity {label,fingerprint,pubkey} added to identity.rs (authoritative identity for strict binding); Identity::pkcs8_der; Debug on HostKeyVerifier. mae-mcp: 119 tests (6 new incl. full in-process mTLS handshake: authorized client succeeds + identity recovered, unauthorized rejected, untrusted host rejected). clippy clean; both workspaces build. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…phase 1) Daemon: add CollabAuth::KeyTls (built for mode=key + tls=true, default). The accept loop wraps the whole TcpStream with the rustls TlsAcceptor (not pre-split), recovers the verified PeerIdentity via peer_identity_from_tls, then splits the TlsStream and runs the session. Plaintext psk/legacy-key/none paths unchanged. AuthConfig gains tls: bool (default true); check-config shows it. Session plumbing: ClientSession gains peer_identity + with_identity() + authenticated_label(); collab_handler refactored so handle_client (anon) and the new handle_client_authenticated(peer) share run_session(). handle_client_with_auth (psk/legacy-key) now synthesizes a PeerIdentity from the auth label and routes through it — the authenticated label finally reaches the session instead of being dropped. mae-mcp re-exports tokio_rustls::{TlsAcceptor,TlsConnector}. Regression-safe: 36 daemon collab_e2e tests pass; clippy -D warnings clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ase 2a) Add three Scheme-configurable options (OptionRegistry + get/set + validation): - collab_auth_mode (none|psk|key) — selects the handshake; key = Ed25519 trusted-peer identity over mTLS. - collab_host_key_policy (prompt|accept-new|strict) — TOFU policy for an unknown daemon identity. - collab_tls (default true) — mTLS vs plaintext JSON KeyAuth fallback. CollabState gains the fields (defaults psk/prompt/true). config.toml wiring intentionally omitted (config.toml is retiring; set via init.scm / :set). Add 'mae --collab-identity': prints this editor's Ed25519 peer identity (generating it on first use) + the exact 'mae-daemon authorize' line, so an admin can authorize the peer. Label = hostname. Transport wiring to actually use key mode follows in 2b. mae-core option tests +2; clippy clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…hase 2b) Refactor the editor connection so it can speak mTLS. run_collab_task is now generic over the stream: a ClientTransport{Plain,KeyJson,KeyTls} enum (resolved once from collab_auth_mode/tls/host_key_policy) drives a single establish_connection() helper; read/write halves are type-erased (Box<dyn AsyncBufRead/AsyncWrite>) so TCP and TLS share one loop. spawn_reader_task is generic; the three connect sites (Connect/StartServer/reconnect) route through establish_connection, skipping the PSK handshake on the TLS path. In key mode the editor loads its Ed25519 identity + a known_hosts FileHostKeyVerifier (TOFU policy) and connects via tls::client_config; KeyJson is the tls=false fallback. mae-mcp re-exports ServerName. E2E: scripts/collab-mtls-e2e.sh (make test-collab-mtls-e2e) spins up a real key+tls daemon, authorizes the editor identity, and runs a real editor over mTLS — connect, share a buffer, daemon confirms the share. Verified 7/7 green; daemon authenticates the peer 'framework' by cert (strict binding visible). collab_bridge 84 tests pass; clippy clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The authenticated peer identity (from mTLS, or the JSON-handshake label) is now authoritative for attribution. Thread auth_label through run_session into the doc handlers (handle_doc_*_inner; thin #[cfg(test)] wrappers keep the 28 existing handler tests untouched) and enforce: - kb/share: a key/TLS-authenticated peer that claims a creator other than its verified identity is REJECTED ('creator mismatch'); the authenticated label is the authoritative creator. Anonymous (psk/none) sessions keep self-claimed values (backward compatible). - sync/awareness: broadcast user_name (cursor label) overridden with the authenticated label — cursor labels can't be spoofed. - docs/save_committed: saved_by overridden with the authenticated label. Closes the spoofable-creator gap. 3 new unit tests (spoofed rejected, matching allowed, anonymous preserved); daemon 76 lib + 36 e2e tests pass; clippy clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
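
The binding rules above fit in two small functions. This is a sketch with hypothetical names (`bind_creator`, `bind_attribution`); `None` models an anonymous (psk/none) session and `Some(label)` a key/TLS-verified peer.

```rust
/// kb/share: a verified peer may only claim itself as creator; an anonymous
/// session keeps its self-claimed value for backward compatibility.
pub fn bind_creator(auth_label: Option<&str>, claimed: &str) -> Result<String, String> {
    match auth_label {
        Some(label) if label != claimed => Err("creator mismatch".to_string()),
        Some(label) => Ok(label.to_string()),
        None => Ok(claimed.to_string()),
    }
}

/// Cursor labels and saved_by: the verified label silently overrides
/// whatever the client sent; anonymous sessions keep the claimed value.
pub fn bind_attribution(auth_label: Option<&str>, claimed: &str) -> String {
    auth_label.unwrap_or(claimed).to_string()
}
```

The asymmetry mirrors the commit: a spoofed creator is rejected outright, while display-level attribution is simply overwritten.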

Least-privilege access to shared KBs among trusted peers. An authenticated (key/TLS) peer may join/update a KB only if it is the creator or in the KB's KbCollectionDoc.members(); anonymous (psk/none) sessions keep connection-level trust (backward compatible). - kb_membership_check gates kb/join and kb/node_update. - New owner-only methods kb/add_member / kb/remove_member {kb_id, member}: verify the caller is the collection creator, apply add/remove via the collection CRDT, persist + broadcast the update. - Residual limitation (documented): a member could still smuggle membership edits through a raw kbc: sync/update; server-side CRDT field ACLs are future work. The sanctioned path is the owner-only methods. 4 well-designed unit tests: creator joins / non-member denied; owner add→join →update, remove→denied; only-owner-manages; anonymous-not-gated. daemon 80 lib + 36 e2e tests pass; clippy clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Wire :kb-member-add / :kb-member-remove <kb-id> <member> end to end: CollabIntent::KbAddMember/KbRemoveMember (dispatch_collab parses args from the ex-command line) → CollabCommand::KbMember → run_collab_task sends kb/add_member /kb/remove_member RPC (PendingResponseKind::KbMember) → response becomes a status line, or a CollabEvent::Error on denial (e.g. 'only the owner can manage members'). Disconnected handler reports not-connected. 3 dispatch unit tests (args→intent, both add/remove, missing-args→no-intent); editor 90 collab + core collab tests pass; clippy clean. The daemon enforcement this drives is covered by the 4 membership unit tests in phase 4. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…or membership e2e Bug caught by the two-editor e2e: strict binding overrode the kb/share creator VARIABLE but not the collection doc's internal creator()/members() (set by the client from its user_name). So the owner-check in kb/add_member failed (coll.creator() != authenticated label), the add was silently rejected, and a newly-added member was still denied. Fix: KbCollectionDoc::set_creator re-stamps the creator + seeds it as a member; the daemon calls it on kb/share for authenticated sessions, binding the shared collection to the verified peer identity. scripts/collab-membership-e2e.sh (make test-collab-membership-e2e): two real editors over mTLS — alice shares, bob denied (not a member), alice adds bob, bob joins. Oracle = daemon log. VERIFIED PASS. + set_creator unit test. mae-sync 144, daemon 80 tests; clippy clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

When collab_host_key_policy=prompt and the editor meets an unknown daemon identity, a PromptingHostKeyVerifier emits CollabEvent::HostKeyPrompt and BLOCKS the connection task on a std reply channel; the main (UI) thread shows a 'Trust Daemon Key? <fingerprint> [y/N]' MiniDialog (MiniDialogContext::PeerKeyAccept), and the y/n answer is routed back (Editor.pending_host_key_reply) to pin (accept) or abort (reject). The collab task runs on a separate thread from the winit/TUI loop, so the block is safe; a 120s timeout rejects if unanswered. A previously pinned key that matches is accepted silently; a CHANGED key aborts (MITM). accept-new/strict keep the non-interactive file verifier (headless default). 4 unit tests cover the channel round-trip + pinning. mae 88 collab + mae-core 15 dialog tests pass; clippy clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
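
The blocking wait can be sketched with a standard-library channel. This is a minimal illustration, not the PromptingHostKeyVerifier itself: `Reply` and `wait_for_reply` are hypothetical names, and treating a dropped sender the same as a timeout is an assumption.

```rust
use std::sync::mpsc::{self, Receiver, RecvTimeoutError};
use std::time::Duration;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Reply {
    Accept,
    Reject,
}

/// Block the connection thread until the UI thread answers. An unanswered
/// prompt (timeout) or a vanished UI (disconnected sender) counts as Reject,
/// so the failure mode is "no pin, no connection".
pub fn wait_for_reply(rx: &Receiver<Reply>, timeout: Duration) -> Reply {
    match rx.recv_timeout(timeout) {
        Ok(reply) => reply,
        Err(RecvTimeoutError::Timeout) | Err(RecvTimeoutError::Disconnected) => Reply::Reject,
    }
}

/// Convenience constructor for the reply channel pair.
pub fn reply_channel() -> (mpsc::Sender<Reply>, Receiver<Reply>) {
    mpsc::channel()
}
```

In the flow described above, the connection task would call `wait_for_reply(&rx, Duration::from_secs(120))` while the UI thread holds the sender and sends the y/n answer.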

Fix confirmed inaccuracies that misled users: - daemon CLI: removed nonexistent --unix-socket/--db/--wal-threshold; documented the real flags (--bind/--config/--data-dir/--check-config) + the keygen/keys/ identity/authorized/authorize/revoke subcommands. - env var MAE_COLLAB_ADDR → MAE_COLLAB_SERVER. - editor options: point at init.scm (config.toml retiring), correct defaults (collab-server-address 127.0.0.1:9473, collab-user-name not -username, backoff 2), add collab-auth-mode/host-key-policy/tls. - WAL recovery path collab.db → collab/state.db. Add §10 Trusted-Peer Mode: Ed25519 mTLS setup end to end (daemon identity → authorize peer → editor key mode + TOFU → per-KB membership commands), and update §8 Security for the three auth modes (none/psk/key) + mTLS shipped. ADR-017 → Accepted. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The e2e job now also pulls the mae-daemon release artifact (needs: [check, daemon]) and runs scripts/collab-mtls-e2e.sh + scripts/collab-membership-e2e.sh against the real release binaries — exercising the full trusted-peer stack (Ed25519 mTLS handshake, TOFU, strict identity binding, per-KB membership) headlessly. Adds iproute2 (the scripts use ss for port readiness). Verified both pass with release binaries locally. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…live) Consolidated step-by-step plan: Tier 0 automated (unit + e2e + CI commands), Tier 1 single-host CLI smoke, Tier 2 the two-machine live run (daemon+editor on D, editor on E) covering identity exchange/authorize, TOFU connect, buffer convergence + authenticated cursor labels, KB membership (deny→add→allow→remove), and security/negative checks (unauthorized peer, changed host key, tcpdump confidentiality). Results checklist + troubleshooting. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Prepend a Setup section: Rust >=1.95 (MSRV), iproute2 for the e2e scripts, optional GUI build deps; build both workspaces (make build-tui + build-daemon); get binaries on PATH (install targets or copy). Plus a key-setup table clarifying that the automated e2e scripts generate+authorize their own keys, while the manual tiers need you to exchange identities + mae-daemon authorize (Tier 2 Step 3), and where identities live + how to reset them. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

One-command, idempotent key-mode setup: `mae setup-collab [--server <addr>]` generates the peer identity (if absent), persists collab-auth-mode=key + server + auto-connect to init.scm (via the existing save_option_to_init), and prints the exact `mae-daemon authorize` line. Re-running updates in place (no duplicates). SSH integration (opt-in key reuse for an SSH-like purpose): - Editor: `mae setup-collab --ssh-key ~/.ssh/id_ed25519` imports an unencrypted OpenSSH Ed25519 PRIVATE key as the collab identity (Identity::import_ssh_private_key via the ssh-key crate; from_seed/save helpers). - Daemon: `mae-daemon authorize --from-ssh-pub <file> <label>` imports the SSH PUBLIC key (PublicKey::from_ssh_line — manual SSH wire parse, no dep). - Verified consistent end-to-end: the editor's imported MAE fingerprint EQUALS the daemon's authorized fingerprint, so the editor presents exactly the key the daemon trusts. Errors clearly on encrypted/non-ed25519 keys. Note: reusing one key across SSH + MAE couples their compromise; a dedicated MAE identity (default) keeps them separate — documented in COLLABORATION.md §10. 3 new mae-mcp tests (ssh pubkey roundtrip, ssh private import matches pubkey, + existing). mae-mcp 121, daemon 80; clippy clean. Docs + testing plan updated. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…against 0.0.0.0 The two-machine testing plan used the default collab port 9473, which collides with an already-running personal daemon (binds 127.0.0.1:9473; a test daemon on 0.0.0.0:9473 overlaps loopback). Switch Tier 2 to a non-default port (9480) with an explicit "check it's free first" note, and document bind-vs-connect (0.0.0.0 is a bind address, never a connect target). - scripts: collab-mtls-e2e.sh / collab-membership-e2e.sh now auto-select the first free port (scan upward from 9476/9477 via `ss`) unless MAE_E2E_PORT is set explicitly — so a running daemon or a concurrent test run never triggers "address already in use". Loopback-bound, so they never touched 9473 anyway; this just makes them robust against any busy port. - mae setup-collab: reject `--server 0.0.0.0:…` with a clear message — that's the daemon's bind address, not a reachable connect target. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
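
The bind-vs-connect check is easy to sketch with the standard library. This is an illustration only: `validate_connect_target` is a hypothetical name, the error text is invented, and the sketch handles literal `ip:port` values, not hostnames.

```rust
use std::net::SocketAddr;

/// Reject unspecified addresses (0.0.0.0, [::]) as connect targets: they
/// are valid for a daemon to bind to, but not for an editor to connect to.
pub fn validate_connect_target(server: &str) -> Result<SocketAddr, String> {
    let addr: SocketAddr = server
        .parse()
        .map_err(|e| format!("invalid address {server:?}: {e}"))?;
    if addr.ip().is_unspecified() {
        return Err(format!(
            "{server} is a bind address, not a reachable connect target"
        ));
    }
    Ok(addr)
}
```

Using `is_unspecified` covers the IPv6 wildcard as well, which a string comparison against `0.0.0.0` would miss.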

The collab e2e harness was Linux-only and silently mis-ran on macOS, forcing stop-and-go debugging across our two dev machines. Three issues, one root theme: platform-divergent path/tool resolution.
1. Daemon dir resolution ignored XDG on macOS. `daemon/src/config.rs` resolved config + data dirs via bare `dirs::config_dir()` / `dirs::data_dir()`, which follow Apple conventions on macOS (`~/Library/Application Support`) and ignore `XDG_CONFIG_HOME` / `XDG_DATA_HOME`. The e2e scripts isolate each peer via those env vars, so on macOS the daemon never found its generated `daemon.toml`, fell back to all defaults (default bind :9473, default `$TMPDIR/mae-daemon.sock`), and collided with the developer's personal daemon ("daemon failed to listen"). Meanwhile the *identity*/*keystore* code (mae-mcp) already resolves XDG-first, so identities landed in the isolated dir while config/data landed in the real Library dir (split brain). Fix: resolve config + data dirs XDG-first on all platforms (env when set, else `dirs::*`), matching `mae-mcp::identity` / `keystore`. Pure extension: macOS users without XDG set are unchanged.
2. Port-readiness probe used `ss` (Linux iproute2), absent on macOS, so the daemon-listening check always failed even when it was up. Add a portable `port_listening` helper: prefer `ss` (Linux/CI unchanged), then `lsof`, then `netstat`.
3. The editor run was wrapped in `timeout`, absent on stock macOS. Use a `${TIMEOUT_BIN:+...}` prefix resolving `timeout` → `gtimeout` → omitted (bash 3.2-safe, `set -u`-safe).
Codify the lesson as CLAUDE.md principle #13 (cross-platform parity): XDG-first dirs everywhere, portable shell tooling, CI on both OSes. A fix that only works on one machine is not a fix.
Verified on macOS (was failing, now passing):
- scripts/collab-mtls-e2e.sh ............ 7/7, mTLS peer authenticated
- scripts/collab-membership-e2e.sh ...... 7/7 + 7/7, deny→add→allow
- cargo test -p mae-mcp ................. 121 passed
- cd daemon && cargo test / clippy ...... 9 passed / clean
- cargo test -p mae --bins collab ....... 94 passed
Linux behavior unchanged (ss/timeout still preferred; XDG already worked).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
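
The XDG-first rule in item 1 ("env when set, else `dirs::*`") might look roughly like the sketch below. It is standard-library only, so the platform default is passed in by the caller rather than read from the `dirs` crate. The function names are assumptions, as is the choice to treat an empty variable as unset.

```rust
use std::path::PathBuf;

/// Pure core of the rule: prefer the XDG env value when it is set and
/// non-empty (assumption: empty counts as unset), otherwise fall back to
/// the platform default the caller supplies.
pub fn xdg_first(env_value: Option<String>, platform_default: Option<PathBuf>) -> Option<PathBuf> {
    env_value
        .filter(|v| !v.is_empty())
        .map(PathBuf::from)
        .or(platform_default)
}

/// Hypothetical wrapper for the config dir. In the real daemon the fallback
/// would come from `dirs::config_dir()`; here the caller passes it in.
pub fn config_dir(platform_default: Option<PathBuf>) -> Option<PathBuf> {
    xdg_first(std::env::var("XDG_CONFIG_HOME").ok(), platform_default)
}
```

Keeping the env lookup separate from the decision makes the rule testable without mutating process environment, which matters when e2e peers run concurrently.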

Add a shared scratchpad to the testing plan so D (driver) and E (mac) can start the Tier 2 live run the moment D is up — no round-trips. Captures the concrete session state instead of the reference topology: - E (bob, mac, 192.168.1.132) is READY: built from a8ac842, personal daemon stopped (9473 clear), identity generated — fingerprint + pre-formatted `mae-daemon authorize ... bob` line for D to paste. - D's row is a fill-in (IP, fingerprint, status) the driver commits back. - Test port 9480 (avoids the personal-daemon :9473 collision). - mDNS returned nothing on this LAN → connect by explicit host:port. - D's unblock checklist (pull a8ac842, bind 0.0.0.0:9480 key-mode, authorize bob, publish fingerprint, open firewall). Each machine edits its own row, commits, pushes. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…th machines (#66) The live two-machine run surfaced issue #66: the interactive TOFU `prompt` policy (what `setup-collab` writes by default) is unwired and freezes the editor — hard-freezes the TUI, silently fails the GUI. It bites EVERY editor, including D's own "alice" (which connects to D's daemon too), so it's a coordination hazard, not just a local quirk. Update the testing plan so the other machine doesn't trip on it: - Prominent #66 callout: every editor must set collab_host_key_policy = "accept-new" in init.scm (non-blocking, auto-pins) until #66 is fixed; verify the daemon fingerprint OUT-OF-BAND against the pinned known_hosts entry instead of via the (broken) prompt. - Board: min build bumped to b947a52; added an `accept-new set` column; D's checklist now says rebuild BOTH binaries (branch moved past the first harness build) and configure accept-new before launching alice. - Step 4 rewritten for accept-new + the out-of-band pin check; the interactive prompt path is marked deferred to #66. - Results checklist: T0 marked green (macOS), row 4 split into accept-new (now) and prompt-TOFU (deferred). No code change — config/docs only. Tier 0 already validated on macOS at b947a52. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Track machine-E observations during the two-machine ADR-017 validation so we surface + fix issues, and D sees our findings. Logged so far:
- resolved: cross-platform Tier 0 fix (a8ac842)
- filed: #66 (TUI TOFU prompt deadlock)
- open/HIGH: alice rope panic crash (D-side, suspect shared/sync rope bridge)
- open: bob local edits to a joined buffer not visible on read-back (2x; cause TBD)
- open: connection flapping (peer closed w/o TLS close_notify), correlated w/ alice crash
Convergence so far: alice->bob receive confirmed; round-trip not yet validated.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…test step Per the review-the-process feedback: each entry now carries its tier/step (T0, T2.4, T2.5, …), Action → Expected → Actual → Status → Repro, so issues are pinpointed to the code path under stress and are reproducible. - Run 1 chronological table (10 rows) mapping each success/failure to a step. - Issue details I-1 (alice rope panic @ T2.5, task #18), I-2 (bob edit not visible @ T2.5), I-7 (connection flapping @ T2.4/5), #66 (TOFU @ T2.4). - Convergence scorecard by direction+step; next-run-from-scratch checklist. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ef bob Aligns naming with collab-test-notes-bob.md. Logs the run-1 progress (cross-machine mTLS auth ✅, alice→bob receive ✅) and the I-1 alice rope panic at T2.5: Rope::char(138) OOB on bob's remote edit of an em-dash line. Scopes it to the editor-side apply-remote path in crates/core (text.rs bridge + local cursor adjust are already clamped). D owns the backtrace + fix. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ackward (I-1) The two-machine collab run crashed alice's GUI with a ropey panic ("index past end of Rope: char index 138, Rope char length 34"). Backtrace: Rope::char <- word::word_start_backward <- mouse_ops::handle_mouse_click_inner Not a CRDT bug (headless convergence never crashed) — a mouse bug. Clicking the right pane of a vertical split registers as a double-click word-select, and the screen column (~138) far overruns the short line. The double-click path passed an unclamped text_col to char_offset_at (the single-click path already clamps), and word_start_backward guarded pos==0 but not pos>len_chars (word_end_forward already guards), so rope.char(137) on a 34-char rope panicked. Fixes: - word::word_start_backward clamps pos.min(len_chars()) (defense in depth). - mouse_ops double-click path clamps text_col to the clicked line length before char_offset_at (also guards the link-follow branch). Tests: word_motions_clamp_out_of_bounds_pos, word_start_backward_out_of_bounds_on_empty_rope, mouse_double_click_past_line_end_does_not_panic. Full mae-core suite 2237/2237. Follow-up (I-3): the fallback handle_mouse_click uses raw (non-window-relative) coords in a split — now safe (clamped) but cursor lands at line end; make it window-relative later. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
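
The defense-in-depth clamp described above can be illustrated without ropey. In this sketch a `&[char]` slice stands in for the rope, and the word-boundary rule (whitespace-delimited) is a simplification. Only the `pos.min(len)` clamp mirrors what the commit describes.

```rust
/// Simplified stand-in for `word::word_start_backward`: a char slice plays
/// the role of the rope, and "word" means a run of non-whitespace chars.
pub fn word_start_backward(text: &[char], pos: usize) -> usize {
    // Clamp first: a screen column (e.g. 138) can far exceed a short line.
    let mut i = pos.min(text.len());
    // Skip any whitespace directly before the position.
    while i > 0 && text[i - 1].is_whitespace() {
        i -= 1;
    }
    // Then walk back over the word itself.
    while i > 0 && !text[i - 1].is_whitespace() {
        i -= 1;
    }
    i
}
```

With the clamp in place every index is at most `len - 1`, so an out-of-range column resolves to the start of the last word instead of panicking.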

…resolved After fix a57455f, clean run from scratch (T2.4→T2.5): - alice→bob and bob→alice convergence both confirmed over mTLS, two machines. - I-1 (rope panic) FIXED + verified live — root cause was double-click word-select in a split pane passing an OOB offset to word_start_backward, NOT the CRDT path (multibyte was a red herring). No crash in Run 2. - I-2 (bob edit "not visible") RESOLVED as a driving artifact: MCP active buffer was *AI:claude*; switch-to-buffer must be its own verified step. - I-7 (flapping) RESOLVED — it was a symptom of alice crashing (I-1), gone now. Next: simultaneous-edit, then T2.6 KB membership, T2.7 security checks. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Records the post-fix live run: bidirectional CRDT convergence confirmed (bob's line + alice's seed + alice's typed line all merged; 52 session-7 + 1 session-8 updates), and the I-1 fix verified live (double-click @ col 138 in a split no longer crashes). Reattributes bob's I-2 to an MCP eval_scheme artifact (buffer- insert via eval skips the event-loop post-edit collab flush; real keystrokes sync fine). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

… connect path Bob's 9d run surfaced B-21: a runtime `:set collab-host-key-policy prompt` (or `(set-option! …)`) updated the option (get_option reflected "prompt") but the connect still auto-pinned under the init.scm value (accept-new) — so the TOFU modal never appeared. Root cause: `resolve_client_transport` builds the host-key verifier ONCE in `setup_collab_channels` (startup) and caches it in `CollabSpawn.transport`; every `:collab-connect` reuses that cached verifier, so a runtime policy change never reached it. Same class as the auto-connect env gap fixed in 91a5201. Fix (editor-side): the verifier now reads a LIVE policy cell at verify-time. - CollabState gains `host_key_policy_live: Arc<Mutex<String>>`, a cross-thread mirror of `host_key_policy`; set_option keeps it in sync; resolve_client_transport seeds it from the current value at setup. - The editor now ALWAYS uses the prompting verifier (the only one that *can* prompt), made policy-dynamic: it reads the live cell each verify and dispatches accept-new → pin, strict → reject, prompt → ask. So a runtime switch to/from prompt takes effect on the NEXT connect with no relaunch. Regression: `host_key_policy_change_honored_at_verify_time_b21` — one verifier instance pins silently under accept-new, then (live cell flipped to prompt) ASKS on a new host instead of auto-pinning. 4 existing verifier tests updated for the new field. mae-core 2274 + mae collab_bridge 95 green; clippy -D warnings clean. Unblocks 9d: bob can now `:set collab-host-key-policy prompt` at runtime + connect → the R4 TOFU modal (GUI under prompt — the #66 deadlock path) — no init.scm relaunch. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
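
A rough sketch of the "live policy cell" idea follows: the verifier holds a shared cell and reads it on every verification, so a runtime change applies to the next connect. `LiveVerifier`, `Decision`, and `decide_unknown_host` are illustrative names, not the PR's types. Treating an unrecognized policy string as reject is an assumption made here for fail-closed behavior.

```rust
use std::sync::{Arc, Mutex};

#[derive(Debug, PartialEq, Eq)]
pub enum Decision {
    Pin,    // accept-new: trust and pin silently
    Reject, // strict: refuse unknown hosts
    Ask,    // prompt: raise the TOFU modal
}

/// Illustrative verifier that never caches the policy value.
pub struct LiveVerifier {
    policy: Arc<Mutex<String>>,
}

impl LiveVerifier {
    pub fn new(policy: Arc<Mutex<String>>) -> Self {
        Self { policy }
    }

    /// Reads the shared cell at verify time and dispatches on its value.
    pub fn decide_unknown_host(&self) -> Decision {
        let policy = self.policy.lock().unwrap().clone();
        match policy.as_str() {
            "accept-new" => Decision::Pin,
            "prompt" => Decision::Ask,
            // "strict", plus anything unrecognized (assumption: fail closed).
            _ => Decision::Reject,
        }
    }
}
```

The option setter would write to the same `Arc<Mutex<String>>`, which is what lets one long-lived verifier instance change behavior between connects.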

…(rebuild + runtime :set) Path B chosen. Notes bob: this fix is editor-side so he MUST rebuild/reinstall/relaunch (unlike B-20). Then run 9d via runtime (set-option! collab_host_key_policy "prompt") — now honored — clear the pin, connect → expect the R4 TOFU modal (GUI under prompt, the #66 deadlock path). n-then-y, OOB fingerprint SHA256:07aW…7Ls. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…22 GUI TOFU modal render/focus bug B-21 closed: runtime set-option collab_host_key_policy=prompt now honored — connect BLOCKED on the prompt (no auto-pin) and raised a bus notification with the correct fingerprint 07aW…7Ls (OOB match). Reject path correct: notify_resolve(dismiss) -> handshake aborted, status off, known_hosts NOT pinned. B-22 (new, GUI): the R4 TOFU modal is invisible AND unresponsive — (1) no repaint on raise (GUI only redraws on keypress, ~2-key lag); (2) no input-focus capture (keys leak to the underlying buffer — Esc triggered Claude commands with the AI buffer focused). GUI sibling of #66; R4 fixed the plumbing but not the GUI render/focus path. Round-2 accept UN-TESTABLE: bus notification exposed actions:[] (only dismiss=reject), no MCP accept lever, and the modal y/Enter can't be delivered through the broken GUI. Fix dirs: BlockingReply raise must request redraw/damage; modal must grab input focus; add explicit bus actions (Accept&pin / Reject) for headless/Notifications parity. bob restored via accept-new -> auto-pin (07aW…7Ls) -> connected + reconcile-joined; temp backups removed. 9d verdict: B-21 + fingerprint + reject ✅; accept-via-UI blocked by B-22. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…e + focus) Bob's 9d run on the B-21 build proved the prompt PLUMBING works (correct fingerprint, reject logic, no-pin-on-reject) but the GUI modal surface was broken: it never drew (GUI froze until a keypress, ~2 behind) and keys leaked to the underlying buffer. Two independent defects, both fixed here: B-22a — runtime starvation (no repaint on raise): the GUI bridge ran on a `new_current_thread` tokio runtime hosting BOTH the collab connection task and the `bridge_task` proxy forwarder (+ AI/LSP/DAP/MCP). The host-key verifier is called synchronously by rustls mid-handshake and blocks (up to 120s) on `reply_rx.recv_timeout` waiting for the prompt answer — starving that one worker so the `HostKeyPrompt` event never reached the GUI and `mark_full_redraw` never ran (the GUI twin of the #66 TUI deadlock). Fix: give the bridge runtime a worker pool (`new_multi_thread().worker_threads(4)`, + the `rt-multi-thread` tokio feature) so the forwarder keeps running while a connect blocks on the prompt. This also fixes the same starvation for MCP-driven flows. B-22b — modal didn't capture input: `handle_key` only routed to the mini-dialog via the command-palette path, so an async-raised modal (notify() Modal arm sets `mini_dialog` but no palette mode) was unanswerable in Normal/Insert/AI mode; and the GUI's AI-input-lock branch stole Esc/Ctrl-C before `handle_key` ran (Esc hit AI-cancel, not the dialog). Fix: `handle_key` now routes to the dialog whenever `mini_dialog.is_some()` (all modes), and the GUI keyboard dispatch checks the modal before the AI-input-lock/shell branches. Together: the host-key TOFU modal now paints immediately and answers to y/Enter (accept+pin) / n/Esc (reject) regardless of focus. GUI compiles, mae key_handling tests green, clippy --features gui -D warnings clean. 
Deferred (B-22c, follow-up): the trust notification exposes no bus actions, so it's answerable only by the modal keypress — add Accept/Reject actions so notify_resolve + the *Notifications* row can answer it (headless/agent parity). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…; B-22c deferred Notes bob: rebuild (editor-side fix), then re-run 9d via runtime :set — the TOFU modal now paints immediately and captures input even with the *AI* buffer focused. Accept-by-keypress (y) now reachable. B-22c (MCP/bus accept action) deferred as a small optional follow-up for headless parity. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…er) STILL BROKEN On build 5337cb5: runtime prompt honored (B-21), correct fp surfaced, reject aborts w/o pin. B-22b confirmed fixed — modal now captures input even with *AI* buffer focused. B-22a STILL broken — bob-user: "frozen again for a second, modal capturing input but not rendering." Multi-thread runtime SHORTENED the freeze (was ~120s verifier-timeout starvation, now ~1s) but the modal still never PAINTS; user was blind, pressed keys without seeing. Consequence: accept->pin only "worked" via blind keypress (net: bob connected+pinned to the correct OOB key 07aW…7Ls), but can't be cleanly validated — a user cannot SEE the fingerprint before trusting, defeating TOFU. 9d accept path NOT validated as a usable flow. Proven: B-21, correct-fp, reject-no-pin, B-22b focus. Open: B-22a render (dialog paint not triggered when prompt raised off the handshake thread; runtime fix shortened freeze but didn't wire the MiniDialog redraw). Recommend: finish render fix and/or land B-22c (bus accept action) so accept is verifiable+answerable without depending on GUI paint. bob restored: connected, pinned, policy=accept-new. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…indow (paint gated on resolution) bob-user: UI unfreezes once the (unrendered) modal is gone. So the non-render is scoped to the interval the host-key prompt is outstanding; GUI recovers fully on resolution. Diagnosis: the GUI render loop isn't pumping a repaint while the synchronous rustls verifier blocks waiting for the answer — multi-thread runtime shortened the block but the paint path is still gated on prompt resolution, so the MiniDialog overlay never gets a frame during the window it needs to be visible (input is serviced enough to capture the key, but no full redraw runs). Fix: get a redraw to run WHILE the prompt is pending (raise prompt + request paint on GUI thread; let the verifier await async rather than blocking a thread the paint pass depends on). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…g is repaint-scheduling Answer to bob-user's question (principle #7): the host-key TOFU modal reuses the verified MiniDialog — collab_bridge.rs:823-841 raises a blocking action_required notification + mark_full_redraw; it becomes MiniDialogContext::Notification -> MiniDialogKind::Confirm ("Action Required"), answered in apply_mini_dialog. Not ad-hoc. So the render bug isn't "didn't reuse MiniDialog" nor a missing dirty flag: the wiring looks correct on paper — user_event(CollabEvent) sets self.dirty=true (main.rs:2051), about_to_wait (2475) gates renderer.request_redraw() on self.dirty (2688-2705), and handle_collab_event calls mark_full_redraw. Yet live it paints only on keypress + recovers when the modal is gone. Candidate roots for alice (GUI owner) to instrument: (1) HostKeyPrompt CollabEvent not delivered to user_event until a later input event (residual forwarder/proxy-wakeup starvation while the rustls verifier blocks; best fits the symptom); (2) about_to_wait WaitUntil wakeup never fires while blocked; (3) overlay draw skipped for the Notification confirm context. Disambiguate with one log line on user_event(HostKeyPrompt) entry: appears only post-keypress => #1; immediate but no frame => #2/#3. Orthogonal unblock: land B-22c (bus Accept/Reject actions) so 9d accept is verifiable via notify_resolve regardless of GUI paint. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…prompt The trust notification was answerable only by the modal y/n keypress; `notify_resolve` could merely dismiss (which does NOT send the reply — so an MCP "reject" actually hung until the 120s verifier timeout). Add a `NotifCommand::Reply(bool)` that genuinely answers a BlockingReply: it sends on the parked reply channel, tears down the modal if it's this notification, and resolves. The host-key prompt now carries explicit "Accept & pin" (action 0) / "Reject" (action 1) bus actions. Effect: the prompt is answerable over MCP (`notify_resolve {id, action:0|1}`) and via the *Notifications* row — headless/agent parity, and a working answer path independent of the GUI modal paint (B-22a, still open). Both routes send on the same channel; first answer wins (pending_notif_reply is taken once). Regression: `reply_action_answers_blocking_notification_over_bus` — Accept action sends true, closes the modal, resolves, decrements outstanding. mae-core 2275 green; clippy --features gui -D warnings clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
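
The "first answer wins" behaviour can be sketched with a parked sender that is taken exactly once. `PendingPrompt` is a hypothetical stand-in for the editor's pending-reply state, and a std `mpsc` channel replaces whatever channel type the editor actually uses.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

/// Hypothetical holder for the parked reply channel of a blocking prompt.
pub struct PendingPrompt {
    reply: Option<Sender<bool>>,
}

impl PendingPrompt {
    /// Returns the prompt state plus the receiver the verifier would block on.
    pub fn new() -> (Self, Receiver<bool>) {
        let (tx, rx) = channel();
        (Self { reply: Some(tx) }, rx)
    }

    /// Answer the prompt. Returns true only for the call that actually
    /// delivered the answer; later calls find the sender already taken.
    pub fn reply(&mut self, accept: bool) -> bool {
        match self.reply.take() {
            Some(tx) => tx.send(accept).is_ok(),
            None => false,
        }
    }
}
```

Both the modal keypress and a bus action would call `reply`, so whichever arrives first decides and the other becomes a no-op.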

…one; B-22a tracked Daemon log confirms 9d: fast reject (no pin) + accept→pin auth/join via bob's captured keypresses (B-22b). B-22c (bus Accept/Reject actions, 7fe4f93) lets the prompt be answered over MCP/notify_resolve regardless of GUI paint. B-22a (modal doesn't paint while verifier blocks) remains as a tracked GUI-paint polish bug with bob's disambiguating log experiment as the next step. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…r→delivery→paint Temporary diagnostic (target "b22a") to pinpoint why the GUI TOFU modal doesn't paint while the verifier blocks. Six timestamped checkpoints:
- 1/2 bridge_task: HostKeyPrompt taken off collab_rx + proxy.send_event returned Ok
- 3 handle_collab_event: prompt RECEIVED on the main thread (proxy→user_event)
- 4/5 about_to_wait: dirty-with-modal-pending + request_redraw() issued
- 6 RedrawRequested: a real frame painted with the modal up
Reading the sequence vs when the prompt is raised (and when a keypress arrives) disambiguates bob's candidates:
- 1/2 but no 3 until a keypress ⇒ winit proxy wakeup not firing while blocked
- no 1/2 until keypress ⇒ residual forwarder starvation
- 3+5 but no 6 ⇒ redraw requested but not serviced
- 6 fires but modal invisible ⇒ render pass skips the overlay
Enable with MAE_LOG=b22a=info (or RUST_LOG). Reverted once the root is fixed.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…umentation c7a4bc4) Bob: rebuild, launch with MAE_LOG=b22a=info 2>logfile, run set-prompt→clear-pin→connect, then WAIT 10s without touching kbd/mouse (the bridge sends IdleTick every 100ms via the same proxy, so it should paint on its own if the wakeup works), then press n + paste the b22a lines. The 1/2→3→4/5→6 checkpoint sequence pinpoints delivery-wakeup vs redraw-scheduling vs render-pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…AY skipped (render-path bug) Ran alice's b22a instrumented experiment, no kbd/mouse touched. First clean connect cycle: all 6 checkpoints fire within ~24ms of connect with NO keypress (input_dirty=false) and repeat every ~150ms: 1/2 forwarder->proxy, 3 received +6.7ms, 4/5 request_redraw, 6 PAINTING a frame with modal pending +23ms. So proxy DOES wake winit and a frame IS painted unprompted — delivery/wakeup (my earlier hypothesis #1) REFUTED; request_redraw fine too. => alice's last matrix case: paint runs but modal invisible => the render pass paints the frame but SKIPS the MiniDialog overlay for the Notification-confirm context. "Freeze" was perceptual (frames paint every ~150ms but the modal isn't in them). Likely root: render-side twin of B-22b — the GUI draws the mini-dialog overlay only in command-palette/command mode, not whenever mini_dialog.is_some(); the async Notification-confirm modal sets mini_dialog but not palette mode -> overlay skipped. Fix: draw overlay whenever mini_dialog.is_some() (any mode). Handed to alice w/ exact log + fix dir. Also: B-22c confirmed (trust notifs carry Accept&pin/Reject actions). 9d still functional PASS. Process note: agent set-option!->immediate connect races the apply-drain (verify get_option first). bob restored: connected, re-pinned 07aW…7Ls, accept-new. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…both backends) Bob's b22a instrumentation was decisive: all six checkpoints fire within ~24ms of the prompt with NO keypress (delivery + redraw + paint all healthy), yet the modal was invisible — the render pass paints frames but SKIPS the mini-dialog overlay. Root cause: the overlay was drawn only inside the `command_palette.is_some()` branch (via `render_command_palette`, which draws `mini_dialog` internally), so an async-raised modal that set `mini_dialog` without `command_palette` (the host-key TOFU prompt) never drew. This is the render-side twin of the B-22b input bug. Fix: both render chains now check `mini_dialog.is_some()` FIRST (top-priority modal), matching the input dispatch (B-22b). A sweep found the TUI renderer (crates/renderer/src/lib.rs) had the identical bug — fixed here too, not just the GUI. Follow-up (next commit): unify the overlay PRIORITY into a single `Editor::active_overlay()` so the GUI + TUI render chains can't diverge again (the root architectural cause — the priority order was duplicated per backend). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…t diverge (B-22a root cause) The B-22a sweep confirmed the architectural root cause: the fullscreen-overlay PRIORITY ORDER was hard-coded as an independent if/else chain in EACH backend, and they drifted — the GUI drew the blocking mini-dialog at top priority while the TUI only drew it nested under the command palette, so an async-raised modal (host-key TOFU prompt) painted no dialog in the TUI. Same class as the B-22b input bug. Fix: a single source of truth — `render_common::overlay::active_overlay(&Editor) -> ActiveOverlay` — defines the canonical priority (MiniDialog > FilePicker > FileBrowser > CommandPalette > WhichKey > Splash > None), unit-tested. Both the GUI (crates/gui/src/lib.rs) and TUI (crates/renderer/src/lib.rs) render chains now DERIVE their dispatch from it (`overlay == ActiveOverlay::X`) instead of duplicating the checks, so they stay in lock-step and a future overlay/reorder changes one place. A blocking modal is always highest priority, matching the input dispatch. Behavior-preserving (the per-branch render bodies are unchanged; GUI splash was a `pub use` of the same render_common::splash::should_show_splash). mae-core overlay priority test + clippy --features gui -D warnings clean on mae-core/gui/renderer. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
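
The single-source-of-truth priority can be sketched as one function that both backends match on. The enum variants and their order come from the commit message. The `OverlayState` struct of boolean flags is a hypothetical stand-in for the real `&Editor` argument.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ActiveOverlay {
    MiniDialog,
    FilePicker,
    FileBrowser,
    CommandPalette,
    WhichKey,
    Splash,
    None,
}

/// Hypothetical flags standing in for the editor state the real resolver reads.
#[derive(Default)]
pub struct OverlayState {
    pub mini_dialog: bool,
    pub file_picker: bool,
    pub file_browser: bool,
    pub command_palette: bool,
    pub which_key: bool,
    pub splash: bool,
}

/// Canonical priority: a blocking modal always wins, then the pickers,
/// then the palette, which-key, and finally the splash screen.
pub fn active_overlay(s: &OverlayState) -> ActiveOverlay {
    if s.mini_dialog {
        ActiveOverlay::MiniDialog
    } else if s.file_picker {
        ActiveOverlay::FilePicker
    } else if s.file_browser {
        ActiveOverlay::FileBrowser
    } else if s.command_palette {
        ActiveOverlay::CommandPalette
    } else if s.which_key {
        ActiveOverlay::WhichKey
    } else if s.splash {
        ActiveOverlay::Splash
    } else {
        ActiveOverlay::None
    }
}
```

Each render chain then dispatches on the returned value, so adding or reordering an overlay touches only this function.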

…l overlay unification Bob's b22a experiment decisive: frame paints but overlay skipped. Fixed GUI+TUI to draw mini_dialog top-priority; sweep found the TUI had the identical bug. Architectural fix 65c2281: single render_common::overlay::active_overlay() priority source consumed by both render chains so they can't diverge. Bob: rebuild → modal should paint; then I rip out the instrumentation. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…'t size to contents, fp truncated) Rebuilt on f526aef (unified active_overlay resolver). Re-ran prompt (get_option-verified before connect -> no apply-race -> single prompt). B-22a CONFIRMED FIXED: modal painted on its own (b22a 1-6 ~3.5ms, input_dirty=false) and is VISIBLE now; bob-user accepted the key; accept->pin ->connect end to end (connected, joined, known_hosts re-pinned correct key Ck5Um…=07aW…7Ls, OOB-verified). B-22 trilogy functionally complete. NEW B-23 (security-relevant UX): the modal doesn't adapt to content size — fingerprint text cut off, user couldn't see the ENTIRE key. TOFU requires reading the full fingerprint OOB before trusting; truncation undermines that (pinned key was correct by independent check this run, but UX can't guarantee it). Fix dir: size the dialog box to content (grow to fit within screen) and/or wrap the fingerprint full-width; wrap not clip. Likely in MiniDialog/overlay render geometry (render_common::overlay / backend dialog draw). 9d PASS (accept->pin now with a VISIBLE modal; reject + B-21 + correct-fp previously proven). Remaining polish: B-23 sizing. bob restored: connected, pinned, accept-new. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…iniDialog sizing, shared backends Code-located the truncation: both backends hard-code the dialog box and don't measure title/body. GUI popup_render.rs:928-929 and TUI popup_render.rs:628-629 use the SAME duplicated formula (width=50.min(cols-4), height=4+fields.len()); the Confirm/Notification body (the ~70-char SHA256 fingerprint) isn't measured in width or height -> clipped. render_common::overlay unified PRIORITY (active_overlay) but not GEOMETRY. Recommended (principle #8, geometry twin of the priority unification): add render_common::overlay::mini_dialog_layout(dialog, max_cols, max_rows) -> DialogLayout that computes width/height from wrapped title+body+fields+actions clamped to screen, WRAPS long content (fingerprint) instead of clipping; both GUI+TUI render_mini_dialog consume it (drop the local 50/4+fields constants); unit-test the layout. Covers ALL MiniDialog kinds so nothing truncates again. Security: host-key TOFU requires the FULL fingerprint be visible before accept (OOB compare); adaptive sizing guarantees it. 9d still PASS (accept->pin with visible modal); B-23 is the readability/sizing fix. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…l fingerprint readable Bob's B-23: the host-key TOFU prompt's modal renders (B-22a) but its box is hard-coded width=50 / height=4+fields.len() and content-blind, so the ~55-char SHA256 host-key fingerprint overflowed and was CLIPPED. confirm() jams the whole multi-line question into a single field label, drawn as one truncated `label: value` row. Both backends duplicated the same formula — the geometry twin of the overlay-priority duplication. Security-relevant: the full fingerprint MUST be readable for the out-of-band compare. Fix (the geometry twin of active_overlay): a single shared render_common::dialog::mini_dialog_layout(dialog, max_cols, max_rows) -> DialogLayout that measures title/body/fields, grows the box to fit, WRAPS long content (word-wrap + hard-break for space-less tokens like a fingerprint), and clamps to the screen. Both GUI and TUI render_mini_dialog now consume it (drop the local 50/4+fields constants), so they can't diverge and EVERY dialog kind sizes to its content. Tests: fingerprint fully visible (was clipped) + box grows past 50; narrow-screen hard-wrap keeps every char; wrap_hard token-break; input dialogs keep field rows + hint. clippy --features gui -D warnings clean on mae-core/gui/renderer. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
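
The wrap behaviour described above (word-wrap, plus a hard break for space-less tokens such as a fingerprint) could be sketched as follows. This is an illustration under stated assumptions, not the PR's `wrap_hard`: the signature is guessed, and width is counted in `char`s rather than display cells.

```rust
/// Wrap `text` to at most `width` chars per line. Words are kept whole when
/// they fit; a token longer than `width` is split into width-sized chunks so
/// no character is ever dropped.
pub fn wrap_hard(text: &str, width: usize) -> Vec<String> {
    let width = width.max(1); // guard against a zero-width box
    let mut lines: Vec<String> = Vec::new();
    let mut current = String::new();
    let mut current_len = 0usize; // length of `current` in chars

    for word in text.split_whitespace() {
        let word_len = word.chars().count();
        let needed = if current_len == 0 { word_len } else { current_len + 1 + word_len };
        if needed <= width {
            if current_len > 0 {
                current.push(' ');
            }
            current.push_str(word);
            current_len = needed;
            continue;
        }
        // The word does not fit on this line: flush what we have.
        if current_len > 0 {
            lines.push(std::mem::take(&mut current));
            current_len = 0;
        }
        // Hard-break the token; the final chunk stays open for more words.
        let chars: Vec<char> = word.chars().collect();
        for chunk in chars.chunks(width) {
            if current_len > 0 {
                lines.push(std::mem::take(&mut current));
            }
            current = chunk.iter().collect();
            current_len = chunk.len();
        }
    }
    if current_len > 0 {
        lines.push(current);
    }
    lines
}
```

A layout function would then size the box from the longest wrapped line and the line count, clamped to the screen.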

) + rebuild-to-confirm Geometry twin of active_overlay: render_common::dialog::mini_dialog_layout consumed by both backends; full host-key fingerprint now visible + wrapped. Bob: rebuild → confirm the entire SHA256 shows (no clip) + wraps on a narrow window. Next: convert b22a instrumentation to clean collab-target debug tracing + drop the per-frame render probes. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…l arc CLOSED Rebuilt a66449f (shared render_common::dialog::mini_dialog_layout). Both paths through the fully-rendered modal: B-23 — bob-user saw the FULL dialog contents (entire SHA256 fingerprint readable, no clipping) -> OOB compare trustworthy. Reject (n) -> ApplicationVerificationFailure, aborted, no pin. Accept (y) -> collab connected + KB join complete; known_hosts re-pinned correct key Ck5Um…=07aW…7Ls. 9d/TOFU/R4 = FULL PASS via a modal that renders (B-22a) + captures input (B-22b) + sizes to content (B-23) + is bus-answerable (B-22c); reject-no-pin and accept-pin both proven with the full fingerprint visible. Security arc validated live B-19->B-23 (epoch fence, continuation fence, runtime policy, modal render/focus/bus/sizing). ADR-024 bus + ADR-018/023 membership-gated write access validated end to end on two machines. Step-9 complete. Remaining: instrumentation cleanup (b22a -> permanent collab tracing + drop render probes) + the collab/config-UX polish theme. bob: connected, pinned, accept-new. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…t-key lifecycle tracing The b22a diagnostic (c7a4bc4) did its job: it proved the modal painted but the overlay was skipped (fixed in b09becd/65c22813). Per plan, convert the lasting-value parts to permanent tracing and drop the throwaway scaffolding:
- REMOVED the per-frame render probes (about_to_wait + RedrawRequested): hot-path, and the question they answered is settled + now guarded by the active_overlay/dialog unit tests.
- REMOVED the bridge_task forward probes + the one-off "b22a" target.
- KEPT, as clean `debug!(target: "collab")`, the host-key TOFU lifecycle: prompt raised (handle_collab_event) and the trust decision in the verifier (pinned / rejected / timed out).
So `MAE_LOG=collab=debug` tells the whole trust-handshake story without any render-loop spam. clippy --features gui -D warnings clean; no behavior change.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

… bugs (Wave 1) Closes testing gaps and non-UX correctness issues surfaced by the live two-machine CRDT validation, ahead of the UX pass. Each item ships with a RED-before/GREEN-after guard (CLAUDE.md #9).
- A1 daemon: fence no-cascade oracle. Assert the canonical node stays byte-identical across a fenced push (not just the error string).
- A2a editor: notify_ops resolution unit test. R2 fence notification → 3 actions; Keep-mine records pending_reauthor + enqueues KbAdoptNode; Accept-remote adopts without reauthor.
- B1 fix I-3 split-window click coords: both GUI fallback and TUI passed ABSOLUTE screen cells to the window-local click handler, so clicks in a non-primary split landed at the wrong column. Translate once in the shared handle_mouse_click_inner via the focused pane's layout origin (#8: one source of truth, fixes both backends) + pure window_relative() helper.
- B2 config-key invariant guard (every snake_case option exposes its kebab alias; every collab_* has alias + config_key) + extract is_epoch_fence_rejection() so the editor↔daemon "rebase required" contract is centralized and tested; clearer user-facing fence wording (#7).
- B3 verify joined-KB instance surfaces: federated get/search attribute the node to its instance + it appears in *KB Instances* (regression guard).
- B4 B-5 malformed-row robustness: a short-arity stored row makes the whole load query fail at bind time before the row-skip loop; degrade load_all to an empty Ok (logged at ERROR) instead of Err that aborted kb_join and tripped the main-thread stall watchdog (#1). Off-thread KB I/O deferred.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
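
For B1, the absolute-to-window-local translation might look like the sketch below. The commit only names a pure `window_relative()` helper. The signature, the `u16` cell coordinates, and returning `None` for a click left of or above the pane are all assumptions.

```rust
/// Translate an absolute screen cell into coordinates local to a pane whose
/// top-left corner is at (`origin_col`, `origin_row`). Returns `None` when
/// the click lies left of or above the pane (assumed behaviour).
pub fn window_relative(
    col: u16,
    row: u16,
    origin_col: u16,
    origin_row: u16,
) -> Option<(u16, u16)> {
    // checked_sub avoids underflow when the click is outside the pane.
    Some((col.checked_sub(origin_col)?, row.checked_sub(origin_row)?))
}
```

A click at screen column 138 in a right-hand pane starting at column 100 maps to local column 38 under this sketch, rather than being treated as column 138 of a short line.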

…y (Wave 1: C2, C3) C2 — verify + regression-test that connect-critical config is read live from the single OptionRegistry source, with no read-site cache (the apply-drain race the live test hit needs no manual get-option wait): - server_address: read live at connect dispatch (set_option writes it synchronously); test in dispatch/collab.rs. - resolve_client_transport reads auth_mode/psk/tls live; test in collab_bridge. The transport is still built once at task setup and cached — the security- critical runtime field (host-key policy) is already kept live via host_key_policy_live; a full per-connect transport rebuild on a runtime auth_mode/tls change is a documented, deferred follow-up. C3 — embed the git SHA (build.rs → MAE_BUILD_SHA, "-dirty"/"unknown" fallbacks, cross-platform per #13) in editor + daemon. Reported in the startup log, --version, and the daemon $/debug response; collab-doctor now prints the daemon build and warns on an editor↔daemon mismatch — the "are both machines on the same commit?" check the live two-machine test ran by hand. Smoke + mismatch tests on both sides. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
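
For C3, the fallback logic a build script could apply before emitting `MAE_BUILD_SHA` is sketched below. Only the variable name and the "-dirty"/"unknown" fallbacks come from the commit. The helper names are hypothetical, and the actual git invocation is left out so the logic stays pure. `cargo:rustc-env=NAME=VALUE` is the standard Cargo build-script directive for exposing a compile-time env var.

```rust
/// Turn the (possibly missing) output of a git revision lookup into the
/// string to embed: the trimmed SHA, the SHA plus "-dirty" for a modified
/// tree, or "unknown" when no revision is available (e.g. no git, tarball).
pub fn format_build_sha(rev: Option<&str>, dirty: bool) -> String {
    match rev.map(str::trim).filter(|r| !r.is_empty()) {
        Some(r) if dirty => format!("{r}-dirty"),
        Some(r) => r.to_string(),
        None => "unknown".to_string(),
    }
}

/// The line a build.rs would print so the crate can read the value with
/// `env!("MAE_BUILD_SHA")` at compile time.
pub fn cargo_env_line(sha: &str) -> String {
    format!("cargo:rustc-env=MAE_BUILD_SHA={sha}")
}
```

Because the fallback never fails, a build on a machine without git still succeeds and reports "unknown" instead of breaking.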

…gence (Wave 2: A2b, A5)

A2b: drive handle_collab_event's KbNodeAdopted (the kb/node_fetch reply) through both fence-resolution paths. Keep-mine re-authors the captured edit over the authoritative state and consumes pending_reauthor; accept-remote takes the authoritative value and discards local. Closes the bridge half of the R1 adopt-and-re-author round-trip the manual Step-9 run exercised by hand.

A5: real-daemon convergence. Two peers concurrently edit DISJOINT fields of the same KB node from the same base; the daemon merges both into its authoritative per-node doc and two fresh joiners read back BYTE-IDENTICAL state carrying both edits. This is the CRDT guarantee (#11) end-to-end over TCP + base64, not just an in-process KnowledgeBase merge. MAE_TCP_E2E-gated (CI e2e job; the no-auth daemon skips the epoch fence, so the joiner write is accepted).

Plus a manual T1–T7 cross-reference doc-comment mapping each in-process kb_sync_n_peer_e2e test to its live two-machine step.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ave 2: C1)

The daemon broadcasts a KB's collection doc (kbc:) on every membership/role change, but the editor ignored it ("remote update for unknown buffer") and only relearned its authorization epoch on a full re-join, forcing the manual reconnect the live two-machine test kept performing by hand.

C1 keeps a local CRDT replica of each joined KB's collection doc (CollabState::kb_collection_state), seeded from the join snapshot. A live kbc: RemoteUpdate is now intercepted before the buffer lookup, applied to the replica (#11), and epoch_of(local_fingerprint) re-derived: kb_epochs updates in place so the next node edit authors under the rotated, current-epoch client_id, with no reconnect. The user is notified and a `kb-epoch-changed` hook fires (runtime-redefinable, #7). Replica + epoch are dropped on KbLeft.

Security (#10): the daemon stays the sole authority. It re-derives each member's epoch from its OWN authoritative collection when fencing, so the relearn is pure client convenience. A tampered/stale replica can only mislead this client about its own epoch, never self-elevate; a client that ignores the relearn and authors under a stale epoch is still fenced.

Tests cover the live relearn, that another member's change cannot bump this peer's epoch, and the unjoined-KB no-op. The daemon viewer_era_* / stale_epoch_continuation_* fence tests stay GREEN (the no-weakening gate).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
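The client-side relearn described in C1 can be illustrated with a deliberately simplified sketch. The field names `kb_collection_state` and `kb_epochs` come from the commit message; everything else is an assumption. In particular, a plain `HashMap` stands in for the real CRDT replica, and `relearn_epoch` is a hypothetical method name.

```rust
use std::collections::HashMap;

/// Stand-in for the collection-doc replica: member fingerprint -> epoch.
/// ASSUMPTION: the real replica is a CRDT document, not a plain map.
pub type CollectionReplica = HashMap<String, u64>;

#[derive(Default)]
pub struct CollabState {
    /// Per-KB replica of the collection doc, seeded at join time.
    pub kb_collection_state: HashMap<String, CollectionReplica>,
    /// Per-KB epoch this client currently authors under.
    pub kb_epochs: HashMap<String, u64>,
}

impl CollabState {
    /// Apply a remote collection update, then re-derive the local epoch.
    /// Returns `Some(new_epoch)` only when this peer's epoch changed,
    /// which is when a notification or hook would fire.
    pub fn relearn_epoch(
        &mut self,
        kb_id: &str,
        local_fingerprint: &str,
        update: &CollectionReplica,
    ) -> Option<u64> {
        // Unjoined KB: no replica, so the update is a no-op.
        let replica = self.kb_collection_state.get_mut(kb_id)?;
        for (member, epoch) in update {
            replica.insert(member.clone(), *epoch);
        }
        let new_epoch = *replica.get(local_fingerprint)?;
        let old = self.kb_epochs.insert(kb_id.to_string(), new_epoch);
        if old == Some(new_epoch) { None } else { Some(new_epoch) }
    }
}
```

Nothing here is trusted by the daemon: per the commit message, the daemon re-derives the epoch from its own collection when fencing, so a wrong local value only causes this client's writes to be rejected.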

…B-6)

B-6 (primary KB data dir is XDG-first, not dirs::data_dir() / ~/Library) was fixed in cf673b7. Verification confirms the primary.cozo store path derives from editor.mae_data_dir() (XDG_DATA_HOME → ~/.local/share/mae) with the same XDG-first fallback, and the only residual dirs::data_dir() uses are deliberate read-only module *search* paths.

Add a regression test (#13) asserting mae_data_dir() honors XDG_DATA_HOME and falls back to ~/.local/share/mae, never the macOS platform-native dir, so a future change can't silently reintroduce dirs::data_dir and re-split the KB store from the ADR-019 registry markers (restart-survival).

A cross-location ~/Library→XDG migration for pre-fix macOS dev builds is intentionally NOT added (highest-risk, marginal early-alpha benefit; the fix already landed without orphaning concerns for XDG-isolated installs).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
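The XDG-first resolution that the regression test guards might be sketched as follows. This is not the PR's `mae_data_dir()`: the function name `mae_data_dir_from`, the parameter-passing style, and the `mae` subdirectory under `XDG_DATA_HOME` are assumptions. Treating an empty `XDG_DATA_HOME` as unset follows the XDG Base Directory Specification.

```rust
use std::path::PathBuf;

/// XDG-first data dir: `$XDG_DATA_HOME/mae` when set and non-empty,
/// else `$HOME/.local/share/mae` on every platform (including macOS).
/// Values are passed in rather than read from the environment so the
/// rule can be tested without mutating process-wide state.
pub fn mae_data_dir_from(xdg_data_home: Option<&str>, home: &str) -> PathBuf {
    match xdg_data_home.filter(|v| !v.is_empty()) {
        Some(xdg) => PathBuf::from(xdg).join("mae"),
        None => PathBuf::from(home)
            .join(".local")
            .join("share")
            .join("mae"),
    }
}
```

A caller would pass `std::env::var("XDG_DATA_HOME").ok().as_deref()` and the home directory; the fallback branch never consults a platform-native location, which is the property the commit's regression test asserts.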

…orized-peer e2e (Wave 3: A4)

Fills the security-negative gaps left by the existing mTLS unit tests (mtls_unauthorized_client_rejected / mtls_client_rejects_untrusted_host):

- FileHostKeyVerifier TOFU integrity: a previously-pinned daemon host key that CHANGES (MITM / key substitution) is rejected AND the trusted pin is NOT overwritten, so an attacker can't silently re-pin; the genuine key still verifies afterward. Plus a strict-policy-rejects-unknown-host test. Runnable unit tests in shared/mcp/src/identity.rs.
- collab-mtls-e2e.sh: added an unauthorized-peer negative scenario. A second editor whose identity is NOT in the daemon's authorized_keys attempts to connect; the daemon's authenticated-peer count must not increase (robust to the exact rustls rejection string). The e2e counterpart to the unit-level unauthorized-client rejection.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
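The TOFU property tested in A4 (a changed key is rejected and the existing pin survives) can be shown with an in-memory sketch. This is not `FileHostKeyVerifier`: the `PinStore`, `Policy`, and `PinOutcome` names are hypothetical, the store is a map rather than a known_hosts file, and the interactive first-connect prompt is omitted.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq, Eq)]
pub enum PinOutcome {
    Trusted,
    PinnedFirstUse,
    RejectedUnknown,
    RejectedChanged,
}

#[derive(Debug, Clone, Copy)]
pub enum Policy {
    /// Trust on first use: pin an unknown host's key.
    Tofu,
    /// Only already-pinned hosts are accepted.
    Strict,
}

/// In-memory stand-in for a known_hosts-style pin file.
#[derive(Default)]
pub struct PinStore {
    pins: HashMap<String, Vec<u8>>,
}

impl PinStore {
    pub fn verify(&mut self, host: &str, presented: &[u8], policy: Policy) -> PinOutcome {
        if let Some(pinned) = self.pins.get(host) {
            // A mismatch is rejected WITHOUT touching the stored pin,
            // so an attacker cannot silently re-pin.
            return if pinned.as_slice() == presented {
                PinOutcome::Trusted
            } else {
                PinOutcome::RejectedChanged
            };
        }
        match policy {
            Policy::Tofu => {
                self.pins.insert(host.to_string(), presented.to_vec());
                PinOutcome::PinnedFirstUse
            }
            Policy::Strict => PinOutcome::RejectedUnknown,
        }
    }
}
```

The key design point is that the write path exists only in the unknown-host branch; the mismatch branch returns before any mutation, which is what lets the genuine key keep verifying after an attempted substitution.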

… 3: A3)

Rather than ship an unverifiable ~150-line two-editor scheme fence-resolve e2e (its deterministic trigger is now in tension with C1's honest-path epoch relearn, and there is no validated scheme recipe for editing a *shared* KB node to force a fenced kb/node_update, so it can't be authored correct-by-construction without a runnable two-machine environment), document the closure precisely:

- Tier-0 "Automated coverage map": each manually-run flow (Step 8 fence safety, Step 9 resolution UX, rebase-required contract, epoch relearn, two-peer convergence, unauthorized peer, MITM no-overwrite, TOFU prompt) → the exact unit/e2e test that now guards it.
- Step 8 / Step 9 NOTE callouts: the fence *safety* and resolution *logic* are now unit-automated (A1/A2a/A2b) and the manual reconnect-to-relearn is automatic in-product (C1); the live two-machine run remains for badge/pixel + cross-editor convergence, with the offline edit as the deterministic fence trigger.

This serves the "clear success criteria + coverage" goal: the residual fence end-to-end is explicitly the Tier-2 manual run, with every constituent piece unit-covered.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

cuttlefisch and others added 30 commits June 15, 2026 18:16

cuttlefisch and others added 22 commits June 23, 2026 18:02

cuttlefisch changed the title ~~Trusted-peer, membership-gated, crash-safe + write-authorized KB sync (ADR-017/020/022/023)~~ Trusted-peer, membership-gated, crash-safe, write-authorized KB sync + attention bus (ADR-017/020/022/023/024) Jun 23, 2026

cuttlefisch and others added 7 commits June 23, 2026 21:28

Trusted-peer, membership-gated, crash-safe, write-authorized KB sync + attention bus (ADR-017/020/022/023/024) #69
cuttlefisch wants to merge 193 commits into main from feat/crdt-collab-validation

1 participant

1. Trusted-peer collaboration security core (ADR-017)

2. Crash-safe convergent KB sync (ADR-020 → ADR-022)

3. Write-authorization: epoch-fenced rebase (ADR-023, B-19 + B-20) — security

4. Attention/notification bus + the resolution UX (ADR-024) — and its hardening

Live validation (two machines, MCP-driven) — GREEN

Test rigor

ADRs / docs

Still to land before merge / tracked follow-ups (non-security)

Update — testing-gap closure + non-UX fixes + event-driven triggers (pre-UX pass)
