Minors sweep + memory design round: scoring decay, multilingual embedder, retention sweep #12

Merged
AlanY1an merged 13 commits into main from fix/minors-and-memory-design
Jun 12, 2026
@AlanY1an

Owner

Closes out every remaining open finding plus the four memory design items. All gates green: 1813 passed (+71 tests over the previous round), ruff clean, import contracts 4/4. Two independent reviewers verified spec compliance per finding (confirmed by revert spot-checks) and code quality; both approved.

Minor fixes (25 findings + 3 strays)

  • Web/SSE: overflowed chat clients get a close sentinel so EventSource reconnects; finished import pipelines end their event stream instead of hanging; dead subscriber queues detach.
  • Runtime: run refuses to start over a live daemon and cleans stale pidfiles; [runtime].log_level is consumed; httpx is a core dep (stop/reload/status need it); web startup rollback unregisters the channel; reload docstring/log match reality; config writes share one atomic merge-validate-write path; live turn renders once in the prompt; pending-expectation embeddings are cached and off-loop; _LLMLike protocol annotation matches its call site.
  • Prompts: extraction self-check Check 2 has its question; persona_bootstrap/persona_facts say two core blocks (matching their schemas).
  • Voice: per-turn cache write runs off-loop; FishAudio estimate uses the real $15/1M-byte rate (~67× correction); env var standardized on FISH_AUDIO_KEY end to end.
  • Import: UTF-8 BOM handled at the bytes boundary; CSV batches all carry the header; cost estimate runs the real chunker plus per-chunk prompt overhead (a ceiling, not a floor).
  • Core/samples: LLM price table covers the current model generation with cache rates and a preset→price guard test; config.toml.sample drops the rejected [proactive] keys and now loads through the real config loader in CI; docs/ai routing links the proactive pages.
  • Proactive/iMessage: follow-up timer registry self-cleans (bounded over daemon lifetime); failed iMessage connects tear down the spawned subprocess.
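The pidfile behavior in the Runtime bullet (refuse to start over a live daemon, clean stale pidfiles, unlink only a pidfile this process owns) can be sketched roughly as below. This is a minimal illustration assuming a POSIX system and a plain-text pidfile; `pid_is_alive`, `claim_pidfile`, and `release_pidfile` are hypothetical names, not the project's functions.

```python
import os
from pathlib import Path


def pid_is_alive(pid: int) -> bool:
    """Return True if a process with this PID exists (POSIX signal-0 probe)."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 checks existence without delivering a signal
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def claim_pidfile(path: Path) -> bool:
    """Write our PID to `path`; return False if a live daemon already owns it."""
    if path.exists():
        try:
            recorded = int(path.read_text().strip())
        except ValueError:
            recorded = -1  # unreadable contents count as stale
        if recorded != os.getpid() and pid_is_alive(recorded):
            return False
        path.unlink(missing_ok=True)  # stale pidfile: clean it up
    path.write_text(f"{os.getpid()}\n")
    return True


def release_pidfile(path: Path) -> None:
    """Unlink the pidfile only if it still names this process."""
    try:
        owner = int(path.read_text().strip())
    except (FileNotFoundError, ValueError):
        return
    if owner == os.getpid():
        path.unlink(missing_ok=True)
```

The ownership check in `release_pidfile` is what keeps a process that was refused at startup from deleting the running daemon's pidfile on its way out.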

Memory design round

  • Impact decay + salience reinforcement: the rerank impact term decays on a 90-day half-life to a 0.3 floor; the floor rises with log1p(access_count + re-mentions), so repeatedly confirmed memories hold their weight. This is what keeps stable identity facts in the force-loaded user thoughts: both paths call the same dampened-impact helper, so the two formulas cannot diverge. Reviewer-checked: the reinforcement loop is bounded (the floor caps at 1.0, roughly the weight of a fresh memory; candidacy stays relevance-gated; pinned thoughts don't bump access_count).
  • Multilingual embedder: default is intfloat/multilingual-e5-small (384-dim, schema unchanged) for the bilingual user. The DB carries an embedding_model tag; on mismatch, startup re-embeds every vec row (nodes + entities) in one transaction, tag written last — crash-safe, idempotent. Untagged DBs with vectors are treated as the prior default, so existing installs upgrade with one automatic re-embed. --no-embedder skips the sync.
  • Retention sweep: sweep_dead_vectors removes vec rows (and FTS entries for soft-deleted nodes) 30 days after delete/supersede, daily from the consolidate worker idle branch — the KNN scan space stays bounded to retrievable rows; node rows and supersede chains survive. delete_concept_node's eager-scrub parameter is gone (sweep owns index cleanup).
  • Docs: en/zh synced section-by-section (scoring formula, sweep, embedder, min_relevance 0.55), docs/ai updated, reviewer confirmed bilingual equivalence and zero changelog narrative.
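To make the impact decay and reinforcement floor concrete, here is a minimal sketch. The 90-day half-life, the 0.3 base floor, the 1.0 cap, and the log1p(access_count + re-mentions) shape come from the description above; the gain constant `REINFORCE_GAIN`, the function names, and the way the floor combines with the decay (a simple `max`) are assumptions for illustration and may differ from the shipped helper.

```python
import math

HALF_LIFE_DAYS = 90.0  # from the PR description
BASE_FLOOR = 0.3       # from the PR description
FLOOR_CAP = 1.0        # from the PR description ("floor caps at 1.0")
REINFORCE_GAIN = 0.1   # hypothetical: the PR does not state this constant


def impact_floor(access_count: int, re_mentions: int) -> float:
    """Floor that rises with log1p of how often the memory was touched."""
    boost = REINFORCE_GAIN * math.log1p(access_count + re_mentions)
    return min(FLOOR_CAP, BASE_FLOOR + boost)


def dampened_impact_weight(
    age_days: float, access_count: int = 0, re_mentions: int = 0
) -> float:
    """Exponential half-life decay, never dropping below the reinforced floor."""
    decay = 0.5 ** (max(age_days, 0.0) / HALF_LIFE_DAYS)
    return max(impact_floor(access_count, re_mentions), decay)
```

Under these assumptions a fresh memory has weight 1.0, a 90-day-old untouched memory has 0.5, and a very old untouched memory settles at 0.3, while a frequently accessed one settles at a higher floor that can never exceed 1.0.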

Notes for the reviewer

  • tests/memory_eval/embedders.py still pins all-MiniLM-L6-v2: switching the eval harness to e5 invalidates the existing baselines in develop-docs/memory/eval-runs/, so that re-baseline is left as a deliberate separate decision.
  • First daemon start after this lands downloads e5-small (~450 MB) and re-embeds existing vectors once; offline machines fail fast at startup before touching the DB (setting [memory].embedder back to the cached model boots immediately, no re-embed).
  • Pre-existing stale [proactive] key mentions in docs/{en,zh}/configuration.md and proactive.md (from the v3 rewrite) were noted by review but predate this branch — follow-up material.

AlanY1an added 13 commits June 12, 2026 16:00
An overflowed client is evicted from the broadcaster with a close
sentinel queued in its backlog; the chat event stream ends on the
sentinel so the browser's EventSource reconnects instead of holding a
dead connection.
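A rough sketch of the close-sentinel idea described in this commit, assuming one bounded `asyncio.Queue` per client. `Broadcaster`, `stream`, and `CLOSE` are illustrative names rather than the project's API; the real broadcaster may size and evict differently.

```python
import asyncio
from typing import AsyncIterator, Set

CLOSE = object()  # sentinel: tells a subscriber's stream to end


class Broadcaster:
    def __init__(self, max_backlog: int = 100) -> None:
        self._max_backlog = max_backlog
        self._queues: Set[asyncio.Queue] = set()

    def subscribe(self) -> asyncio.Queue:
        # One spare slot is reserved so the close sentinel always fits.
        queue: asyncio.Queue = asyncio.Queue(maxsize=self._max_backlog + 1)
        self._queues.add(queue)
        return queue

    def publish(self, event: str) -> None:
        for queue in list(self._queues):
            if queue.qsize() >= self._max_backlog:
                # Overflow: evict the slow client and queue the sentinel
                # behind whatever backlog it still has.
                self._queues.discard(queue)
                queue.put_nowait(CLOSE)
            else:
                queue.put_nowait(event)

    def detach(self, queue: asyncio.Queue) -> None:
        self._queues.discard(queue)


async def stream(broadcaster: Broadcaster, queue: asyncio.Queue) -> AsyncIterator[str]:
    """Yield events until the sentinel arrives, then end the stream."""
    try:
        while True:
            item = await queue.get()
            if item is CLOSE:
                return  # ending the response lets EventSource reconnect
            yield item
    finally:
        broadcaster.detach(queue)  # dead subscriber queues detach on exit
```

Ending the HTTP response, rather than silently dropping events, is what gives the browser's EventSource a signal to reconnect.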
subscribe_events on a terminal pipeline replays history and closes
instead of waiting forever; dead subscriber queues detach on iterator
exit.
…tions

The recent-conversation section excludes the current turn's rows (the
'What they just said' section owns them) without shrinking the prior
window. Pending-expectation embeddings are cached per node, pruned to
the live set, and computed off the event loop; the duplicate pending
query is gone. _LLMLike.complete's annotation matches the (text,
usage) tuple it returns.
Check 2 carries its missing question; persona_bootstrap and
persona_facts describe the two-block schema they parse.
The per-turn cache write (tmp+fsync+replace) runs via asyncio.to_thread;
the cost estimate uses fish.audio's $15/1M-byte list price; the API key
env var is FISH_AUDIO_KEY everywhere, matching the shipped templates.
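The tmp+fsync+replace write run through `asyncio.to_thread` might look roughly like this. It is a sketch of the general pattern only; `write_atomic` and `write_cache_entry` are hypothetical names, and the actual cache layout is not shown in this PR.

```python
import asyncio
import contextlib
import os
import tempfile
from pathlib import Path


def write_atomic(path: Path, data: bytes) -> None:
    """Write to a temp file in the same directory, fsync, then rename over `path`."""
    fd, tmp_name = tempfile.mkstemp(
        dir=path.parent, prefix=path.name + ".", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # force the bytes to disk before the rename
        os.replace(tmp_name, path)  # atomic when both paths share a filesystem
    except BaseException:
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_name)  # do not leave a stray temp file behind
        raise


async def write_cache_entry(path: Path, data: bytes) -> None:
    """Run the blocking write in a worker thread so the event loop stays free."""
    await asyncio.to_thread(write_atomic, path, data)
```

Creating the temp file in the target's own directory keeps the final `os.replace` on one filesystem, which is the condition for the rename being atomic.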
Text decodes via utf-8-sig at the bytes boundary; CSV batches past the
first carry the header within the line budget; the cost estimate runs
the real chunker and adds per-chunk prompt overhead derived from the
live templates.
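As an illustration of the two import fixes, the sketch below decodes with `utf-8-sig` (which drops a leading BOM if one is present) and repeats the CSV header in every batch while counting it against the line budget. `decode_upload` and `batch_csv_lines` are hypothetical helpers, and splitting on raw lines assumes no quoted field contains a newline.

```python
from typing import List


def decode_upload(raw: bytes) -> str:
    """Decode at the bytes boundary; utf-8-sig strips a UTF-8 BOM if present."""
    return raw.decode("utf-8-sig")


def batch_csv_lines(text: str, max_lines: int) -> List[str]:
    """Split CSV text into batches of at most `max_lines` lines, header included."""
    if max_lines < 2:
        raise ValueError("max_lines must leave room for the header and one row")
    lines = text.splitlines()
    if not lines:
        return []
    header, rows = lines[0], lines[1:]
    per_batch = max_lines - 1  # the header takes one line of every batch
    batches = [
        "\n".join([header, *rows[start:start + per_batch]])
        for start in range(0, len(rows), per_batch)
    ]
    return batches or [header]  # a header-only file still yields one batch
```

For example, a four-line file with a budget of three lines yields two batches, each starting with the header row.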
…outing

Price data covers the current Anthropic generation (Fable 5, Opus 4.8)
with cache read/write rates; presets resolve through the price lookup
under a guard test. config.toml.sample drops the [proactive] keys the
loader rejects and loads through the real config loader in a test; the
[voice] comment states the actual default; env.sample names a real
provider value. docs/ai routing table links the proactive pages.
Each timer's done-callback discards its own registry entry (identity-
checked so a reschedule's replacement survives), keeping _active_timers
bounded over the daemon's lifetime.
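The identity-checked discard can be sketched as follows, assuming timers are `asyncio` tasks keyed by a string. `FollowUpTimers`, `schedule`, and the key format are illustrative; only the `_active_timers` name comes from the commit message.

```python
import asyncio
from typing import Callable, Dict


class FollowUpTimers:
    def __init__(self) -> None:
        self._active_timers: Dict[str, asyncio.Task] = {}

    def schedule(self, key: str, delay: float, action: Callable[[], None]) -> asyncio.Task:
        """Start (or replace) the timer for `key`; call from a running event loop."""
        previous = self._active_timers.get(key)
        task = asyncio.create_task(self._fire(delay, action))
        self._active_timers[key] = task
        task.add_done_callback(lambda done: self._discard(key, done))
        if previous is not None:
            previous.cancel()  # its done-callback will see it no longer owns the slot
        return task

    def _discard(self, key: str, done: asyncio.Task) -> None:
        # Identity check: only remove the entry if it is still this exact task,
        # so a reschedule's replacement survives the old task's callback.
        if self._active_timers.get(key) is done:
            del self._active_timers[key]

    @staticmethod
    async def _fire(delay: float, action: Callable[[], None]) -> None:
        await asyncio.sleep(delay)
        action()
```

Without the `is done` check, the cancelled task's callback would delete the replacement's entry and the registry would lose track of a live timer.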
A subscribe timeout (or any connect failure) tears down the spawned
imsg subprocess before the error propagates; injected clients stay the
injector's responsibility, and a failed start clears the single-use
client so a retry builds a fresh one.
The rerank impact term decays on a 90-day half-life to a 0.3 floor, and
the floor rises with log1p(access_count + re-mentions) so memories the
user keeps touching hold their weight — stable identity facts persist
in the force-loaded user thoughts, which rank by the same dampened
impact helper.
sweep_dead_vectors removes vec rows (and FTS entries for soft-deleted
nodes) once a node has been deleted or superseded for 30 days, keeping
the KNN scan space bounded to rows retrieval can return; node rows and
supersede chains stay intact. The consolidate worker idle branch runs
it at most once per day. delete_concept_node leaves index cleanup to
the sweep — the eager scrub parameter is gone.
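A simplified sketch of the sweep's selection logic, using plain SQLite tables as stand-ins for the real vec and FTS virtual tables. The table and column names, the epoch-second timestamps, and the `_sketch` function are all assumptions for illustration; only the 30-day window and the "vec rows for deleted or superseded nodes, FTS rows for soft-deleted nodes" rule come from the commit message.

```python
import sqlite3
import time
from typing import Optional

RETENTION_SECONDS = 30 * 24 * 60 * 60  # 30 days, per the commit message


def create_sweep_demo_schema(conn: sqlite3.Connection) -> None:
    """Hypothetical stand-in tables for the demo."""
    conn.executescript(
        """
        CREATE TABLE concept_nodes (
            id INTEGER PRIMARY KEY,
            deleted_at INTEGER,      -- epoch seconds, NULL while live
            superseded_at INTEGER    -- epoch seconds, NULL while current
        );
        CREATE TABLE node_vectors (node_id INTEGER PRIMARY KEY, embedding BLOB);
        CREATE TABLE node_fts (node_id INTEGER PRIMARY KEY, body TEXT);
        """
    )


def sweep_dead_vectors_sketch(conn: sqlite3.Connection, now: Optional[float] = None) -> int:
    """Drop index rows for nodes dead longer than the window; return vec rows removed."""
    cutoff = int(time.time() if now is None else now) - RETENTION_SECONDS
    with conn:  # one transaction: commit on success, roll back on error
        removed = conn.execute(
            """
            DELETE FROM node_vectors WHERE node_id IN (
                SELECT id FROM concept_nodes
                WHERE deleted_at <= ? OR superseded_at <= ?
            )
            """,
            (cutoff, cutoff),
        ).rowcount
        # FTS entries go only for soft-deleted nodes, not superseded ones.
        conn.execute(
            """
            DELETE FROM node_fts WHERE node_id IN (
                SELECT id FROM concept_nodes WHERE deleted_at <= ?
            )
            """,
            (cutoff,),
        )
    return removed
```

Note that `concept_nodes` is never touched, which mirrors the statement that node rows and supersede chains stay intact.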
- echovessel run refuses to start when the pidfile names a live PID and
  cleans stale pidfiles; exit only unlinks a pidfile it owns.
- [runtime].log_level is applied at startup (explicit --log-level wins);
  httpx is a core dependency since stop/reload/status use it; the web
  startup rollback unregisters the channel; reload's docstring and log
  state exactly what applied; config writes share one merge-validate-
  write-atomically path.
- The default embedder is intfloat/multilingual-e5-small (384-dim, same
  vec schema). The DB carries an embedding_model tag in memory_meta;
  when the configured model differs, ensure_embedding_model re-embeds
  every vec row (concept nodes and entities, via the same input shape
  resolve_entity stores) in one transaction with the tag written last,
  so a crash mid-sync redoes the work. Untagged DBs with vectors are
  treated as the prior default. --no-embedder skips the sync so a
  placeholder embedder can never overwrite real vectors.
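The "one transaction, tag written last" re-embed can be illustrated with a small SQLite sketch. The `vec_rows` table, JSON-encoded embeddings, the `prior_default` parameter, and the `_sketch` function are assumptions made for the example; the real `ensure_embedding_model` works against the project's vec tables and is not reproduced here. Only the `memory_meta` table and the `embedding_model` tag are named in the commit message.

```python
import json
import sqlite3
from typing import Callable, List

TAG_KEY = "embedding_model"


def create_embed_demo_schema(conn: sqlite3.Connection) -> None:
    """Hypothetical stand-in tables for the demo."""
    conn.executescript(
        """
        CREATE TABLE memory_meta (key TEXT PRIMARY KEY, value TEXT NOT NULL);
        CREATE TABLE vec_rows (
            id INTEGER PRIMARY KEY,
            source_text TEXT NOT NULL,
            embedding TEXT NOT NULL  -- JSON list, standing in for a vector blob
        );
        """
    )


def ensure_embedding_model_sketch(
    conn: sqlite3.Connection,
    configured: str,
    embed: Callable[[str], List[float]],
    prior_default: str,
) -> bool:
    """Re-embed every row if the stored tag differs; return True if it re-embedded."""
    tag = conn.execute(
        "SELECT value FROM memory_meta WHERE key = ?", (TAG_KEY,)
    ).fetchone()
    has_vectors = conn.execute("SELECT 1 FROM vec_rows LIMIT 1").fetchone() is not None
    if tag is not None:
        stored = tag[0]
    elif has_vectors:
        stored = prior_default  # untagged DB with vectors: assume the old default
    else:
        stored = configured  # empty DB: nothing to re-embed, just tag it
    needs_reembed = stored != configured
    if tag is not None and not needs_reembed:
        return False
    with conn:  # a crash or exception rolls everything back, tag included
        if needs_reembed:
            rows = conn.execute("SELECT id, source_text FROM vec_rows").fetchall()
            for row_id, text in rows:
                conn.execute(
                    "UPDATE vec_rows SET embedding = ? WHERE id = ?",
                    (json.dumps(embed(text)), row_id),
                )
        # Tag last: it only becomes visible if every row above was rewritten.
        conn.execute(
            "INSERT OR REPLACE INTO memory_meta (key, value) VALUES (?, ?)",
            (TAG_KEY, configured),
        )
    return needs_reembed
```

Because the tag commits together with the rewritten vectors, an interrupted run leaves the old tag in place and the next start simply repeats the sync.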
Bilingual (en/zh) updates: rerank formula with impact decay and
reinforcement constants, force-loaded thoughts ranking, the dead-vector
retention sweep, the multilingual embedder default with automatic
re-embed on model change, min_relevance 0.55, FISH_AUDIO_KEY, and a
fresh scorer-tutorial example. docs/ai memory pages carry the same
current-state facts.
@AlanY1an AlanY1an merged commit c230af0 into main Jun 12, 2026
6 checks passed
@AlanY1an AlanY1an deleted the fix/minors-and-memory-design branch June 12, 2026 22:12