feat: typed models + schema validation at DB read boundaries (closes #24) by timon0305 · Pull Request #30 · cppalliance/cppa-cursor-browser

timon0305 · 2026-05-11T14:42:26Z

Closes #24.

What

Adds a typed-model layer at the four Cursor SQLite read sites. Each from_dict() raises SchemaError when a critical field is missing, so a schema rename surfaces as a printed warning instead of an empty conversation rendered silently.

Why

Per the eval (Section 5.1 — Schema Fragility), Cursor's on-disk JSON shapes are not versioned. Every read site used .get("field") with a fallback, so a Cursor rename of composerData, bubbleId, latestRootBlobId, etc. would return None, the fallback would apply, and the user would see an empty conversation with no error in the log. The eval called this "the tool's primary failure mode, unobservable by any automated means."

How

New models/ package — six dataclasses, each with a from_dict() validator:

Model	Used at
`Composer`	global `cursorDiskKV` composer rows
`WorkspaceLocalComposer`	per-workspace `ItemTable` composer summaries
`Bubble`	`bubbleId:*` rows
`CliSessionMeta`	CLI session `meta` blob
`Workspace`	`workspace.json`
`ExportEntry`	export manifest (`manifest.jsonl`)

SchemaError inherits from ValueError so existing except ValueError call sites still trap it.

Wire-ins (each catches SchemaError, prints Schema drift in ..., skips the row):

Read site	Model
`api/workspaces.py:601`	`Composer`
`api/composers.py:57-77`	`WorkspaceLocalComposer` (per-row) + envelope dict/list guards
`api/search.py:163-164`	`Composer`
`utils/cli_chat_reader.py:93-98`	`CliSessionMeta`, with broadened catch for bad-hex / non-UTF-8 / bad-JSON

Verification

Check	Result
`python3 -m unittest discover tests -v`	204 pass (178 existing + 26 new)
`mypy --no-strict-optional models/`	`Success: no issues found in 6 source files`
Live smoke against my Cursor workspaceStorage (17 composers)	All routes 200; zero `Schema drift` warnings — no false positives on real data

Notes

createdAt is gated as required on Composer per the issue spec. A live scan confirmed 17/17 real composers carry it; one in-repo test fixture (test_search_exclusion_filtering) was updated to match.
_read_json_file was duplicated in api/workspaces.py:66 and api/composers.py:20. Still duplicated on this branch (consolidation lives in PR refactor(api): split api/workspaces.py 1,407 → 142 LOC into services/ (closes #25) #31 — the api/workspaces.py split for Monolith extraction: split api/workspaces.py first module #25).
Eight follow-up commits in this PR address CodeRabbit findings: non-dict guards on every from_dict, drop of or [] and str() coercion fallbacks that masked drift, Symbol-throw safety in catch arms, structured-error contract on per-row workspace-local composer drift, and except Exception: pass → print on the outer read loop.

Summary by CodeRabbit

Refactor
- Stronger runtime validation and normalization of stored conversation and workspace records before use.
Bug Fixes
- Malformed or old-format entries are now skipped and logged instead of surfacing incorrect data; search, workspace listings, and composer lookups use validated timestamps and headers for consistent results.
- CLI session metadata errors now report schema drift and return empty results safely.
Chores
- Export manifest now includes title and workspace metadata and skips invalid lines.
Tests
- Added extensive tests covering model validation, read-path behavior, manifest parsing, and blob traversal.

) Add models/ package with @DataClass definitions for Workspace, Composer, Bubble, CliSessionMeta, ExportEntry. Each model has a from_dict() classmethod that raises SchemaError when critical fields are missing, replacing silent dict.get() fallbacks at the four database read boundaries identified by the eval (Section 5.1, Schema Fragility). Wire into: - api/workspaces.py:601 — composerData rows in workspace listing - api/composers.py:57-63 — allComposers envelope in per-workspace fetch - api/search.py:163-164 — composerData rows during search - utils/cli_chat_reader.py:93-98 — CLI session meta blob in traverse_blobs Schema drift surfaces as a printed warning + skipped row, not a silent empty result. Existing call sites preserve their JSON response shapes (behavior-preserving wire-in).

coderabbitai · 2026-05-11T14:42:39Z

📝 Walkthrough

Walkthrough

Adds frozen dataclass models with from_dict validators and SchemaError, wires validation into api/composers, api/search, api/workspaces, and utils/cli_chat_reader, updates export handling, and adds tests and package exports to detect on-disk JSON schema drift.

Changes

Typed Models + Schema Validation Foundation

Layer / File(s)	Summary
SchemaError Exception `models/errors.py`	Introduces `SchemaError(ValueError)` that stores `model`, `field`, and optional `hint`, used by model validators.
Composer, WorkspaceLocalComposer, Bubble `models/conversation.py`	Adds `Composer`, `WorkspaceLocalComposer`, and `Bubble` frozen dataclasses with `from_dict()` validation, required-field checks, normalization, and `raw` preservation.
CliSessionMeta `models/cli_session.py`	Adds `CliSessionMeta` frozen dataclass with `from_dict()` validating required `latestRootBlobId` and optional `createdAt`.
ExportEntry `models/export.py`	Adds `ExportEntry` frozen dataclass with required `log_id`, `title`, `workspace` validation and `from_dict()`.
Workspace Model `models/workspace.py`	Adds `Workspace` frozen dataclass and `from_dict()` validating `workspace_id` and optional `folder` typed as `str
Package Boundary `models/__init__.py`	Re-exports public models and `SchemaError` via `__all__`.
Composer Validation: api/composers.py `api/composers.py`	`list_composers` now validates `composer.composerData`: requires `allComposers` list, validates entries via `WorkspaceLocalComposer.from_dict`, normalizes `conversation`, attaches workspace metadata, and logs per-DB `SchemaError`. `get_composer` validates per-workspace entries and validates global fallback via `Composer.from_dict` (returns 404 on drift).
Composer/Bubble Validation: api/search.py `api/search.py`	Global composer search parses composer and bubble rows with `Composer.from_dict()` / `Bubble.from_dict()`, logs/skips `SchemaError`, and sources search/exclusion fields/timestamps from typed `Composer` attributes.
Workspace endpoints wired to models `api/workspaces.py`	`list_workspaces()` and tabs route parse workspace/bubble/composer rows with typed models, skip invalid rows, and build conversation entries from typed `Composer` attributes.
CliSessionMeta: CLI reader `utils/cli_chat_reader.py`	`traverse_blobs()` uses `CliSessionMeta.from_dict()`; on schema/parse/decoding errors logs "Schema drift" with `db_path` and returns `[]`; otherwise derives root from `meta.latest_root_blob_id`.
scripts/export.py: manifest & metadata `scripts/export.py`	`_load_manifest_entries` now uses `ExportEntry.from_dict()` and skips legacy malformed rows; exported manifest entries include `title` and `workspace`.
Tests: models, CLI reader, fixtures, wired integration `tests/*`	Adds `tests/test_models.py`, `tests/test_cli_chat_reader.py` regressions, updates `tests/test_search_exclusion_filtering.py` fixtures, and adds `tests/test_models_wired_at_read_sites.py` to assert read-site wiring uses model.from_dict.

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant API_Search as /api/search
  participant ComposerModel as Composer.from_dict
  participant GlobalDB as Global_cursorDiskKV
  participant BubbleModel as Bubble.from_dict
  Client->>API_Search: search request
  API_Search->>GlobalDB: query composerData rows
  GlobalDB->>API_Search: composer row JSON
  API_Search->>ComposerModel: Composer.from_dict(row_json, composer_id)
  ComposerModel-->>API_Search: Composer instance or raises SchemaError
  API_Search->>BubbleModel: Bubble.from_dict(bubble_json, bubble_id)
  BubbleModel-->>API_Search: Bubble instance or raises SchemaError
  API_Search-->>Client: search results (skipping invalid/malformed rows)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

cppalliance/cppa-cursor-browser#2: Related — both PRs modify /api/search and workspace/composer handling to change parsing/validation and exclusion matching.

Poem

🐰 I hop through dataclasses neat and spry,

Validators sniff the drift and cry,
SchemaError waves a tiny flag,
Tests tap paws and skip the rag,
Now stored JSON learns to comply.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 8.22% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'feat: typed models + schema validation at DB read boundaries' accurately reflects the main changes: introduction of typed models with schema validation at database read boundaries.
Linked Issues check	✅ Passed	The PR addresses all acceptance criteria from issue `#24`: models package added [Composer, WorkspaceLocalComposer, Bubble, CliSessionMeta, Workspace, ExportEntry], from_dict validators implemented raising SchemaError on missing/invalid fields, validation wired into read sites [api/workspaces.py, api/composers.py, api/search.py, utils/cli_chat_reader.py], unit tests added including known-good/missing-field/binary cases, and mypy compatibility confirmed.
Out of Scope Changes check	✅ Passed	All changes are directly aligned with issue `#24` objectives: models implementation, schema validation, wiring into specified read sites, test coverage, and build compatibility. No out-of-scope modifications detected.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch feat/typed-models-schema-validation-24

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

Generate code and open pull requests
Plan features and break down work
Investigate incidents and troubleshoot customer tickets together
Automate recurring tasks and respond to alerts with triggers
Summarize progress and report instantly

Built for teams:

Shared memory across your entire org—no repeating context
Per-thread sandboxes to safely plan and execute work
Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 6

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@api/composers.py`:
- Around line 57-63: After json.loads(row[0]) in the code that builds
WorkspaceComposers (around the data variable), validate that the decoded payload
is a dict/object before checking keys: add an isinstance(data, dict) guard and
raise SchemaError("WorkspaceComposers", "object") (or similar) if it is not a
mapping, then proceed to check "allComposers" and that it is a list; this
ensures composer.composerData cannot be a non-object and bypass the SchemaError
path.

In `@models/cli_session.py`:
- Around line 29-33: from_dict in CliSessionMeta assumes raw is a dict and will
raise AttributeError on non-dict payloads; add an explicit type guard at the
start of CliSessionMeta.from_dict to check isinstance(raw, dict) and if false
raise SchemaError("CliSessionMeta", "raw" or "payload") so callers get a
SchemaError instead of an AttributeError, then proceed with the existing logic
that validates latestRootBlobId.

In `@models/conversation.py`:
- Around line 43-49: The code currently uses headers =
raw.get("fullConversationHeadersOnly") or [] which treats falsy non-list values
(e.g. "", 0, False, None) as an empty list and skips validation; instead,
explicitly check for the presence of the key and only default when the key is
absent: if "fullConversationHeadersOnly" in raw assign value =
raw["fullConversationHeadersOnly"], validate that value is a list and raise
SchemaError("Composer", "fullConversationHeadersOnly", hint=...) when it is not;
otherwise (key absent) set headers = [] — use the symbols raw, headers,
"fullConversationHeadersOnly", and SchemaError to locate and update the logic.

In `@models/export.py`:
- Around line 30-36: The code currently coerces required fields using str(...)
which masks upstream type drift; instead, in the validation loop for required
fields (the block that raises SchemaError("ExportEntry", required)), check that
raw[required] is an instance of str and not empty (e.g.,
isinstance(raw[required], str) and raw[required] != ""), raise SchemaError if
the type is wrong or empty, and then pass the values into cls(...) without
casting (remove str(...) around log_id, title, workspace) so types are preserved
for ExportEntry creation.

In `@tests/test_models.py`:
- Around line 113-119: The test test_composer_missing_created_at_is_tolerated
contradicts Issue `#24` which requires createdAt for Composer; update the test to
enforce the new contract by ensuring Composer.from_dict raises or fails when
"createdAt" is missing (or adjust GOOD_COMPOSER_RAW to include createdAt), e.g.,
change the setup that strips "createdAt" so Composer.from_dict(composer_data,
composer_id="cid-001") triggers the required validation instead of asserting
composer.created_at is None; locate the test function
test_composer_missing_created_at_is_tolerated and the Composer.from_dict
handling of createdAt and modify the test expectation to match the required
schema (raise/validation error) or restore the required createdAt field in
GOOD_COMPOSER_RAW so the test aligns with the new contract.

In `@utils/cli_chat_reader.py`:
- Around line 98-104: The try/except around CliSessionMeta.from_dict only
catches SchemaError so hex-decoding and JSON parsing errors can escape; wrap the
bytes.fromhex(...).decode(...) and json.loads(...) plus CliSessionMeta.from_dict
in the same try and catch decoding/parsing exceptions (ValueError,
UnicodeDecodeError, json.JSONDecodeError and optionally binascii.Error) in
addition to SchemaError, log/print a message including db_path and the
exception, and return [] when any of those errors occur so malformed meta_row
data does not crash the reader.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 611715de-847b-4f23-9a1c-dec9fac36d10

📥 Commits

Reviewing files that changed from the base of the PR and between 95d3140 and 1f83f12.

📒 Files selected for processing (11)

api/composers.py
api/search.py
api/workspaces.py
models/__init__.py
models/cli_session.py
models/conversation.py
models/errors.py
models/export.py
models/workspace.py
tests/test_models.py
utils/cli_chat_reader.py

Six fixes from the CodeRabbit pass on the initial typed-models commit, plus matching regression tests so each can't silently regress later. 1. api/composers.py: validate the top-level JSON payload is a dict before key/list checks. A non-object decoded value previously bypassed the SchemaError path and was swallowed by the surrounding except. 2. models/{cli_session,conversation,workspace,export}.py: same isinstance guard at the top of every from_dict, so a non-dict raw (list, str, None) surfaces as SchemaError instead of AttributeError. 3. models/conversation.py: drop the `raw.get(...) or []` fallback on fullConversationHeadersOnly. The `or []` idiom coerced None/"" / 0 into [] and made the isinstance gate unreachable. 4. models/export.py: replace str(raw[...]) coercion of required string fields with an isinstance(value, str) gate. Coercion masked drift — an int log_id would silently turn into a string. 5. models/conversation.py: re-gate createdAt as required per issue #24. A live workspaceStorage scan confirmed 17/17 composers carry it, so a missing value is real schema drift, not benign older-record absence. Two existing test fixtures in test_search_exclusion_filtering that lacked createdAt updated; the prior tolerated test replaced with a spec-correct missing-raises test. 6. utils/cli_chat_reader.py: broaden the traverse_blobs catch to (SchemaError, ValueError, UnicodeDecodeError, TypeError). Malformed hex, non-UTF-8 bytes, and invalid JSON in the meta blob previously escaped to the caller; now all four routes return [] uniformly. Regression tests added: - tests/test_models.py: non_dict_payload (5 models × 5 bad types), headers_falsy_non_list, headers_empty_list_is_valid, export_entry_non_string_required, missing_created_at_raises - tests/test_cli_chat_reader.py: malformed_hex, non_utf8, invalid_json, non_dict meta payloads each return [] via traverse_blobs 201 tests pass (was 197); mypy clean on models/; real Cursor data (17 composers) still loads with zero Schema drift warnings.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@api/composers.py`:
- Around line 73-77: In the loop over all_composers replace the silent skip with
explicit SchemaError handling: in the block iterating all_composers, instead of
"if not isinstance(c, dict) or not c.get('composerId'): continue", raise or
collect a SchemaError that includes the offending item (c) and the
workspace/name (name) so per-row schema drift is surfaced; keep the subsequent
normalization (setting "conversation" and "workspaceId") only after the item
passes the validation check. Ensure you reference the same symbols:
all_composers, c, "composerId", "conversation", "workspaceId", and SchemaError
so the failing row is routed through the schema error path.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1b35c559-46a0-4d96-920b-62dbe5486445

📥 Commits

Reviewing files that changed from the base of the PR and between 1f83f12 and e853dfd.

📒 Files selected for processing (9)

api/composers.py
models/cli_session.py
models/conversation.py
models/export.py
models/workspace.py
tests/test_cli_chat_reader.py
tests/test_models.py
tests/test_search_exclusion_filtering.py
utils/cli_chat_reader.py

🚧 Files skipped from review as they are similar to previous changes (6)

models/workspace.py
utils/cli_chat_reader.py
models/cli_session.py
models/export.py
tests/test_models.py
models/conversation.py

…ror (#30) CodeRabbit flagged that the inline ``isinstance(c, dict) or not c.get(...)`` skip in api/composers.py masked row-level drift instead of surfacing it the way the rest of the boundary does. The exact suggested fix (Composer.from_dict(c)) would have rejected every workspace-local record: those carry composerId + lastUpdatedAt but not fullConversationHeadersOnly or createdAt, which only exist on the global cursorDiskKV rows. Add WorkspaceLocalComposer to models/conversation.py — a slim sibling model that validates the workspace-local shape (composerId required as a non-empty string; non-dict raw raises). api/composers.py:list_composers now wraps each entry in WorkspaceLocalComposer.from_dict, so per-row drift logs a Schema drift warning and skips, matching every other read-site in this PR. Regression coverage in tests/test_models.py: parses good shape; raises SchemaError for missing/empty/non-string composerId and for non-dict payloads. 204 tests pass (was 201; +3); mypy clean on models/; real Cursor smoke test shows zero drift warnings.

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@api/composers.py`:
- Around line 85-86: The except Exception: pass is silently swallowing decode/DB
errors; update the except block in the function that reads composers to catch
Exception as e and log the error and db_path (e.g. logger.exception or
processLogger.error with e and db_path) the same way the schema-drift path does,
so failures are recorded while still returning the partial list (do not remove
the existing continue/return behavior).

In `@models/conversation.py`:
- Around line 47-48: Composer.from_dict currently enforces createdAt as required
by raising SchemaError when it's missing; change that method so it no longer
raises on missing "createdAt" — instead read raw.get("createdAt") and, if
present, parse/convert it as before, otherwise set the Composer.createdAt
attribute to None (or a sensible default) so rows without createdAt are
accepted; remove the explicit SchemaError check for "createdAt" and ensure any
downstream code that expects createdAt can handle a None value.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b16c8af7-d11b-4cba-94dc-c03a5bc8b57f

📥 Commits

Reviewing files that changed from the base of the PR and between e853dfd and 7774f41.

📒 Files selected for processing (4)

api/composers.py
models/__init__.py
models/conversation.py
tests/test_models.py

🚧 Files skipped from review as they are similar to previous changes (1)

tests/test_models.py

CodeRabbit flagged that the bare ``except Exception: pass`` adjacent to the new SchemaError path swallowed decode / SQLite / non-schema errors silently and returned a partial composer list with no signal. Replace with a print that surfaces the exception and ``db_path``, matching the ``Schema drift in ...`` line a few lines up. No behaviour change beyond observability — the loop still continues to the next workspace.

…30) Internal cleanup on the typed-models package — file location and class name carry the intent; the multi-paragraph rationales were rehashing the PR description. Function bodies are unchanged. - models/__init__.py: removed 8-line module docstring - models/errors.py: dropped module docstring + SchemaError class docstring trimmed to one line - models/conversation.py: dropped module docstring; Composer, WorkspaceLocalComposer, Bubble docstrings trimmed to one line each - models/cli_session.py: dropped module docstring; CliSessionMeta docstring trimmed - models/workspace.py: dropped module docstring; Workspace docstring trimmed - models/export.py: dropped module docstring; ExportEntry docstring trimmed - tests/test_models.py: dropped module docstring; removed now-unused ``from pathlib import Path`` 204 tests pass.

coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@models/conversation.py`:
- Around line 29-30: The current truthiness checks for composer_id and bubble_id
can allow non-string truthy values; update both validation sites (the
composer_id check shown and the bubble_id check around the later block) to
require a non-empty str (e.g., isinstance(composer_id, str) and
composer_id.strip() != ""), and do the same for bubble_id, raising
SchemaError("Composer","composerId", hint="empty composer ID") or
SchemaError("Bubble","bubbleId", hint="empty bubble ID") as appropriate to match
WorkspaceLocalComposer's stricter validation.
- Around line 33-35: The code currently checks presence of "createdAt" but does
not validate or normalize its type before assigning to created_at; update the
parsing in the constructor or factory that reads raw to validate that
raw["createdAt"] is the expected type (e.g., ISO8601 string or numeric
timestamp), attempt to convert it to the internal representation (e.g., datetime
via datetime.fromisoformat or datetime.fromtimestamp) and if
conversion/validation fails raise SchemaError("Composer", "createdAt") (or a
more specific message); modify the assignment to created_at to use the
validated/converted value and ensure similar validation is added where
created_at is set (referencing raw, createdAt, created_at, and SchemaError).

In `@models/workspace.py`:
- Around line 25-26: The current truthiness check for workspace_id allows
non-string types (e.g., 123); update the validation to require a non-empty
string. Replace the `if not workspace_id:` check with an explicit
type-and-content check such as `if not isinstance(workspace_id, str) or not
workspace_id.strip():` and raise the same SchemaError("Workspace",
"workspaceId", hint="empty workspace ID") when the check fails; reference the
workspace_id variable and the SchemaError class in models/workspace.py to locate
the validation logic.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 140ecfbf-18f3-450e-a2f0-898111f4de53

📥 Commits

Reviewing files that changed from the base of the PR and between d983b9b and 545886d.

📒 Files selected for processing (7)

models/__init__.py
models/cli_session.py
models/conversation.py
models/errors.py
models/export.py
models/workspace.py
tests/test_models.py

✅ Files skipped from review due to trivial changes (1)

models/init.py

🚧 Files skipped from review as they are similar to previous changes (2)

models/export.py
models/cli_session.py

Per the review nit: stop narrating in code. Removes: - tests/test_models.py: 3 ``# --- section ---`` banners and 10 inline comments referencing issue #24 / CodeRabbit findings. - tests/test_cli_chat_reader.py: 4 ``"""Regression: ..."""`` docstrings and 4 line-comments restating what the test exercises. - utils/cli_chat_reader.py: 5-line narration on the broadened except arm — the tuple of exception types is the documentation. Test names and code already describe the WHAT; the WHY for these cases is in the PR description and the commit history, not the source. 204 tests still pass.

coderabbitai · 2026-05-12T13:07:22Z

Caution

Failed to replace (edit) comment. This is likely due to insufficient permissions or the comment being deleted.

Error details

{}

Three related CodeRabbit findings, addressed together: - Composer.composer_id, Bubble.bubble_id, Workspace.workspace_id now require ``isinstance(..., str) and value != ""``. Truthiness alone let non-string IDs (e.g. ``123``) slip past the boundary; aligned with the pattern already in WorkspaceLocalComposer. - Composer.createdAt is now type-gated. Presence was already enforced but any value type (e.g. ``"2026-05-12"``) silently became the timestamp. Required values must be ``int`` or ``float`` and explicitly not ``bool`` (since ``bool`` is a subclass of ``int`` in Python and a boolean is never a real timestamp). Regression tests pin each new gate: - test_non_string_composer_id_raises - test_non_string_bubble_id_raises - test_non_string_workspace_id_raises - test_non_numeric_created_at_raises - test_numeric_created_at_passes 209 tests pass (was 204; +5 new).

bradjin8

False Positives / Correctly Handled

WorkspaceLocalComposer vs Composer split — Correct design. Per-workspace rows lack fullConversationHeadersOnly / createdAt; using Composer.from_dict there would reject every real record. CodeRabbit's round-2 suggestion to use Composer.from_dict(c) was rightly rejected and replaced with WorkspaceLocalComposer.
createdAt required on Composer — Aligned with issue #24 after the round-1 fix; fixtures updated; test_missing_created_at_raises replaces the old tolerant test.
CodeRabbit rounds 1–3 — All actionable findings appear addressed in follow-up commits (non-dict guards, dropped or [] / str() coercion, broadened CLI catch, per-row SchemaError routing, ID type gates, createdAt numeric gate).
SchemaError(ValueError) — Preserves compatibility with existing except ValueError sites; tested in test_schema_error_inherits_value_error.

bradjin8 · 2026-05-14T02:19:53Z

`Bubble`, `Workspace`, `ExportEntry` — defined and tested but not wired at any production read site**

The PR table claims:

Model	Claimed site	Actual
`Bubble`	`bubbleId:*` rows	`api/workspaces.py:584–586`, `api/search.py:150–153` still use `json.loads` + `isinstance(b, dict)` only
`Workspace`	`workspace.json`	`api/composers.py:44–45`, `api/workspaces.py` still use `_read_json_file` / `json.load` + `.get("folder")`
`ExportEntry`	`manifest.jsonl`	`scripts/export.py:65–68` still uses `entry.get("log_id")` with silent skip

All three models have thorough unit tests in tests/test_models.py but zero production callers. The package is only half-delivered relative to the PR's own "Used at" table.

Closes five blocking findings from Brad's review pass: 1. Bubble / Workspace / ExportEntry were defined and unit-tested but had zero production callers — the PR's own "Used at" table didn't match the code. Wired each at the cited sites: * Bubble in api/workspaces.py (both bubble loaders, line 584 + 1023) * Bubble in api/search.py:150 * Workspace in api/composers.py:44 and api/workspaces.py:54 * ExportEntry reader in scripts/export.py:66 — paired with writer updates so the on-disk shape matches what the model accepts. 2. get_workspace_tabs() in api/workspaces.py bypassed Composer.from_dict while list_workspaces() in the same file validated. The two primary conversation-browsing paths now agree on what counts as a valid row. 3. get_composer() returned the raw dict from either per-workspace DB or global storage with no validation. Now goes through WorkspaceLocalComposer.from_dict (per-workspace, drift logged + skipped) and Composer.from_dict (global fallback, drift -> 404 + log). 4. WorkspaceLocalComposer.from_dict result was discarded after the gate. The typed object is now load-bearing: the sort key reads local.last_updated_at and the JSON's composerId/lastUpdatedAt are written from the validated fields, not the raw dict. 5. SchemaError message always read "missing required field" even on type/shape mismatches. Now reads "invalid field" when hint is set and "missing required field" only when hint is None, so log grepping can tell the two drift modes apart. Plus one forward-looking note: added a WHY comment at the numeric-only createdAt check in models/conversation.py pointing at the 17/17 live scan that justifies the strict isinstance(int, float) gate. 11 new regression tests: - tests/test_models_wired_at_read_sites.py (new file, 10 tests) pinning each from_dict call site with spy + side_effect patches so a future refactor that drops the typed model fails loudly. - tests/test_models.py one new test asserting the SchemaError wording distinction holds. Verified locally: - python -m unittest discover tests: 220 passed, OK (was 209) - python -m unittest tests.test_models_wired_at_read_sites: 10/10 pass - python app.py boots clean, curl / returns 200

coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@api/composers.py`:
- Around line 125-139: The code in get_composer() currently assumes
json.loads(row[0]) yields a dict with a list-valued "allComposers"; instead
mirror the composer.composerData envelope guards by validating the decoded
payload: after data = json.loads(...), check isinstance(data, dict) and that
data.get("allComposers") is a list; if those checks fail, raise a ValueError (or
another exception that escapes the current silent pass) with a clear message so
schema drift is surfaced (this touches the get_composer() path and the
WorkspaceLocalComposer.from_dict usage).

In `@api/workspaces.py`:
- Around line 583-589: The catch block around Bubble.from_dict that populates
bubble_map currently swallows SchemaError and JSON errors silently; change it to
split exception handling so SchemaError is caught separately and logged
(including bubble id bid and any identifying row info) instead of being ignored,
while keeping JSONDecodeError/ValueError handling for malformed rows as before;
apply the same split-and-log pattern to both bubble-loading loops that use
Bubble.from_dict and bubble_map so schema drift gets an operator-visible log
entry rather than being dropped silently.

In `@tests/test_models_wired_at_read_sites.py`:
- Around line 338-343: Add a scripts/__init__.py file so the scripts directory
is a proper Python package and the test import "from scripts import export as
export_mod" resolves consistently for type checking; this is needed so symbols
referenced in the test (export_mod, export_mod.ExportEntry.from_dict, and
export_mod._load_manifest_entries) can be properly imported by mypy/CI. Create
an empty or minimal __init__.py (no runtime changes) in the scripts package,
commit it, and re-run the tests/typecheck to confirm the import errors are
resolved.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 88de2a2e-45f0-4e7c-a83c-d99c92278253

📥 Commits

Reviewing files that changed from the base of the PR and between 59ae103 and 16f861d.

📒 Files selected for processing (8)

api/composers.py
api/search.py
api/workspaces.py
models/conversation.py
models/errors.py
scripts/export.py
tests/test_models.py
tests/test_models_wired_at_read_sites.py

🚧 Files skipped from review as they are similar to previous changes (2)

models/errors.py
models/conversation.py

CodeRabbit follow-ups to the round-2 wiring pass: 1. The three bubble-loading loops (api/workspaces.py:583, 1025; api/search.py:149) caught SchemaError, json.JSONDecodeError and ValueError together and dropped the row silently. That broke the read-boundary contract — a drifted bubble would simply vanish with no operator signal. Split the catches: SchemaError now logs `Schema drift in bubble <bid>: <reason>`; JSONDecodeError / ValueError still pass silently because that's a parser-noise issue, not a schema-contract issue. 2. get_composer() at api/composers.py:125 trusted that the per-workspace blob was a dict with a list-valued allComposers. A drifted local row like `[]` or `"…"` raised AttributeError on data.get(...) and turned schema drift into a 500. Mirrored the three envelope guards list_composers() already applies at line 60–74 (data is dict, allComposers present, allComposers is list) and wrapped the outer try with `except SchemaError` so drift logs the db_path and falls through to the global fallback instead of 500'ing. 3. scripts/ lacked __init__.py while every other top-level package directory (api/, models/, utils/, tests/) has one. The new `from scripts import export` calls in tests resolve at runtime via the sys.path insert in the test header, but mypy was reporting inconsistent module resolution on scripts.export — a CI typecheck risk. Added an empty scripts/__init__.py to match convention. 3 new regression tests in tests/test_models_wired_at_read_sites.py: - test_bubble_schema_drift_is_logged_not_swallowed_silently — seeds a malformed bubble, captures stdout, asserts the `Schema drift in bubble <bid>` line is present. - test_get_composer_handles_non_dict_envelope_via_fallback — replaces composer.composerData with a list payload, asserts the endpoint returns 200 via the global fallback (was 500 before the envelope guards). - test_get_composer_handles_non_list_all_composers_via_fallback — same shape for non-list allComposers. Verified locally: - python -m unittest discover tests: 223 passed (was 220) - python -m unittest tests.test_models_wired_at_read_sites: 13/13 - mypy resolves scripts.export consistently

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

api/search.py (1)

55-55: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add type annotation for combined to fix mypy error.

The pipeline is failing because mypy requires a type annotation for the combined list.

Proposed fix

-    combined = []
+    combined: list[str] = []

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@api/search.py` at line 55, Add an explicit type annotation for the local
variable `combined` in api/search.py (the list assigned with `combined = []`) to
satisfy mypy; declare it as a list of the correct element type (e.g.,
List[YourElementType] or List[Any] if the element type is heterogeneous), and
add the necessary import from typing (List or Any) at the top of the module;
keep the variable name `combined` and the existing assignment but change it to
include the type annotation so mypy no longer requires an implicit type.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@api/search.py`:
- Line 55: Add an explicit type annotation for the local variable `combined` in
api/search.py (the list assigned with `combined = []`) to satisfy mypy; declare
it as a list of the correct element type (e.g., List[YourElementType] or
List[Any] if the element type is heterogeneous), and add the necessary import
from typing (List or Any) at the top of the module; keep the variable name
`combined` and the existing assignment but change it to include the type
annotation so mypy no longer requires an implicit type.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a5f9d84f-b61b-4b67-b4eb-ad75835e0705

📥 Commits

Reviewing files that changed from the base of the PR and between 16f861d and b9f34a6.

📒 Files selected for processing (5)

api/composers.py
api/search.py
api/workspaces.py
scripts/__init__.py
tests/test_models_wired_at_read_sites.py

🚧 Files skipped from review as they are similar to previous changes (2)

tests/test_models_wired_at_read_sites.py
api/workspaces.py

coderabbitai Bot reviewed May 11, 2026

View reviewed changes

Comment thread api/composers.py

Comment thread models/cli_session.py

Comment thread models/conversation.py Outdated

Comment thread models/export.py Outdated

Comment thread tests/test_models.py Outdated

Comment thread utils/cli_chat_reader.py

timon0305 self-assigned this May 11, 2026

coderabbitai Bot reviewed May 11, 2026

View reviewed changes

Comment thread api/composers.py

coderabbitai Bot reviewed May 11, 2026

View reviewed changes

Comment thread api/composers.py Outdated

Comment thread models/conversation.py

timon0305 requested a review from bradjin8 May 11, 2026 21:57

coderabbitai Bot reviewed May 12, 2026

View reviewed changes

Comment thread models/conversation.py Outdated

Comment thread models/conversation.py

Comment thread models/workspace.py Outdated

bradjin8 requested changes May 14, 2026

View reviewed changes

Comment thread api/workspaces.py

Comment thread api/composers.py

Comment thread models/errors.py Outdated

Comment thread api/composers.py Outdated

Comment thread models/conversation.py

coderabbitai Bot reviewed May 14, 2026

View reviewed changes

Comment thread api/composers.py

Comment thread api/workspaces.py

Comment thread tests/test_models_wired_at_read_sites.py

coderabbitai Bot reviewed May 14, 2026

View reviewed changes

timon0305 requested a review from bradjin8 May 14, 2026 13:12

bradjin8 approved these changes May 14, 2026

View reviewed changes

bradjin8 mentioned this pull request May 14, 2026

chore: clean linter warnings + remove mypy continue-on-error (closes #29) #35

Merged

7 tasks

Conversation

timon0305 commented May 11, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What

Why

How

Verification

Notes

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot commented May 12, 2026

Uh oh!

bradjin8 left a comment

Choose a reason for hiding this comment

False Positives / Correctly Handled

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

bradjin8 commented May 14, 2026

Bubble, Workspace, ExportEntry — defined and tested but not wired at any production read site**

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

timon0305 commented May 11, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 11, 2026 •

edited

Loading

`Bubble`, `Workspace`, `ExportEntry` — defined and tested but not wired at any production read site**