feat: typed models + schema validation at DB read boundaries (closes #24)#30
feat: typed models + schema validation at DB read boundaries (closes #24)#30timon0305 wants to merge 9 commits into
Conversation
) Add models/ package with @DataClass definitions for Workspace, Composer, Bubble, CliSessionMeta, ExportEntry. Each model has a from_dict() classmethod that raises SchemaError when critical fields are missing, replacing silent dict.get() fallbacks at the four database read boundaries identified by the eval (Section 5.1, Schema Fragility). Wire into: - api/workspaces.py:601 — composerData rows in workspace listing - api/composers.py:57-63 — allComposers envelope in per-workspace fetch - api/search.py:163-164 — composerData rows during search - utils/cli_chat_reader.py:93-98 — CLI session meta blob in traverse_blobs Schema drift surfaces as a printed warning + skipped row, not a silent empty result. Existing call sites preserve their JSON response shapes (behavior-preserving wire-in).
📝 WalkthroughWalkthroughAdds frozen dataclass models with from_dict validators and SchemaError, wires validation into api/composers, api/search, api/workspaces, and utils/cli_chat_reader, updates export handling, and adds tests and package exports to detect on-disk JSON schema drift. ChangesTyped Models + Schema Validation Foundation
Sequence Diagram(s)sequenceDiagram
participant Client
participant API_Search as /api/search
participant ComposerModel as Composer.from_dict
participant GlobalDB as Global_cursorDiskKV
participant BubbleModel as Bubble.from_dict
Client->>API_Search: search request
API_Search->>GlobalDB: query composerData rows
GlobalDB->>API_Search: composer row JSON
API_Search->>ComposerModel: Composer.from_dict(row_json, composer_id)
ComposerModel-->>API_Search: Composer instance or raises SchemaError
API_Search->>BubbleModel: Bubble.from_dict(bubble_json, bubble_id)
BubbleModel-->>API_Search: Bubble instance or raises SchemaError
API_Search-->>Client: search results (skipping invalid/malformed rows)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Comment |
There was a problem hiding this comment.
Actionable comments posted: 6
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@api/composers.py`:
- Around line 57-63: After json.loads(row[0]) in the code that builds
WorkspaceComposers (around the data variable), validate that the decoded payload
is a dict/object before checking keys: add an isinstance(data, dict) guard and
raise SchemaError("WorkspaceComposers", "object") (or similar) if it is not a
mapping, then proceed to check "allComposers" and that it is a list; this
ensures composer.composerData cannot be a non-object and bypass the SchemaError
path.
In `@models/cli_session.py`:
- Around line 29-33: from_dict in CliSessionMeta assumes raw is a dict and will
raise AttributeError on non-dict payloads; add an explicit type guard at the
start of CliSessionMeta.from_dict to check isinstance(raw, dict) and if false
raise SchemaError("CliSessionMeta", "raw" or "payload") so callers get a
SchemaError instead of an AttributeError, then proceed with the existing logic
that validates latestRootBlobId.
In `@models/conversation.py`:
- Around line 43-49: The code currently uses headers =
raw.get("fullConversationHeadersOnly") or [] which treats falsy non-list values
(e.g. "", 0, False, None) as an empty list and skips validation; instead,
explicitly check for the presence of the key and only default when the key is
absent: if "fullConversationHeadersOnly" in raw assign value =
raw["fullConversationHeadersOnly"], validate that value is a list and raise
SchemaError("Composer", "fullConversationHeadersOnly", hint=...) when it is not;
otherwise (key absent) set headers = [] — use the symbols raw, headers,
"fullConversationHeadersOnly", and SchemaError to locate and update the logic.
In `@models/export.py`:
- Around line 30-36: The code currently coerces required fields using str(...)
which masks upstream type drift; instead, in the validation loop for required
fields (the block that raises SchemaError("ExportEntry", required)), check that
raw[required] is an instance of str and not empty (e.g.,
isinstance(raw[required], str) and raw[required] != ""), raise SchemaError if
the type is wrong or empty, and then pass the values into cls(...) without
casting (remove str(...) around log_id, title, workspace) so types are preserved
for ExportEntry creation.
In `@tests/test_models.py`:
- Around line 113-119: The test test_composer_missing_created_at_is_tolerated
contradicts Issue `#24` which requires createdAt for Composer; update the test to
enforce the new contract by ensuring Composer.from_dict raises or fails when
"createdAt" is missing (or adjust GOOD_COMPOSER_RAW to include createdAt), e.g.,
change the setup that strips "createdAt" so Composer.from_dict(composer_data,
composer_id="cid-001") triggers the required validation instead of asserting
composer.created_at is None; locate the test function
test_composer_missing_created_at_is_tolerated and the Composer.from_dict
handling of createdAt and modify the test expectation to match the required
schema (raise/validation error) or restore the required createdAt field in
GOOD_COMPOSER_RAW so the test aligns with the new contract.
In `@utils/cli_chat_reader.py`:
- Around line 98-104: The try/except around CliSessionMeta.from_dict only
catches SchemaError so hex-decoding and JSON parsing errors can escape; wrap the
bytes.fromhex(...).decode(...) and json.loads(...) plus CliSessionMeta.from_dict
in the same try and catch decoding/parsing exceptions (ValueError,
UnicodeDecodeError, json.JSONDecodeError and optionally binascii.Error) in
addition to SchemaError, log/print a message including db_path and the
exception, and return [] when any of those errors occur so malformed meta_row
data does not crash the reader.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 611715de-847b-4f23-9a1c-dec9fac36d10
📒 Files selected for processing (11)
api/composers.pyapi/search.pyapi/workspaces.pymodels/__init__.pymodels/cli_session.pymodels/conversation.pymodels/errors.pymodels/export.pymodels/workspace.pytests/test_models.pyutils/cli_chat_reader.py
Six fixes from the CodeRabbit pass on the initial typed-models commit,
plus matching regression tests so each can't silently regress later.
1. api/composers.py: validate the top-level JSON payload is a dict before
key/list checks. A non-object decoded value previously bypassed the
SchemaError path and was swallowed by the surrounding except.
2. models/{cli_session,conversation,workspace,export}.py: same isinstance
guard at the top of every from_dict, so a non-dict raw (list, str, None)
surfaces as SchemaError instead of AttributeError.
3. models/conversation.py: drop the `raw.get(...) or []` fallback on
fullConversationHeadersOnly. The `or []` idiom coerced None/"" / 0
into [] and made the isinstance gate unreachable.
4. models/export.py: replace str(raw[...]) coercion of required string
fields with an isinstance(value, str) gate. Coercion masked drift —
an int log_id would silently turn into a string.
5. models/conversation.py: re-gate createdAt as required per issue #24.
A live workspaceStorage scan confirmed 17/17 composers carry it,
so a missing value is real schema drift, not benign older-record
absence. Two existing test fixtures in test_search_exclusion_filtering
that lacked createdAt updated; the prior tolerated test replaced with
a spec-correct missing-raises test.
6. utils/cli_chat_reader.py: broaden the traverse_blobs catch to
(SchemaError, ValueError, UnicodeDecodeError, TypeError). Malformed
hex, non-UTF-8 bytes, and invalid JSON in the meta blob previously
escaped to the caller; now all four routes return [] uniformly.
Regression tests added:
- tests/test_models.py: non_dict_payload (5 models × 5 bad types),
headers_falsy_non_list, headers_empty_list_is_valid,
export_entry_non_string_required, missing_created_at_raises
- tests/test_cli_chat_reader.py: malformed_hex, non_utf8, invalid_json,
non_dict meta payloads each return [] via traverse_blobs
201 tests pass (was 197); mypy clean on models/; real Cursor data
(17 composers) still loads with zero Schema drift warnings.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@api/composers.py`:
- Around line 73-77: In the loop over all_composers replace the silent skip with
explicit SchemaError handling: in the block iterating all_composers, instead of
"if not isinstance(c, dict) or not c.get('composerId'): continue", raise or
collect a SchemaError that includes the offending item (c) and the
workspace/name (name) so per-row schema drift is surfaced; keep the subsequent
normalization (setting "conversation" and "workspaceId") only after the item
passes the validation check. Ensure you reference the same symbols:
all_composers, c, "composerId", "conversation", "workspaceId", and SchemaError
so the failing row is routed through the schema error path.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 1b35c559-46a0-4d96-920b-62dbe5486445
📒 Files selected for processing (9)
api/composers.pymodels/cli_session.pymodels/conversation.pymodels/export.pymodels/workspace.pytests/test_cli_chat_reader.pytests/test_models.pytests/test_search_exclusion_filtering.pyutils/cli_chat_reader.py
🚧 Files skipped from review as they are similar to previous changes (6)
- models/workspace.py
- utils/cli_chat_reader.py
- models/cli_session.py
- models/export.py
- tests/test_models.py
- models/conversation.py
…ror (#30) CodeRabbit flagged that the inline ``isinstance(c, dict) or not c.get(...)`` skip in api/composers.py masked row-level drift instead of surfacing it the way the rest of the boundary does. The exact suggested fix (Composer.from_dict(c)) would have rejected every workspace-local record: those carry composerId + lastUpdatedAt but not fullConversationHeadersOnly or createdAt, which only exist on the global cursorDiskKV rows. Add WorkspaceLocalComposer to models/conversation.py — a slim sibling model that validates the workspace-local shape (composerId required as a non-empty string; non-dict raw raises). api/composers.py:list_composers now wraps each entry in WorkspaceLocalComposer.from_dict, so per-row drift logs a Schema drift warning and skips, matching every other read-site in this PR. Regression coverage in tests/test_models.py: parses good shape; raises SchemaError for missing/empty/non-string composerId and for non-dict payloads. 204 tests pass (was 201; +3); mypy clean on models/; real Cursor smoke test shows zero drift warnings.
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@api/composers.py`:
- Around line 85-86: The except Exception: pass is silently swallowing decode/DB
errors; update the except block in the function that reads composers to catch
Exception as e and log the error and db_path (e.g. logger.exception or
processLogger.error with e and db_path) the same way the schema-drift path does,
so failures are recorded while still returning the partial list (do not remove
the existing continue/return behavior).
In `@models/conversation.py`:
- Around line 47-48: Composer.from_dict currently enforces createdAt as required
by raising SchemaError when it's missing; change that method so it no longer
raises on missing "createdAt" — instead read raw.get("createdAt") and, if
present, parse/convert it as before, otherwise set the Composer.createdAt
attribute to None (or a sensible default) so rows without createdAt are
accepted; remove the explicit SchemaError check for "createdAt" and ensure any
downstream code that expects createdAt can handle a None value.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: b16c8af7-d11b-4cba-94dc-c03a5bc8b57f
📒 Files selected for processing (4)
api/composers.pymodels/__init__.pymodels/conversation.pytests/test_models.py
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/test_models.py
CodeRabbit flagged that the bare ``except Exception: pass`` adjacent to the new SchemaError path swallowed decode / SQLite / non-schema errors silently and returned a partial composer list with no signal. Replace with a print that surfaces the exception and ``db_path``, matching the ``Schema drift in ...`` line a few lines up. No behaviour change beyond observability — the loop still continues to the next workspace.
…30) Internal cleanup on the typed-models package — file location and class name carry the intent; the multi-paragraph rationales were rehashing the PR description. Function bodies are unchanged. - models/__init__.py: removed 8-line module docstring - models/errors.py: dropped module docstring + SchemaError class docstring trimmed to one line - models/conversation.py: dropped module docstring; Composer, WorkspaceLocalComposer, Bubble docstrings trimmed to one line each - models/cli_session.py: dropped module docstring; CliSessionMeta docstring trimmed - models/workspace.py: dropped module docstring; Workspace docstring trimmed - models/export.py: dropped module docstring; ExportEntry docstring trimmed - tests/test_models.py: dropped module docstring; removed now-unused ``from pathlib import Path`` 204 tests pass.
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@models/conversation.py`:
- Around line 29-30: The current truthiness checks for composer_id and bubble_id
can allow non-string truthy values; update both validation sites (the
composer_id check shown and the bubble_id check around the later block) to
require a non-empty str (e.g., isinstance(composer_id, str) and
composer_id.strip() != ""), and do the same for bubble_id, raising
SchemaError("Composer","composerId", hint="empty composer ID") or
SchemaError("Bubble","bubbleId", hint="empty bubble ID") as appropriate to match
WorkspaceLocalComposer's stricter validation.
- Around line 33-35: The code currently checks presence of "createdAt" but does
not validate or normalize its type before assigning to created_at; update the
parsing in the constructor or factory that reads raw to validate that
raw["createdAt"] is the expected type (e.g., ISO8601 string or numeric
timestamp), attempt to convert it to the internal representation (e.g., datetime
via datetime.fromisoformat or datetime.fromtimestamp) and if
conversion/validation fails raise SchemaError("Composer", "createdAt") (or a
more specific message); modify the assignment to created_at to use the
validated/converted value and ensure similar validation is added where
created_at is set (referencing raw, createdAt, created_at, and SchemaError).
In `@models/workspace.py`:
- Around line 25-26: The current truthiness check for workspace_id allows
non-string types (e.g., 123); update the validation to require a non-empty
string. Replace the `if not workspace_id:` check with an explicit
type-and-content check such as `if not isinstance(workspace_id, str) or not
workspace_id.strip():` and raise the same SchemaError("Workspace",
"workspaceId", hint="empty workspace ID") when the check fails; reference the
workspace_id variable and the SchemaError class in models/workspace.py to locate
the validation logic.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 140ecfbf-18f3-450e-a2f0-898111f4de53
📒 Files selected for processing (7)
models/__init__.pymodels/cli_session.pymodels/conversation.pymodels/errors.pymodels/export.pymodels/workspace.pytests/test_models.py
✅ Files skipped from review due to trivial changes (1)
- models/init.py
🚧 Files skipped from review as they are similar to previous changes (2)
- models/export.py
- models/cli_session.py
Per the review nit: stop narrating in code. Removes:
- tests/test_models.py: 3 ``# --- section ---`` banners and 10
inline comments referencing issue #24 / CodeRabbit findings.
- tests/test_cli_chat_reader.py: 4 ``"""Regression: ..."""``
docstrings and 4 line-comments restating what the test exercises.
- utils/cli_chat_reader.py: 5-line narration on the broadened
except arm — the tuple of exception types is the documentation.
Test names and code already describe the WHAT; the WHY for these cases
is in the PR description and the commit history, not the source.
204 tests still pass.
|
Caution Failed to replace (edit) comment. This is likely due to insufficient permissions or the comment being deleted. Error details |
Three related CodeRabbit findings, addressed together:
- Composer.composer_id, Bubble.bubble_id, Workspace.workspace_id now
require ``isinstance(..., str) and value != ""``. Truthiness alone
let non-string IDs (e.g. ``123``) slip past the boundary; aligned
with the pattern already in WorkspaceLocalComposer.
- Composer.createdAt is now type-gated. Presence was already enforced
but any value type (e.g. ``"2026-05-12"``) silently became the
timestamp. Required values must be ``int`` or ``float`` and explicitly
not ``bool`` (since ``bool`` is a subclass of ``int`` in Python and
a boolean is never a real timestamp).
Regression tests pin each new gate:
- test_non_string_composer_id_raises
- test_non_string_bubble_id_raises
- test_non_string_workspace_id_raises
- test_non_numeric_created_at_raises
- test_numeric_created_at_passes
209 tests pass (was 204; +5 new).
bradjin8
left a comment
There was a problem hiding this comment.
False Positives / Correctly Handled
- WorkspaceLocalComposer vs Composer split — Correct design. Per-workspace rows lack fullConversationHeadersOnly / createdAt; using Composer.from_dict there would reject every real record. CodeRabbit's round-2 suggestion to use Composer.from_dict(c) was rightly rejected and replaced with WorkspaceLocalComposer.
- createdAt required on Composer — Aligned with issue #24 after the round-1 fix; fixtures updated; test_missing_created_at_raises replaces the old tolerant test.
- CodeRabbit rounds 1–3 — All actionable findings appear addressed in follow-up commits (non-dict guards, dropped or [] / str() coercion, broadened CLI catch, per-row SchemaError routing, ID type gates, createdAt numeric gate).
- SchemaError(ValueError) — Preserves compatibility with existing except ValueError sites; tested in test_schema_error_inherits_value_error.
|
| Model | Claimed site | Actual |
|---|---|---|
Bubble |
bubbleId:* rows |
api/workspaces.py:584–586, api/search.py:150–153 still use json.loads + isinstance(b, dict) only |
Workspace |
workspace.json |
api/composers.py:44–45, api/workspaces.py still use _read_json_file / json.load + .get("folder") |
ExportEntry |
manifest.jsonl |
scripts/export.py:65–68 still uses entry.get("log_id") with silent skip |
All three models have thorough unit tests in tests/test_models.py but zero production callers. The package is only half-delivered relative to the PR's own "Used at" table.
Closes five blocking findings from Brad's review pass:
1. Bubble / Workspace / ExportEntry were defined and unit-tested but had
zero production callers — the PR's own "Used at" table didn't match
the code. Wired each at the cited sites:
* Bubble in api/workspaces.py (both bubble loaders, line 584 + 1023)
* Bubble in api/search.py:150
* Workspace in api/composers.py:44 and api/workspaces.py:54
* ExportEntry reader in scripts/export.py:66 — paired with writer
updates so the on-disk shape matches what the model accepts.
2. get_workspace_tabs() in api/workspaces.py bypassed Composer.from_dict
while list_workspaces() in the same file validated. The two primary
conversation-browsing paths now agree on what counts as a valid row.
3. get_composer() returned the raw dict from either per-workspace DB or
global storage with no validation. Now goes through
WorkspaceLocalComposer.from_dict (per-workspace, drift logged + skipped)
and Composer.from_dict (global fallback, drift -> 404 + log).
4. WorkspaceLocalComposer.from_dict result was discarded after the gate.
The typed object is now load-bearing: the sort key reads
local.last_updated_at and the JSON's composerId/lastUpdatedAt are
written from the validated fields, not the raw dict.
5. SchemaError message always read "missing required field" even on
type/shape mismatches. Now reads "invalid field" when hint is set
and "missing required field" only when hint is None, so log grepping
can tell the two drift modes apart.
Plus one forward-looking note: added a WHY comment at the numeric-only
createdAt check in models/conversation.py pointing at the 17/17 live
scan that justifies the strict isinstance(int, float) gate.
11 new regression tests:
- tests/test_models_wired_at_read_sites.py (new file, 10 tests)
pinning each from_dict call site with spy + side_effect patches so
a future refactor that drops the typed model fails loudly.
- tests/test_models.py
one new test asserting the SchemaError wording distinction holds.
Verified locally:
- python -m unittest discover tests: 220 passed, OK (was 209)
- python -m unittest tests.test_models_wired_at_read_sites: 10/10 pass
- python app.py boots clean, curl / returns 200
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@api/composers.py`:
- Around line 125-139: The code in get_composer() currently assumes
json.loads(row[0]) yields a dict with a list-valued "allComposers"; instead
mirror the composer.composerData envelope guards by validating the decoded
payload: after data = json.loads(...), check isinstance(data, dict) and that
data.get("allComposers") is a list; if those checks fail, raise a ValueError (or
another exception that escapes the current silent pass) with a clear message so
schema drift is surfaced (this touches the get_composer() path and the
WorkspaceLocalComposer.from_dict usage).
In `@api/workspaces.py`:
- Around line 583-589: The catch block around Bubble.from_dict that populates
bubble_map currently swallows SchemaError and JSON errors silently; change it to
split exception handling so SchemaError is caught separately and logged
(including bubble id bid and any identifying row info) instead of being ignored,
while keeping JSONDecodeError/ValueError handling for malformed rows as before;
apply the same split-and-log pattern to both bubble-loading loops that use
Bubble.from_dict and bubble_map so schema drift gets an operator-visible log
entry rather than being dropped silently.
In `@tests/test_models_wired_at_read_sites.py`:
- Around line 338-343: Add a scripts/__init__.py file so the scripts directory
is a proper Python package and the test import "from scripts import export as
export_mod" resolves consistently for type checking; this is needed so symbols
referenced in the test (export_mod, export_mod.ExportEntry.from_dict, and
export_mod._load_manifest_entries) can be properly imported by mypy/CI. Create
an empty or minimal __init__.py (no runtime changes) in the scripts package,
commit it, and re-run the tests/typecheck to confirm the import errors are
resolved.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 88de2a2e-45f0-4e7c-a83c-d99c92278253
📒 Files selected for processing (8)
api/composers.pyapi/search.pyapi/workspaces.pymodels/conversation.pymodels/errors.pyscripts/export.pytests/test_models.pytests/test_models_wired_at_read_sites.py
🚧 Files skipped from review as they are similar to previous changes (2)
- models/errors.py
- models/conversation.py
CodeRabbit follow-ups to the round-2 wiring pass:
1. The three bubble-loading loops (api/workspaces.py:583, 1025;
api/search.py:149) caught SchemaError, json.JSONDecodeError and
ValueError together and dropped the row silently. That broke the
read-boundary contract — a drifted bubble would simply vanish with
no operator signal. Split the catches: SchemaError now logs
`Schema drift in bubble <bid>: <reason>`; JSONDecodeError /
ValueError still pass silently because that's a parser-noise issue,
not a schema-contract issue.
2. get_composer() at api/composers.py:125 trusted that the per-workspace
blob was a dict with a list-valued allComposers. A drifted local row
like `[]` or `"…"` raised AttributeError on data.get(...) and turned
schema drift into a 500. Mirrored the three envelope guards
list_composers() already applies at line 60–74 (data is dict,
allComposers present, allComposers is list) and wrapped the outer
try with `except SchemaError` so drift logs the db_path and falls
through to the global fallback instead of 500'ing.
3. scripts/ lacked __init__.py while every other top-level package
directory (api/, models/, utils/, tests/) has one. The new
`from scripts import export` calls in tests resolve at runtime via
the sys.path insert in the test header, but mypy was reporting
inconsistent module resolution on scripts.export — a CI typecheck
risk. Added an empty scripts/__init__.py to match convention.
3 new regression tests in tests/test_models_wired_at_read_sites.py:
- test_bubble_schema_drift_is_logged_not_swallowed_silently —
seeds a malformed bubble, captures stdout, asserts the
`Schema drift in bubble <bid>` line is present.
- test_get_composer_handles_non_dict_envelope_via_fallback —
replaces composer.composerData with a list payload, asserts
the endpoint returns 200 via the global fallback (was 500
before the envelope guards).
- test_get_composer_handles_non_list_all_composers_via_fallback —
same shape for non-list allComposers.
Verified locally:
- python -m unittest discover tests: 223 passed (was 220)
- python -m unittest tests.test_models_wired_at_read_sites: 13/13
- mypy resolves scripts.export consistently
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
api/search.py (1)
55-55:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winAdd type annotation for
combinedto fix mypy error.The pipeline is failing because mypy requires a type annotation for the
combinedlist.Proposed fix
- combined = [] + combined: list[str] = []🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@api/search.py` at line 55, Add an explicit type annotation for the local variable `combined` in api/search.py (the list assigned with `combined = []`) to satisfy mypy; declare it as a list of the correct element type (e.g., List[YourElementType] or List[Any] if the element type is heterogeneous), and add the necessary import from typing (List or Any) at the top of the module; keep the variable name `combined` and the existing assignment but change it to include the type annotation so mypy no longer requires an implicit type.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@api/search.py`:
- Line 55: Add an explicit type annotation for the local variable `combined` in
api/search.py (the list assigned with `combined = []`) to satisfy mypy; declare
it as a list of the correct element type (e.g., List[YourElementType] or
List[Any] if the element type is heterogeneous), and add the necessary import
from typing (List or Any) at the top of the module; keep the variable name
`combined` and the existing assignment but change it to include the type
annotation so mypy no longer requires an implicit type.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: a5f9d84f-b61b-4b67-b4eb-ad75835e0705
📒 Files selected for processing (5)
api/composers.pyapi/search.pyapi/workspaces.pyscripts/__init__.pytests/test_models_wired_at_read_sites.py
🚧 Files skipped from review as they are similar to previous changes (2)
- tests/test_models_wired_at_read_sites.py
- api/workspaces.py
Closes #24.
What
Adds a typed-model layer at the four Cursor SQLite read sites. Each
from_dict()raisesSchemaErrorwhen a critical field is missing, so a schema rename surfaces as a printed warning instead of an empty conversation rendered silently.Why
Per the eval (Section 5.1 — Schema Fragility), Cursor's on-disk JSON shapes are not versioned. Every read site used
.get("field")with a fallback, so a Cursor rename ofcomposerData,bubbleId,latestRootBlobId, etc. would returnNone, the fallback would apply, and the user would see an empty conversation with no error in the log. The eval called this "the tool's primary failure mode, unobservable by any automated means."How
New
models/package — six dataclasses, each with afrom_dict()validator:ComposercursorDiskKVcomposer rowsWorkspaceLocalComposerItemTablecomposer summariesBubblebubbleId:*rowsCliSessionMetametablobWorkspaceworkspace.jsonExportEntrymanifest.jsonl)SchemaErrorinherits fromValueErrorso existingexcept ValueErrorcall sites still trap it.Wire-ins (each catches
SchemaError, printsSchema drift in ..., skips the row):api/workspaces.py:601Composerapi/composers.py:57-77WorkspaceLocalComposer(per-row) + envelope dict/list guardsapi/search.py:163-164Composerutils/cli_chat_reader.py:93-98CliSessionMeta, with broadened catch for bad-hex / non-UTF-8 / bad-JSONVerification
python3 -m unittest discover tests -vmypy --no-strict-optional models/Success: no issues found in 6 source filesSchema driftwarnings — no false positives on real dataNotes
createdAtis gated as required onComposerper the issue spec. A live scan confirmed 17/17 real composers carry it; one in-repo test fixture (test_search_exclusion_filtering) was updated to match._read_json_filewas duplicated inapi/workspaces.py:66andapi/composers.py:20. Still duplicated on this branch (consolidation lives in PR refactor(api): split api/workspaces.py 1,407 → 142 LOC into services/ (closes #25) #31 — theapi/workspaces.pysplit for Monolith extraction: split api/workspaces.py first module #25).from_dict, drop ofor []andstr()coercion fallbacks that masked drift, Symbol-throw safety in catch arms, structured-error contract on per-row workspace-local composer drift, andexcept Exception: pass→printon the outer read loop.Summary by CodeRabbit
Refactor
Bug Fixes
Chores
Tests