Skip to content

feat: typed models + schema validation at DB read boundaries (closes #24)#30

Open
timon0305 wants to merge 9 commits into
masterfrom
feat/typed-models-schema-validation-24
Open

feat: typed models + schema validation at DB read boundaries (closes #24)#30
timon0305 wants to merge 9 commits into
masterfrom
feat/typed-models-schema-validation-24

Conversation

@timon0305
Copy link
Copy Markdown
Collaborator

@timon0305 timon0305 commented May 11, 2026

Closes #24.

What

Adds a typed-model layer at the four Cursor SQLite read sites. Each from_dict() raises SchemaError when a critical field is missing, so a schema rename surfaces as a printed warning instead of an empty conversation rendered silently.

Why

Per the eval (Section 5.1 — Schema Fragility), Cursor's on-disk JSON shapes are not versioned. Every read site used .get("field") with a fallback, so a Cursor rename of composerData, bubbleId, latestRootBlobId, etc. would return None, the fallback would apply, and the user would see an empty conversation with no error in the log. The eval called this "the tool's primary failure mode, unobservable by any automated means."

How

New models/ package — six dataclasses, each with a from_dict() validator:

Model Used at
Composer global cursorDiskKV composer rows
WorkspaceLocalComposer per-workspace ItemTable composer summaries
Bubble bubbleId:* rows
CliSessionMeta CLI session meta blob
Workspace workspace.json
ExportEntry export manifest (manifest.jsonl)

SchemaError inherits from ValueError so existing except ValueError call sites still trap it.

Wire-ins (each catches SchemaError, prints Schema drift in ..., skips the row):

Read site Model
api/workspaces.py:601 Composer
api/composers.py:57-77 WorkspaceLocalComposer (per-row) + envelope dict/list guards
api/search.py:163-164 Composer
utils/cli_chat_reader.py:93-98 CliSessionMeta, with broadened catch for bad-hex / non-UTF-8 / bad-JSON

Verification

Check Result
python3 -m unittest discover tests -v 204 pass (178 existing + 26 new)
mypy --no-strict-optional models/ Success: no issues found in 6 source files
Live smoke against my Cursor workspaceStorage (17 composers) All routes 200; zero Schema drift warnings — no false positives on real data

Notes

  • createdAt is gated as required on Composer per the issue spec. A live scan confirmed 17/17 real composers carry it; one in-repo test fixture (test_search_exclusion_filtering) was updated to match.
  • _read_json_file was duplicated in api/workspaces.py:66 and api/composers.py:20. Still duplicated on this branch (consolidation lives in PR refactor(api): split api/workspaces.py 1,407 → 142 LOC into services/ (closes #25) #31 — the api/workspaces.py split for Monolith extraction: split api/workspaces.py first module #25).
  • Eight follow-up commits in this PR address CodeRabbit findings: non-dict guards on every from_dict, drop of or [] and str() coercion fallbacks that masked drift, Symbol-throw safety in catch arms, structured-error contract on per-row workspace-local composer drift, and except Exception: passprint on the outer read loop.

Summary by CodeRabbit

  • Refactor

    • Stronger runtime validation and normalization of stored conversation and workspace records before use.
  • Bug Fixes

    • Malformed or old-format entries are now skipped and logged instead of surfacing incorrect data; search, workspace listings, and composer lookups use validated timestamps and headers for consistent results.
    • CLI session metadata errors now report schema drift and return empty results safely.
  • Chores

    • Export manifest now includes title and workspace metadata and skips invalid lines.
  • Tests

    • Added extensive tests covering model validation, read-path behavior, manifest parsing, and blob traversal.

Review Change Stack

)

Add models/ package with @DataClass definitions for Workspace, Composer,
Bubble, CliSessionMeta, ExportEntry. Each model has a from_dict() classmethod
that raises SchemaError when critical fields are missing, replacing silent
dict.get() fallbacks at the four database read boundaries identified by
the eval (Section 5.1, Schema Fragility).

Wire into:
  - api/workspaces.py:601   — composerData rows in workspace listing
  - api/composers.py:57-63  — allComposers envelope in per-workspace fetch
  - api/search.py:163-164   — composerData rows during search
  - utils/cli_chat_reader.py:93-98 — CLI session meta blob in traverse_blobs

Schema drift surfaces as a printed warning + skipped row, not a silent
empty result. Existing call sites preserve their JSON response shapes
(behavior-preserving wire-in).
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 11, 2026

📝 Walkthrough

Walkthrough

Adds frozen dataclass models with from_dict validators and SchemaError, wires validation into api/composers, api/search, api/workspaces, and utils/cli_chat_reader, updates export handling, and adds tests and package exports to detect on-disk JSON schema drift.

Changes

Typed Models + Schema Validation Foundation

Layer / File(s) Summary
SchemaError Exception
models/errors.py
Introduces SchemaError(ValueError) that stores model, field, and optional hint, used by model validators.
Composer, WorkspaceLocalComposer, Bubble
models/conversation.py
Adds Composer, WorkspaceLocalComposer, and Bubble frozen dataclasses with from_dict() validation, required-field checks, normalization, and raw preservation.
CliSessionMeta
models/cli_session.py
Adds CliSessionMeta frozen dataclass with from_dict() validating required latestRootBlobId and optional createdAt.
ExportEntry
models/export.py
Adds ExportEntry frozen dataclass with required log_id, title, workspace validation and from_dict().
Workspace Model
models/workspace.py
Adds Workspace frozen dataclass and from_dict() validating workspace_id and optional folder typed as `str
Package Boundary
models/__init__.py
Re-exports public models and SchemaError via __all__.
Composer Validation: api/composers.py
api/composers.py
list_composers now validates composer.composerData: requires allComposers list, validates entries via WorkspaceLocalComposer.from_dict, normalizes conversation, attaches workspace metadata, and logs per-DB SchemaError. get_composer validates per-workspace entries and validates global fallback via Composer.from_dict (returns 404 on drift).
Composer/Bubble Validation: api/search.py
api/search.py
Global composer search parses composer and bubble rows with Composer.from_dict() / Bubble.from_dict(), logs/skips SchemaError, and sources search/exclusion fields/timestamps from typed Composer attributes.
Workspace endpoints wired to models
api/workspaces.py
list_workspaces() and tabs route parse workspace/bubble/composer rows with typed models, skip invalid rows, and build conversation entries from typed Composer attributes.
CliSessionMeta: CLI reader
utils/cli_chat_reader.py
traverse_blobs() uses CliSessionMeta.from_dict(); on schema/parse/decoding errors logs "Schema drift" with db_path and returns []; otherwise derives root from meta.latest_root_blob_id.
scripts/export.py: manifest & metadata
scripts/export.py
_load_manifest_entries now uses ExportEntry.from_dict() and skips legacy malformed rows; exported manifest entries include title and workspace.
Tests: models, CLI reader, fixtures, wired integration
tests/*
Adds tests/test_models.py, tests/test_cli_chat_reader.py regressions, updates tests/test_search_exclusion_filtering.py fixtures, and adds tests/test_models_wired_at_read_sites.py to assert read-site wiring uses model.from_dict.

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant API_Search as /api/search
  participant ComposerModel as Composer.from_dict
  participant GlobalDB as Global_cursorDiskKV
  participant BubbleModel as Bubble.from_dict
  Client->>API_Search: search request
  API_Search->>GlobalDB: query composerData rows
  GlobalDB->>API_Search: composer row JSON
  API_Search->>ComposerModel: Composer.from_dict(row_json, composer_id)
  ComposerModel-->>API_Search: Composer instance or raises SchemaError
  API_Search->>BubbleModel: Bubble.from_dict(bubble_json, bubble_id)
  BubbleModel-->>API_Search: Bubble instance or raises SchemaError
  API_Search-->>Client: search results (skipping invalid/malformed rows)
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Poem

🐰 I hop through dataclasses neat and spry,

Validators sniff the drift and cry,
SchemaError waves a tiny flag,
Tests tap paws and skip the rag,
Now stored JSON learns to comply.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 8.22% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'feat: typed models + schema validation at DB read boundaries' accurately reflects the main changes: introduction of typed models with schema validation at database read boundaries.
Linked Issues check ✅ Passed The PR addresses all acceptance criteria from issue #24: models package added [Composer, WorkspaceLocalComposer, Bubble, CliSessionMeta, Workspace, ExportEntry], from_dict validators implemented raising SchemaError on missing/invalid fields, validation wired into read sites [api/workspaces.py, api/composers.py, api/search.py, utils/cli_chat_reader.py], unit tests added including known-good/missing-field/binary cases, and mypy compatibility confirmed.
Out of Scope Changes check ✅ Passed All changes are directly aligned with issue #24 objectives: models implementation, schema validation, wiring into specified read sites, test coverage, and build compatibility. No out-of-scope modifications detected.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/typed-models-schema-validation-24

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

  • Generate code and open pull requests
  • Plan features and break down work
  • Investigate incidents and troubleshoot customer tickets together
  • Automate recurring tasks and respond to alerts with triggers
  • Summarize progress and report instantly

Built for teams:

  • Shared memory across your entire org—no repeating context
  • Per-thread sandboxes to safely plan and execute work
  • Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started


Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@api/composers.py`:
- Around line 57-63: After json.loads(row[0]) in the code that builds
WorkspaceComposers (around the data variable), validate that the decoded payload
is a dict/object before checking keys: add an isinstance(data, dict) guard and
raise SchemaError("WorkspaceComposers", "object") (or similar) if it is not a
mapping, then proceed to check "allComposers" and that it is a list; this
ensures composer.composerData cannot be a non-object and bypass the SchemaError
path.

In `@models/cli_session.py`:
- Around line 29-33: from_dict in CliSessionMeta assumes raw is a dict and will
raise AttributeError on non-dict payloads; add an explicit type guard at the
start of CliSessionMeta.from_dict to check isinstance(raw, dict) and if false
raise SchemaError("CliSessionMeta", "raw" or "payload") so callers get a
SchemaError instead of an AttributeError, then proceed with the existing logic
that validates latestRootBlobId.

In `@models/conversation.py`:
- Around line 43-49: The code currently uses headers =
raw.get("fullConversationHeadersOnly") or [] which treats falsy non-list values
(e.g. "", 0, False, None) as an empty list and skips validation; instead,
explicitly check for the presence of the key and only default when the key is
absent: if "fullConversationHeadersOnly" in raw assign value =
raw["fullConversationHeadersOnly"], validate that value is a list and raise
SchemaError("Composer", "fullConversationHeadersOnly", hint=...) when it is not;
otherwise (key absent) set headers = [] — use the symbols raw, headers,
"fullConversationHeadersOnly", and SchemaError to locate and update the logic.

In `@models/export.py`:
- Around line 30-36: The code currently coerces required fields using str(...)
which masks upstream type drift; instead, in the validation loop for required
fields (the block that raises SchemaError("ExportEntry", required)), check that
raw[required] is an instance of str and not empty (e.g.,
isinstance(raw[required], str) and raw[required] != ""), raise SchemaError if
the type is wrong or empty, and then pass the values into cls(...) without
casting (remove str(...) around log_id, title, workspace) so types are preserved
for ExportEntry creation.

In `@tests/test_models.py`:
- Around line 113-119: The test test_composer_missing_created_at_is_tolerated
contradicts Issue `#24` which requires createdAt for Composer; update the test to
enforce the new contract by ensuring Composer.from_dict raises or fails when
"createdAt" is missing (or adjust GOOD_COMPOSER_RAW to include createdAt), e.g.,
change the setup that strips "createdAt" so Composer.from_dict(composer_data,
composer_id="cid-001") triggers the required validation instead of asserting
composer.created_at is None; locate the test function
test_composer_missing_created_at_is_tolerated and the Composer.from_dict
handling of createdAt and modify the test expectation to match the required
schema (raise/validation error) or restore the required createdAt field in
GOOD_COMPOSER_RAW so the test aligns with the new contract.

In `@utils/cli_chat_reader.py`:
- Around line 98-104: The try/except around CliSessionMeta.from_dict only
catches SchemaError so hex-decoding and JSON parsing errors can escape; wrap the
bytes.fromhex(...).decode(...) and json.loads(...) plus CliSessionMeta.from_dict
in the same try and catch decoding/parsing exceptions (ValueError,
UnicodeDecodeError, json.JSONDecodeError and optionally binascii.Error) in
addition to SchemaError, log/print a message including db_path and the
exception, and return [] when any of those errors occur so malformed meta_row
data does not crash the reader.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 611715de-847b-4f23-9a1c-dec9fac36d10

📥 Commits

Reviewing files that changed from the base of the PR and between 95d3140 and 1f83f12.

📒 Files selected for processing (11)
  • api/composers.py
  • api/search.py
  • api/workspaces.py
  • models/__init__.py
  • models/cli_session.py
  • models/conversation.py
  • models/errors.py
  • models/export.py
  • models/workspace.py
  • tests/test_models.py
  • utils/cli_chat_reader.py

Comment thread api/composers.py
Comment thread models/cli_session.py
Comment thread models/conversation.py Outdated
Comment thread models/export.py Outdated
Comment thread tests/test_models.py Outdated
Comment thread utils/cli_chat_reader.py
@timon0305 timon0305 self-assigned this May 11, 2026
Six fixes from the CodeRabbit pass on the initial typed-models commit,
plus matching regression tests so each can't silently regress later.

1. api/composers.py: validate the top-level JSON payload is a dict before
   key/list checks. A non-object decoded value previously bypassed the
   SchemaError path and was swallowed by the surrounding except.

2. models/{cli_session,conversation,workspace,export}.py: same isinstance
   guard at the top of every from_dict, so a non-dict raw (list, str, None)
   surfaces as SchemaError instead of AttributeError.

3. models/conversation.py: drop the `raw.get(...) or []` fallback on
   fullConversationHeadersOnly. The `or []` idiom coerced None/"" / 0
   into [] and made the isinstance gate unreachable.

4. models/export.py: replace str(raw[...]) coercion of required string
   fields with an isinstance(value, str) gate. Coercion masked drift —
   an int log_id would silently turn into a string.

5. models/conversation.py: re-gate createdAt as required per issue #24.
   A live workspaceStorage scan confirmed 17/17 composers carry it,
   so a missing value is real schema drift, not benign older-record
   absence. Two existing test fixtures in test_search_exclusion_filtering
   that lacked createdAt updated; the prior tolerated test replaced with
   a spec-correct missing-raises test.

6. utils/cli_chat_reader.py: broaden the traverse_blobs catch to
   (SchemaError, ValueError, UnicodeDecodeError, TypeError). Malformed
   hex, non-UTF-8 bytes, and invalid JSON in the meta blob previously
   escaped to the caller; now all four routes return [] uniformly.

Regression tests added:
  - tests/test_models.py: non_dict_payload (5 models × 5 bad types),
    headers_falsy_non_list, headers_empty_list_is_valid,
    export_entry_non_string_required, missing_created_at_raises
  - tests/test_cli_chat_reader.py: malformed_hex, non_utf8, invalid_json,
    non_dict meta payloads each return [] via traverse_blobs

201 tests pass (was 197); mypy clean on models/; real Cursor data
(17 composers) still loads with zero Schema drift warnings.
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@api/composers.py`:
- Around line 73-77: In the loop over all_composers replace the silent skip with
explicit SchemaError handling: in the block iterating all_composers, instead of
"if not isinstance(c, dict) or not c.get('composerId'): continue", raise or
collect a SchemaError that includes the offending item (c) and the
workspace/name (name) so per-row schema drift is surfaced; keep the subsequent
normalization (setting "conversation" and "workspaceId") only after the item
passes the validation check. Ensure you reference the same symbols:
all_composers, c, "composerId", "conversation", "workspaceId", and SchemaError
so the failing row is routed through the schema error path.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1b35c559-46a0-4d96-920b-62dbe5486445

📥 Commits

Reviewing files that changed from the base of the PR and between 1f83f12 and e853dfd.

📒 Files selected for processing (9)
  • api/composers.py
  • models/cli_session.py
  • models/conversation.py
  • models/export.py
  • models/workspace.py
  • tests/test_cli_chat_reader.py
  • tests/test_models.py
  • tests/test_search_exclusion_filtering.py
  • utils/cli_chat_reader.py
🚧 Files skipped from review as they are similar to previous changes (6)
  • models/workspace.py
  • utils/cli_chat_reader.py
  • models/cli_session.py
  • models/export.py
  • tests/test_models.py
  • models/conversation.py

Comment thread api/composers.py
…ror (#30)

CodeRabbit flagged that the inline ``isinstance(c, dict) or not c.get(...)``
skip in api/composers.py masked row-level drift instead of surfacing it the
way the rest of the boundary does. The exact suggested fix
(Composer.from_dict(c)) would have rejected every workspace-local record:
those carry composerId + lastUpdatedAt but not fullConversationHeadersOnly
or createdAt, which only exist on the global cursorDiskKV rows.

Add WorkspaceLocalComposer to models/conversation.py — a slim sibling
model that validates the workspace-local shape (composerId required as a
non-empty string; non-dict raw raises). api/composers.py:list_composers
now wraps each entry in WorkspaceLocalComposer.from_dict, so per-row
drift logs a Schema drift warning and skips, matching every other
read-site in this PR.

Regression coverage in tests/test_models.py: parses good shape;
raises SchemaError for missing/empty/non-string composerId and for
non-dict payloads.

204 tests pass (was 201; +3); mypy clean on models/; real Cursor smoke
test shows zero drift warnings.
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@api/composers.py`:
- Around line 85-86: The except Exception: pass is silently swallowing decode/DB
errors; update the except block in the function that reads composers to catch
Exception as e and log the error and db_path (e.g. logger.exception or
processLogger.error with e and db_path) the same way the schema-drift path does,
so failures are recorded while still returning the partial list (do not remove
the existing continue/return behavior).

In `@models/conversation.py`:
- Around line 47-48: Composer.from_dict currently enforces createdAt as required
by raising SchemaError when it's missing; change that method so it no longer
raises on missing "createdAt" — instead read raw.get("createdAt") and, if
present, parse/convert it as before, otherwise set the Composer.createdAt
attribute to None (or a sensible default) so rows without createdAt are
accepted; remove the explicit SchemaError check for "createdAt" and ensure any
downstream code that expects createdAt can handle a None value.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b16c8af7-d11b-4cba-94dc-c03a5bc8b57f

📥 Commits

Reviewing files that changed from the base of the PR and between e853dfd and 7774f41.

📒 Files selected for processing (4)
  • api/composers.py
  • models/__init__.py
  • models/conversation.py
  • tests/test_models.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/test_models.py

Comment thread api/composers.py Outdated
Comment thread models/conversation.py
CodeRabbit flagged that the bare ``except Exception: pass`` adjacent to
the new SchemaError path swallowed decode / SQLite / non-schema errors
silently and returned a partial composer list with no signal. Replace
with a print that surfaces the exception and ``db_path``, matching the
``Schema drift in ...`` line a few lines up. No behaviour change beyond
observability — the loop still continues to the next workspace.
@timon0305 timon0305 requested a review from bradjin8 May 11, 2026 21:57
…30)

Internal cleanup on the typed-models package — file location and class
name carry the intent; the multi-paragraph rationales were rehashing the
PR description. Function bodies are unchanged.

  - models/__init__.py:        removed 8-line module docstring
  - models/errors.py:          dropped module docstring + SchemaError
                               class docstring trimmed to one line
  - models/conversation.py:    dropped module docstring; Composer,
                               WorkspaceLocalComposer, Bubble docstrings
                               trimmed to one line each
  - models/cli_session.py:     dropped module docstring; CliSessionMeta
                               docstring trimmed
  - models/workspace.py:       dropped module docstring; Workspace
                               docstring trimmed
  - models/export.py:          dropped module docstring; ExportEntry
                               docstring trimmed
  - tests/test_models.py:      dropped module docstring; removed
                               now-unused ``from pathlib import Path``

204 tests pass.
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@models/conversation.py`:
- Around line 29-30: The current truthiness checks for composer_id and bubble_id
can allow non-string truthy values; update both validation sites (the
composer_id check shown and the bubble_id check around the later block) to
require a non-empty str (e.g., isinstance(composer_id, str) and
composer_id.strip() != ""), and do the same for bubble_id, raising
SchemaError("Composer","composerId", hint="empty composer ID") or
SchemaError("Bubble","bubbleId", hint="empty bubble ID") as appropriate to match
WorkspaceLocalComposer's stricter validation.
- Around line 33-35: The code currently checks presence of "createdAt" but does
not validate or normalize its type before assigning to created_at; update the
parsing in the constructor or factory that reads raw to validate that
raw["createdAt"] is the expected type (e.g., ISO8601 string or numeric
timestamp), attempt to convert it to the internal representation (e.g., datetime
via datetime.fromisoformat or datetime.fromtimestamp) and if
conversion/validation fails raise SchemaError("Composer", "createdAt") (or a
more specific message); modify the assignment to created_at to use the
validated/converted value and ensure similar validation is added where
created_at is set (referencing raw, createdAt, created_at, and SchemaError).

In `@models/workspace.py`:
- Around line 25-26: The current truthiness check for workspace_id allows
non-string types (e.g., 123); update the validation to require a non-empty
string. Replace the `if not workspace_id:` check with an explicit
type-and-content check such as `if not isinstance(workspace_id, str) or not
workspace_id.strip():` and raise the same SchemaError("Workspace",
"workspaceId", hint="empty workspace ID") when the check fails; reference the
workspace_id variable and the SchemaError class in models/workspace.py to locate
the validation logic.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 140ecfbf-18f3-450e-a2f0-898111f4de53

📥 Commits

Reviewing files that changed from the base of the PR and between d983b9b and 545886d.

📒 Files selected for processing (7)
  • models/__init__.py
  • models/cli_session.py
  • models/conversation.py
  • models/errors.py
  • models/export.py
  • models/workspace.py
  • tests/test_models.py
✅ Files skipped from review due to trivial changes (1)
  • models/init.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • models/export.py
  • models/cli_session.py

Comment thread models/conversation.py Outdated
Comment thread models/conversation.py
Comment thread models/workspace.py Outdated
Per the review nit: stop narrating in code. Removes:

  - tests/test_models.py: 3 ``# --- section ---`` banners and 10
    inline comments referencing issue #24 / CodeRabbit findings.
  - tests/test_cli_chat_reader.py: 4 ``"""Regression: ..."""``
    docstrings and 4 line-comments restating what the test exercises.
  - utils/cli_chat_reader.py: 5-line narration on the broadened
    except arm — the tuple of exception types is the documentation.

Test names and code already describe the WHAT; the WHY for these cases
is in the PR description and the commit history, not the source.

204 tests still pass.
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 12, 2026

Caution

Failed to replace (edit) comment. This is likely due to insufficient permissions or the comment being deleted.

Error details
{}

Three related CodeRabbit findings, addressed together:

  - Composer.composer_id, Bubble.bubble_id, Workspace.workspace_id now
    require ``isinstance(..., str) and value != ""``. Truthiness alone
    let non-string IDs (e.g. ``123``) slip past the boundary; aligned
    with the pattern already in WorkspaceLocalComposer.

  - Composer.createdAt is now type-gated. Presence was already enforced
    but any value type (e.g. ``"2026-05-12"``) silently became the
    timestamp. Required values must be ``int`` or ``float`` and explicitly
    not ``bool`` (since ``bool`` is a subclass of ``int`` in Python and
    a boolean is never a real timestamp).

Regression tests pin each new gate:

  - test_non_string_composer_id_raises
  - test_non_string_bubble_id_raises
  - test_non_string_workspace_id_raises
  - test_non_numeric_created_at_raises
  - test_numeric_created_at_passes

209 tests pass (was 204; +5 new).
Copy link
Copy Markdown
Collaborator

@bradjin8 bradjin8 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

False Positives / Correctly Handled

  • WorkspaceLocalComposer vs Composer split — Correct design. Per-workspace rows lack fullConversationHeadersOnly / createdAt; using Composer.from_dict there would reject every real record. CodeRabbit's round-2 suggestion to use Composer.from_dict(c) was rightly rejected and replaced with WorkspaceLocalComposer.
  • createdAt required on Composer — Aligned with issue #24 after the round-1 fix; fixtures updated; test_missing_created_at_raises replaces the old tolerant test.
  • CodeRabbit rounds 1–3 — All actionable findings appear addressed in follow-up commits (non-dict guards, dropped or [] / str() coercion, broadened CLI catch, per-row SchemaError routing, ID type gates, createdAt numeric gate).
  • SchemaError(ValueError) — Preserves compatibility with existing except ValueError sites; tested in test_schema_error_inherits_value_error.

Comment thread api/workspaces.py
Comment thread api/composers.py
Comment thread models/errors.py Outdated
Comment thread api/composers.py Outdated
Comment thread models/conversation.py
@bradjin8
Copy link
Copy Markdown
Collaborator

Bubble, Workspace, ExportEntry — defined and tested but not wired at any production read site**

The PR table claims:

Model Claimed site Actual
Bubble bubbleId:* rows api/workspaces.py:584–586, api/search.py:150–153 still use json.loads + isinstance(b, dict) only
Workspace workspace.json api/composers.py:44–45, api/workspaces.py still use _read_json_file / json.load + .get("folder")
ExportEntry manifest.jsonl scripts/export.py:65–68 still uses entry.get("log_id") with silent skip

All three models have thorough unit tests in tests/test_models.py but zero production callers. The package is only half-delivered relative to the PR's own "Used at" table.

Closes five blocking findings from Brad's review pass:

1. Bubble / Workspace / ExportEntry were defined and unit-tested but had
   zero production callers — the PR's own "Used at" table didn't match
   the code. Wired each at the cited sites:
     * Bubble in api/workspaces.py (both bubble loaders, line 584 + 1023)
     * Bubble in api/search.py:150
     * Workspace in api/composers.py:44 and api/workspaces.py:54
     * ExportEntry reader in scripts/export.py:66 — paired with writer
       updates so the on-disk shape matches what the model accepts.

2. get_workspace_tabs() in api/workspaces.py bypassed Composer.from_dict
   while list_workspaces() in the same file validated. The two primary
   conversation-browsing paths now agree on what counts as a valid row.

3. get_composer() returned the raw dict from either per-workspace DB or
   global storage with no validation. Now goes through
   WorkspaceLocalComposer.from_dict (per-workspace, drift logged + skipped)
   and Composer.from_dict (global fallback, drift -> 404 + log).

4. WorkspaceLocalComposer.from_dict result was discarded after the gate.
   The typed object is now load-bearing: the sort key reads
   local.last_updated_at and the JSON's composerId/lastUpdatedAt are
   written from the validated fields, not the raw dict.

5. SchemaError message always read "missing required field" even on
   type/shape mismatches. Now reads "invalid field" when hint is set
   and "missing required field" only when hint is None, so log grepping
   can tell the two drift modes apart.

Plus one forward-looking note: added a WHY comment at the numeric-only
createdAt check in models/conversation.py pointing at the 17/17 live
scan that justifies the strict isinstance(int, float) gate.

11 new regression tests:
  - tests/test_models_wired_at_read_sites.py (new file, 10 tests)
    pinning each from_dict call site with spy + side_effect patches so
    a future refactor that drops the typed model fails loudly.
  - tests/test_models.py
    one new test asserting the SchemaError wording distinction holds.

Verified locally:
  - python -m unittest discover tests: 220 passed, OK (was 209)
  - python -m unittest tests.test_models_wired_at_read_sites: 10/10 pass
  - python app.py boots clean, curl / returns 200
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@api/composers.py`:
- Around line 125-139: The code in get_composer() currently assumes
json.loads(row[0]) yields a dict with a list-valued "allComposers"; instead
mirror the composer.composerData envelope guards by validating the decoded
payload: after data = json.loads(...), check isinstance(data, dict) and that
data.get("allComposers") is a list; if those checks fail, raise a ValueError (or
another exception that escapes the current silent pass) with a clear message so
schema drift is surfaced (this touches the get_composer() path and the
WorkspaceLocalComposer.from_dict usage).

In `@api/workspaces.py`:
- Around line 583-589: The catch block around Bubble.from_dict that populates
bubble_map currently swallows SchemaError and JSON errors silently; change it to
split exception handling so SchemaError is caught separately and logged
(including bubble id bid and any identifying row info) instead of being ignored,
while keeping JSONDecodeError/ValueError handling for malformed rows as before;
apply the same split-and-log pattern to both bubble-loading loops that use
Bubble.from_dict and bubble_map so schema drift gets an operator-visible log
entry rather than being dropped silently.

In `@tests/test_models_wired_at_read_sites.py`:
- Around line 338-343: Add a scripts/__init__.py file so the scripts directory
is a proper Python package and the test import "from scripts import export as
export_mod" resolves consistently for type checking; this is needed so symbols
referenced in the test (export_mod, export_mod.ExportEntry.from_dict, and
export_mod._load_manifest_entries) can be properly imported by mypy/CI. Create
an empty or minimal __init__.py (no runtime changes) in the scripts package,
commit it, and re-run the tests/typecheck to confirm the import errors are
resolved.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 88de2a2e-45f0-4e7c-a83c-d99c92278253

📥 Commits

Reviewing files that changed from the base of the PR and between 59ae103 and 16f861d.

📒 Files selected for processing (8)
  • api/composers.py
  • api/search.py
  • api/workspaces.py
  • models/conversation.py
  • models/errors.py
  • scripts/export.py
  • tests/test_models.py
  • tests/test_models_wired_at_read_sites.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • models/errors.py
  • models/conversation.py

Comment thread api/composers.py
Comment thread api/workspaces.py
Comment thread tests/test_models_wired_at_read_sites.py
CodeRabbit follow-ups to the round-2 wiring pass:

1. The three bubble-loading loops (api/workspaces.py:583, 1025;
   api/search.py:149) caught SchemaError, json.JSONDecodeError and
   ValueError together and dropped the row silently. That broke the
   read-boundary contract — a drifted bubble would simply vanish with
   no operator signal. Split the catches: SchemaError now logs
   `Schema drift in bubble <bid>: <reason>`; JSONDecodeError /
   ValueError still pass silently because that's a parser-noise issue,
   not a schema-contract issue.

2. get_composer() at api/composers.py:125 trusted that the per-workspace
   blob was a dict with a list-valued allComposers. A drifted local row
   like `[]` or `"…"` raised AttributeError on data.get(...) and turned
   schema drift into a 500. Mirrored the three envelope guards
   list_composers() already applies at line 60–74 (data is dict,
   allComposers present, allComposers is list) and wrapped the outer
   try with `except SchemaError` so drift logs the db_path and falls
   through to the global fallback instead of 500'ing.

3. scripts/ lacked __init__.py while every other top-level package
   directory (api/, models/, utils/, tests/) has one. The new
   `from scripts import export` calls in tests resolve at runtime via
   the sys.path insert in the test header, but mypy was reporting
   inconsistent module resolution on scripts.export — a CI typecheck
   risk. Added an empty scripts/__init__.py to match convention.

3 new regression tests in tests/test_models_wired_at_read_sites.py:
  - test_bubble_schema_drift_is_logged_not_swallowed_silently —
    seeds a malformed bubble, captures stdout, asserts the
    `Schema drift in bubble <bid>` line is present.
  - test_get_composer_handles_non_dict_envelope_via_fallback —
    replaces composer.composerData with a list payload, asserts
    the endpoint returns 200 via the global fallback (was 500
    before the envelope guards).
  - test_get_composer_handles_non_list_all_composers_via_fallback —
    same shape for non-list allComposers.

Verified locally:
  - python -m unittest discover tests: 223 passed (was 220)
  - python -m unittest tests.test_models_wired_at_read_sites: 13/13
  - mypy resolves scripts.export consistently
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
api/search.py (1)

55-55: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add type annotation for combined to fix mypy error.

The pipeline is failing because mypy requires a type annotation for the combined list.

Proposed fix
-    combined = []
+    combined: list[str] = []
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@api/search.py` at line 55, Add an explicit type annotation for the local
variable `combined` in api/search.py (the list assigned with `combined = []`) to
satisfy mypy; declare it as a list of the correct element type (e.g.,
List[YourElementType] or List[Any] if the element type is heterogeneous), and
add the necessary import from typing (List or Any) at the top of the module;
keep the variable name `combined` and the existing assignment but change it to
include the type annotation so mypy no longer requires an implicit type.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@api/search.py`:
- Line 55: Add an explicit type annotation for the local variable `combined` in
api/search.py (the list assigned with `combined = []`) to satisfy mypy; declare
it as a list of the correct element type (e.g., List[YourElementType] or
List[Any] if the element type is heterogeneous), and add the necessary import
from typing (List or Any) at the top of the module; keep the variable name
`combined` and the existing assignment but change it to include the type
annotation so mypy no longer requires an implicit type.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a5f9d84f-b61b-4b67-b4eb-ad75835e0705

📥 Commits

Reviewing files that changed from the base of the PR and between 16f861d and b9f34a6.

📒 Files selected for processing (5)
  • api/composers.py
  • api/search.py
  • api/workspaces.py
  • scripts/__init__.py
  • tests/test_models_wired_at_read_sites.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • tests/test_models_wired_at_read_sites.py
  • api/workspaces.py

@timon0305 timon0305 requested a review from bradjin8 May 14, 2026 13:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Data-access layer: typed models + schema validation foundation

2 participants