
feat(cdc): native SQLite CDC source (db.cdc.sqlite) #351

Open
skhaz wants to merge 2 commits into main from feat/sqlite-cdc

@skhaz skhaz commented Jun 16, 2026

Member

Why

The runtime streams Postgres row changes via db.cdc.postgres (#323) but has no SQLite equivalent. SQLite has no logical-replication slot, so the native mechanism is the preupdate hook, which observes row changes on the connection the runtime writes through.

What

Adds a db.cdc.sqlite registry kind backed by a supervised Source that installs SQLite preupdate + commit/rollback hooks on the target db.sql.sqlite pool's writer connection and emits insert/update/delete (plus a gap-free snapshot bootstrap) through the existing, engine-agnostic cdc Lua module (cdc.stream / list_sources / source).

  • service/sql: build-tagged hook seam (sqlite_preupdate_hook) registering a ConnectHook-enabled driver; the factory selects it transparently (1-line change).
  • service/cdc/sqlite: preupdate rows buffered per-transaction, flushed atomically on commit via a bounded handoff to a drain goroutine. Column names/affinity resolve over a dedicated read-only connection and checkpoints write through a separate plain-driver connection, so the writer-blocking commit hook can never deadlock against schema resolution or checkpoints. A laggard subscriber is closed loudly rather than allowed to stall the writer. Durable snapshot/offset state lives in wippy_cdc_offsets in the source DB.
  • api/service/cdc: db.cdc.sqlite kind, SQLiteConfig, and a composite inspector/streamer so the Lua module observes both engines.
  • boot: kind-specific listeners (db.cdc.postgres + db.cdc.sqlite) feed the composite.
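
The per-transaction buffering described above can be sketched roughly as follows. This is a minimal illustration, not the code in service/cdc/sqlite: the names `Change`, `txBuffer`, `onPreupdate`, `onCommit`, and `onRollback` are hypothetical, and the sketch assumes the commit path simply blocks on a bounded channel when the drain goroutine falls behind.

```go
package main

import "fmt"

// Change is a hypothetical, simplified row-change record.
type Change struct {
	Op    string // "insert", "update", or "delete"
	Table string
}

// txBuffer collects changes seen during one transaction and hands the
// whole batch to a bounded channel only when the transaction commits.
type txBuffer struct {
	pending []Change
	handoff chan []Change // bounded: capacity is fixed at construction
}

func newTxBuffer(capacity int) *txBuffer {
	return &txBuffer{handoff: make(chan []Change, capacity)}
}

// onPreupdate records a row change; nothing is visible downstream yet.
func (b *txBuffer) onPreupdate(c Change) {
	b.pending = append(b.pending, c)
}

// onCommit publishes the buffered rows as a single batch, so a consumer
// never observes part of a transaction. Assumption: the send blocks when
// the handoff channel is full.
func (b *txBuffer) onCommit() {
	if len(b.pending) == 0 {
		return
	}
	batch := b.pending
	b.pending = nil
	b.handoff <- batch
}

// onRollback discards everything buffered for the aborted transaction.
func (b *txBuffer) onRollback() {
	b.pending = nil
}

func main() {
	buf := newTxBuffer(4)
	done := make(chan struct{})

	// Drain goroutine: the only reader of the handoff channel.
	go func() {
		defer close(done)
		for batch := range buf.handoff {
			fmt.Printf("committed batch of %d change(s)\n", len(batch))
		}
	}()

	buf.onPreupdate(Change{Op: "insert", Table: "users"})
	buf.onPreupdate(Change{Op: "update", Table: "users"})
	buf.onCommit() // both rows are delivered together

	buf.onPreupdate(Change{Op: "delete", Table: "users"})
	buf.onRollback() // never delivered

	close(buf.handoff)
	<-done
}
```

Handing off whole batches keeps the work done inside the commit path to a single channel send, which is one plausible reason to separate the hook from the drain goroutine.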

Limitation (by design)

Capture is in-process and live-only: changes made while the runtime is down, or by an external process writing the file, are not captured (unlike a Postgres slot that replays). The checkpoint exists for snapshot-gating and idempotent dedupe, not replay. This is the trade for zero schema intrusion and lowest overhead.
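
To illustrate what "idempotent dedupe, not replay" can mean in practice, here is a small sketch of a consumer-side checkpoint. It assumes `lsn` is a monotonically increasing integer per source (the sample log further down shows `lsn=1`, but the PR does not spell out the ordering guarantee); the `checkpoint` type and its `apply` method are hypothetical.

```go
package main

import "fmt"

// checkpoint remembers the highest lsn already applied.
// Assumption: lsn values increase monotonically for a given source.
type checkpoint struct {
	last uint64
}

// apply runs fn only for changes newer than the checkpoint and reports
// whether the change was applied. Re-delivered changes are skipped, so
// applying the same change twice has no additional effect.
func (c *checkpoint) apply(lsn uint64, fn func()) bool {
	if lsn <= c.last {
		return false
	}
	fn()
	c.last = lsn
	return true
}

func main() {
	cp := &checkpoint{}
	applied := 0
	for _, lsn := range []uint64{1, 2, 2, 3, 1} {
		if cp.apply(lsn, func() { applied++ }) {
			fmt.Println("applied lsn", lsn)
		} else {
			fmt.Println("skipped duplicate lsn", lsn)
		}
	}
	fmt.Println("total applied:", applied)
}
```

The checkpoint only filters what is delivered again; it cannot recover changes that were never observed, which matches the live-only limitation stated above.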

Setup

Build with the new tag (already wired into the Makefile): make build-wippy-local. Without sqlite_preupdate_hook the source fails loudly instead of silently capturing nothing.

Registry usage:

- { name: db,  kind: db.sql.sqlite, file: /path/app.db, lifecycle: { auto_start: true } }
- { name: cdc, kind: db.cdc.sqlite, db_resource: app:db, lifecycle: { auto_start: true } }

Lua: local s = cdc.stream("app:cdc"); local c = s:channel():receive(), where c is a table of the form { op, table, before, after, source, lsn }.

Testing

  • golangci-lint (tags race,sqlite_preupdate_hook): 0 issues.
  • Unit + integration (-race): decode/affinity, subscribers, manager, composite, config; and against a real WAL SQLite file: insert/update/delete, value fidelity (blob/text/NULL), rollback-discard, snapshot bootstrap, restart-keeps-checkpoint, table allowlist, and laggard-subscriber-does-not-stall-writes.
  • Adversarial agent code review run on the branch: 4 real bugs found and fixed (error-path hook leak -> writer deadlock; laggard subscriber stalling the writer; Stop drain dropping changes via cancelled ctx; mode=ro on WAL).
  • Mutation (gremlins): untagged core 90% efficacy (2 equivalent survivors); tagged ~79%.
  • Live binary run (full production path: registry -> boot -> manager -> supervisor -> Source.Start -> preupdate hook -> drain -> cdc.stream -> Lua channel):
INFO cdc.sqlite  sqlite cdc source started  {"id":"app:cdc","file":".../data.db","snapshot":false}
INFO op=insert table=users source=app:cdc lsn=1
INFO after.email=live@wippy.ai after.balance=42.5

Why:
The runtime could stream row changes from Postgres (db.cdc.postgres) but
had no equivalent for SQLite. SQLite has no logical-replication slot, so
the native mechanism is the preupdate hook, which observes row changes on
the connection the runtime writes through.

What:
Adds a db.cdc.sqlite registry kind backed by a supervised Source that
installs SQLite preupdate + commit/rollback hooks on the target
db.sql.sqlite pool's writer connection and emits insert/update/delete
(plus a gap-free snapshot bootstrap) through the existing,
engine-agnostic cdc Lua module (cdc.stream/list_sources/source).

How:
- service/sql: a build-tagged hook seam (sqlite_preupdate_hook) registers
  a ConnectHook-enabled driver and installs/clears hooks on a raw
  *sqlite3.SQLiteConn; the factory selects the driver transparently.
- service/cdc/sqlite: the Source buffers preupdate rows per transaction
  and flushes atomically on commit via a bounded handoff to a drain
  goroutine. Column names/affinity resolve over a dedicated read-only
  connection, and checkpoints write through a separate plain-driver
  connection, so the writer-blocking commit hook can never deadlock
  against schema resolution or checkpoint writes. A laggard subscriber is
  closed rather than allowed to stall the writer. Durable snapshot/offset
  state lives in wippy_cdc_offsets in the source DB.
- api/service/cdc: db.cdc.sqlite kind, SQLiteConfig, and a composite
  inspector/streamer so the Lua module observes both engines.
- boot: kind-specific listeners (db.cdc.postgres + db.cdc.sqlite) feed the
  composite.
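
One common way to close a laggard subscriber instead of stalling the producer is a non-blocking send per subscriber. The sketch below shows that general pattern only; the `hub` type and `publish` method are hypothetical, and the sketch assumes a single publishing goroutine (no locking).

```go
package main

import "fmt"

// hub fans messages out to subscribers, each with its own bounded buffer.
// Assumption: publish is only ever called from one goroutine.
type hub struct {
	subs map[int]chan string
}

// publish attempts a non-blocking send to every subscriber. A subscriber
// whose buffer is full is closed and removed, so the caller never waits
// on a slow consumer. It returns the ids of the dropped subscribers.
func (h *hub) publish(msg string) (dropped []int) {
	for id, ch := range h.subs {
		select {
		case ch <- msg:
			// Delivered into the subscriber's buffer.
		default:
			// Buffer full: close loudly rather than block the writer.
			close(ch)
			delete(h.subs, id)
			dropped = append(dropped, id)
		}
	}
	return dropped
}

func main() {
	h := &hub{subs: map[int]chan string{
		1: make(chan string, 1), // slow subscriber, tiny buffer
		2: make(chan string, 4), // healthy subscriber
	}}

	h.publish("change-1")
	dropped := h.publish("change-2") // subscriber 1 is full by now
	fmt.Println("dropped subscribers:", dropped)
	fmt.Println("remaining subscribers:", len(h.subs))
}
```

A closed channel gives the slow consumer an unambiguous signal (its receive eventually reports the channel as closed) rather than silently losing individual changes.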

Capture is in-process and live-only: changes made while the runtime is
down, or by an external writer, are not captured. The checkpoint exists
for snapshot-gating and idempotent dedupe, not replay. Building requires
the sqlite_preupdate_hook tag (added to the Makefile); without it the
source fails loudly instead of silently capturing nothing.
@skhaz skhaz requested a review from wolfy-j June 16, 2026 18:33
Why:
PR #351 bolted SQLite CDC specifics onto the generalized service/sql
driver (a build-tagged preupdate-hook file, a mutable driver-name global,
and CDC-only errors), and the package dispatched engines through kind
switches. Adding a database therefore meant editing core dispatch code,
and the generalized driver carried engine- and CDC-specific knowledge it
should not have.

What:
service/sql core now exposes only two public seams, RegisterEngine and
RegisterDriver, and dispatches purely via engineFor(kind); the manager,
factory, and ConnPool.UpdateConfig no longer switch on engine kind. An
EngineConfig contract lets the generic create/update lifecycle validate
and read lifecycle settings without knowing the concrete type. Built-in
engines move into self-registering sub-packages that use only the public
API: engine/standard (Postgres and MySQL, each with its own DSN builder,
removing the buildDSN/getDriver kind switch), engine/sqlite (DSN, WAL,
single-writer tuning), and engine/all (blank-imports the built-ins). Boot
blank-imports engine/all as the single wiring point. The database/sql
driver override is applied centrally in createPool, so engines stay
override-agnostic. All SQLite CDC hook code (custom sqlite3_wippy driver,
sink registry, preupdate scan, install/clear-on-raw, CDC errors) moves to
service/cdc/sqlite/hook.go and registers its driver through RegisterDriver,
leaving service/sql with zero CDC knowledge. Adding a database is now a new
self-registering package that touches no existing core file.
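
The self-registering engine pattern described in this commit resembles how database/sql drivers register themselves. The following is a sketch under assumptions, not the actual service/sql API: `RegisterEngine` and `engineFor` are named in the commit message, but their signatures, the `Engine` interface, and the DSN format shown here are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// Engine is a hypothetical contract an engine package would satisfy.
type Engine interface {
	DSN(cfg map[string]string) string
}

var (
	mu      sync.RWMutex
	engines = map[string]Engine{}
)

// RegisterEngine makes an engine available under a registry kind.
// Like sql.Register, this sketch panics on a duplicate kind.
func RegisterEngine(kind string, e Engine) {
	mu.Lock()
	defer mu.Unlock()
	if _, dup := engines[kind]; dup {
		panic("engine already registered: " + kind)
	}
	engines[kind] = e
}

// engineFor is the single dispatch point; callers never switch on kind.
func engineFor(kind string) (Engine, error) {
	mu.RLock()
	defer mu.RUnlock()
	e, ok := engines[kind]
	if !ok {
		return nil, fmt.Errorf("no engine registered for kind %q", kind)
	}
	return e, nil
}

// sqliteEngine stands in for an engine sub-package; in a real layout it
// would live in its own package and be pulled in by a blank import.
type sqliteEngine struct{}

// DSN builds an illustrative connection string from a config map.
func (sqliteEngine) DSN(cfg map[string]string) string {
	return "file:" + cfg["file"]
}

// init performs the self-registration when the package is loaded.
func init() {
	RegisterEngine("db.sql.sqlite", sqliteEngine{})
}

func main() {
	e, err := engineFor("db.sql.sqlite")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(e.DSN(map[string]string{"file": "/path/app.db"}))

	if _, err := engineFor("db.sql.unknown"); err != nil {
		fmt.Println("error:", err)
	}
}
```

With this shape, supporting another database means adding a package with its own `init` and a blank import at the wiring point, which is the property the commit message claims for engine/all.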