Close non-philosophical Cube pre-aggregation gaps #240
Implement the pre-aggregation capabilities where Sidemantic fell short of Cube, and make still-inert config visible via validation warnings.

- original_sql: materialize the cube's base query verbatim (no GROUP BY); the matcher excludes it from metric routing
- indexes: emit CREATE INDEX on refresh for DuckDB/Postgres; skip engines that manage layout via clustering/sort keys
- Fallback + strict: when a routed rollup table is missing, fall back to raw tables (correct results); opt-in preagg_strict / rollup-only mode raises PreaggregationStrictError instead
- Partitioning: build_partitions() materializes one table per time bucket plus a covering view, with per-partition incremental refresh honoring update_window
- refresh() defaults the incremental lookback window from refresh_key.update_window
- Ungrouped / drill-to-detail: serve raw rows from a primary-key-carrying rollup
- Lambda (rollupLambda): rollups + union_with_source_data fields, Cube round-trip, and a batch-rollup-union-fresh-source query at query time
- Validation warns on config that is still inert (rollup_join/lambda routing, refresh_key.sql, partition build_range, count_distinct_approx degraded to exact)

Exclude the lambda-only fields from the sidemantic-rs YAML serialization (its schema rejects unknown fields) in both the production bridge and the test adapter, and register the new Python-only routing tests in the Rust parity gaps.
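To make the partitioning item concrete, here is a minimal sketch of "one table per time bucket plus a covering view". It is not Sidemantic's `build_partitions()`: the function signature, the table naming scheme, and the use of SQLite from the standard library are all assumptions chosen so the example runs anywhere, and it rebuilds every bucket rather than honoring `update_window`.

```python
import sqlite3


def build_monthly_partitions(conn, source, time_col, rollup):
    """Illustrative only: one rollup table per month, plus a UNION ALL view."""
    buckets = [
        row[0]
        for row in conn.execute(
            f"SELECT DISTINCT strftime('%Y%m', {time_col}) FROM {source} ORDER BY 1"
        )
    ]
    parts = []
    for bucket in buckets:
        part = f"{rollup}_{bucket}"  # hypothetical naming scheme
        conn.execute(f"DROP TABLE IF EXISTS {part}")
        conn.execute(
            f"CREATE TABLE {part} AS "
            f"SELECT date({time_col}, 'start of month') AS month, "
            f"SUM(amount) AS revenue "
            f"FROM {source} "
            f"WHERE strftime('%Y%m', {time_col}) = '{bucket}' "
            f"GROUP BY 1"
        )
        parts.append(part)
    # The covering view lets queries target one relation name.
    conn.execute(f"DROP VIEW IF EXISTS {rollup}")
    if parts:
        union = " UNION ALL ".join(f"SELECT * FROM {p}" for p in parts)
        conn.execute(f"CREATE VIEW {rollup} AS {union}")
    return parts


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (created_at TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("2024-01-05", 10.0), ("2024-01-20", 5.0), ("2024-02-01", 7.0)],
)
parts = build_monthly_partitions(conn, "orders", "created_at", "orders_monthly")
view_rows = conn.execute(
    "SELECT month, revenue FROM orders_monthly ORDER BY month"
).fetchall()
print(parts)
print(view_rows)
```

A per-partition incremental refresh would rebuild only the buckets that fall inside the lookback window and leave older partition tables untouched; the view definition only changes when a new bucket appears.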
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9d04c0c54c
    try:
        return self.adapter.execute(sql)
Route SQL queries through the missing-rollup fallback
This retry/fallback only wraps SemanticLayer.query(), but the documented CLI-first path (sidemantic query ... --use-preaggregations) calls layer.sql(sql), and SemanticLayer.sql() still executes the rewritten pre-aggregation SQL directly with no missing-relation retry. As a result, SQL/CLI users still get an error when a matching rollup table has not been built instead of falling back to raw tables, so the fallback needs to be shared with the SQL rewrite path as well.
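One way to share the behavior is to move the retry into a small helper that both `query()` and `sql()` call. The sketch below is an assumption about how that could look, not the PR's code: `execute_with_fallback`, `is_missing_relation`, and the message matching are hypothetical, `PreaggregationStrictError` is a local stand-in for the class named in the PR, and SQLite stands in for the real adapter.

```python
import sqlite3


class PreaggregationStrictError(RuntimeError):
    """Local stand-in for the error class named in the PR."""


def is_missing_relation(exc):
    # Assumption: the engine names the missing table in its error message.
    message = str(exc).lower()
    return "no such table" in message or "does not exist" in message


def execute_with_fallback(execute, rewritten_sql, raw_sql, strict=False):
    """Run rollup-routed SQL; if the rollup is missing, retry on raw tables."""
    try:
        return execute(rewritten_sql)
    except Exception as exc:
        if not is_missing_relation(exc):
            raise  # unrelated failures must not be masked by the fallback
        if strict:
            raise PreaggregationStrictError(str(exc)) from exc
        return execute(raw_sql)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (status TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)", [("paid", 10.0), ("paid", 5.0)]
)


def run(sql):
    return conn.execute(sql).fetchall()


rows = execute_with_fallback(
    run,
    "SELECT status, revenue FROM orders_preagg_daily",  # rollup never built
    "SELECT status, SUM(amount) FROM orders GROUP BY status",
)
print(rows)
```

With a helper of this shape, the SQL rewrite path only needs to keep the original (unrewritten) statement around so it has something to fall back to.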
    if self.partition_granularity:
        if model is None:
            raise ValueError(
                f"Pre-aggregation '{self.name}' is partitioned (partition_granularity="
                f"'{self.partition_granularity}'); pass model= to refresh() or call build_partitions() directly."
            )
        self.build_partitions(connection, model, lookback=lookback)
Pass the model into partitioned refreshes
Any pre-aggregation with partition_granularity now enters this guard and requires model=, but the CLI refresh flow calls preagg_obj.refresh(...) without passing the model_obj. That makes sidemantic preagg refresh fail for partitioned rollups before it can build partitions, which leaves the new partitioning support unreachable through the primary CLI workflow.
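The fix amounts to the CLI always forwarding the model it already holds. The following sketch uses simplified stand-in classes (`Model`, `PreAggregation`, and `cli_refresh` are hypothetical and only mirror the names in the snippet above) to show that passing `model=` unconditionally works for both partitioned and unpartitioned rollups.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Model:
    name: str


@dataclass
class PreAggregation:
    name: str
    partition_granularity: Optional[str] = None
    calls: list = field(default_factory=list)  # records what ran, for the demo

    def build_partitions(self, connection, model, lookback=None):
        self.calls.append(("build_partitions", model.name))

    def refresh(self, connection, model=None, lookback=None):
        if self.partition_granularity:
            if model is None:
                raise ValueError(
                    f"Pre-aggregation '{self.name}' is partitioned; pass model="
                )
            self.build_partitions(connection, model, lookback=lookback)
            return
        self.calls.append(("refresh", None))


def cli_refresh(connection, model, preaggs):
    # Always forward the model; unpartitioned rollups simply ignore it.
    for preagg in preaggs:
        preagg.refresh(connection, model=model)


partitioned = PreAggregation("daily", partition_granularity="month")
plain = PreAggregation("totals")
cli_refresh(None, Model("orders"), [partitioned, plain])
print(partitioned.calls, plain.calls)
```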
    if not self.indexes or dialect not in _INDEX_DDL_DIALECTS:
        return
Infer DuckDB before skipping index creation
Index creation is gated on dialect, but the normal CLI refresh path only sets dialect for mode == "engine"; for supported DuckDB refreshes via --db, it passes None, so this check silently returns and declared indexes are never created. Since DuckDB is one of the supported index DDL dialects, the refresh path should pass or infer duckdb before this gate.
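A possible shape for the inference step is sketched below. Everything here is an assumption for illustration: `resolve_dialect` and `index_statements` are hypothetical helpers, the contents of `_INDEX_DDL_DIALECTS` are guessed from "DuckDB/Postgres", the `indexes` mapping (index name to column list) is an invented layout, and sniffing the connection's module name is only one heuristic among several.

```python
_INDEX_DDL_DIALECTS = {"duckdb", "postgres"}  # assumed contents


def resolve_dialect(explicit, connection):
    """Prefer an explicit dialect; otherwise guess from the connection type."""
    if explicit:
        return explicit
    module = type(connection).__module__.lower()
    if "duckdb" in module:
        return "duckdb"
    if "psycopg" in module:
        return "postgres"
    return None


def index_statements(table, indexes, dialect):
    """Return CREATE INDEX statements, or nothing for unsupported dialects."""
    if not indexes or dialect not in _INDEX_DDL_DIALECTS:
        return []
    return [
        f"CREATE INDEX IF NOT EXISTS {table}_{name}_idx "
        f"ON {table} ({', '.join(columns)})"
        for name, columns in indexes.items()
    ]


# A fake connection class whose module name looks like DuckDB's.
FakeDuckDBConnection = type("FakeDuckDBConnection", (), {"__module__": "duckdb"})

dialect = resolve_dialect(None, FakeDuckDBConnection())
statements = index_statements(
    "orders_daily", {"status": ["status", "day"]}, dialect
)
print(dialect, statements)
```

Passing `duckdb` explicitly from the `--db` code path would be the simpler fix; inference is mainly useful when callers hand over an arbitrary connection object.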
Closes the gaps where Sidemantic's pre-aggregation support fell short of Cube — everything that doesn't require a philosophical change (no Cube Store, no managed refresh worker) or an unavailable dependency.
Implemented
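- `original_sql`: materializes the cube's base query verbatim (no GROUP BY); the matcher excludes it from metric routing
- `indexes`: `CREATE INDEX` on refresh for DuckDB/Postgres; skipped for engines that manage layout via clustering/sort keys
- Fallback + strict: when a routed rollup table is missing, falls back to raw tables (correct results); opt-in `preagg_strict` / rollup-only mode raises `PreaggregationStrictError` instead
- Partitioning: `build_partitions()` materializes one table per time bucket + a covering view; per-partition incremental refresh honoring `update_window`
- Incremental refresh: `refresh()` defaults the incremental lookback window from `refresh_key.update_window`
- Ungrouped / drill-to-detail: serves raw rows from a primary-key-carrying rollup
- Lambda (`rollupLambda`): `rollups` + `union_with_source_data` fields, Cube round-trip, and a batch-rollup ∪ fresh-source UNION at query time
- Validation: warns on config that is still inert (`rollup_join`/`lambda` routing, `refresh_key.sql`, partition `build_range`, `count_distinct_approx` degraded to exact)

Lambda-only fields are excluded from the `sidemantic-rs` YAML serialization (its schema rejects unknown fields) in both the production bridge and the test adapter, and the new Python-only routing tests are registered in the Rust parity gaps.

Deliberately not included (need a dependency, not just code)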
Note on test status
Four pre-existing `test_symmetric_aggregates` cases in the Rust-parity suite fail in this worktree because `sidemantic_rs` is not compiled here. They fail identically on `main` (verified by stashing to baseline) and are unrelated to this change; they are expected to pass in CI, where the Rust extension is built.