Close non-philosophical Cube pre-aggregation gaps by nicosuave · Pull Request #240 · sidequery/sidemantic

nicosuave · 2026-06-26T17:44:37Z

Closes the gaps where Sidemantic's pre-aggregation support fell short of Cube — everything that doesn't require a philosophical change (no Cube Store, no managed refresh worker) or an unavailable dependency.

Implemented

Area	Change
`original_sql`	Materializes the cube's base query verbatim (no GROUP BY); matcher excludes it from metric routing
`indexes`	Emits `CREATE INDEX` on refresh for DuckDB/Postgres; skipped for engines that manage layout via clustering/sort keys
Missing-rollup fallback	When a routed rollup table isn't built, falls back to raw tables (correct results); opt-in `preagg_strict` / rollup-only mode raises `PreaggregationStrictError` instead
Partitioning	`build_partitions()` materializes one table per time bucket + a covering view; per-partition incremental refresh honoring `update_window`
`update_window`	Now defaults the incremental refresh lookback window
Ungrouped / drill-to-detail	Serves raw rows from a primary-key-carrying rollup
Lambda (`rollupLambda`)	`rollups` + `union_with_source_data` fields, Cube round-trip, and a batch-rollup ∪ fresh-source UNION at query time
Validation	Warns on config that remains inert (`rollup_join`/`lambda` routing, `refresh_key.sql`, `partition` `build_range`, `count_distinct_approx` degraded to exact)

Lambda-only fields are excluded from the sidemantic-rs YAML serialization (its schema rejects unknown fields) in both the production bridge and the test adapter, and the new Python-only routing tests are registered in the Rust parity gaps.

Deliberately not included (need a dependency, not just code)

Approximate distinct / HLL — DuckDB has no mergeable sketch, so it can't be executed or tested on the default engine; it would ship untested dialect-specific SQL and needs a new agg-type concept. Left as a scoped follow-up rather than faked.
Multi-tenant pre-aggregations — needs a security-context concept the layer doesn't have.
Cube Store serving layer / managed refresh worker — philosophical: Sidemantic is stateless and warehouse-bound by design.

Note on test status

Four pre-existing test_symmetric_aggregates cases in the Rust-parity suite fail in this worktree because sidemantic_rs is not compiled here. They fail identically on main (verified by stashing to baseline) and are unrelated to this change — expected to pass in CI where the Rust extension is built.

Implement the pre-aggregation capabilities where Sidemantic fell short of Cube, and make still-inert config visible via validation warnings. - original_sql: materialize the cube's base query verbatim (no GROUP BY); the matcher excludes it from metric routing - indexes: emit CREATE INDEX on refresh for DuckDB/Postgres; skip engines that manage layout via clustering/sort keys - Fallback + strict: when a routed rollup table is missing, fall back to raw tables (correct results); opt-in preagg_strict / rollup-only mode raises PreaggregationStrictError instead - Partitioning: build_partitions() materializes one table per time bucket plus a covering view, with per-partition incremental refresh honoring update_window - refresh() defaults the incremental lookback window from refresh_key.update_window - Ungrouped / drill-to-detail: serve raw rows from a primary-key-carrying rollup - Lambda (rollupLambda): rollups + union_with_source_data fields, Cube round-trip, and a batch-rollup-union-fresh-source query at query time - Validation warns on config that is still inert (rollup_join/lambda routing, refresh_key.sql, partition build_range, count_distinct_approx degraded to exact) Exclude the lambda-only fields from the sidemantic-rs YAML serialization (its schema rejects unknown fields) in both the production bridge and the test adapter, and register the new Python-only routing tests in the Rust parity gaps.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9d04c0c54c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-26T17:49:38Z

+        try:
+            return self.adapter.execute(sql)


Route SQL queries through the missing-rollup fallback

This retry/fallback only wraps SemanticLayer.query(), but the documented CLI-first path (sidemantic query ... --use-preaggregations) calls layer.sql(sql), and SemanticLayer.sql() still executes the rewritten pre-aggregation SQL directly with no missing-relation retry. As a result, SQL/CLI users still get an error when a matching rollup table has not been built instead of falling back to raw tables, so the fallback needs to be shared with the SQL rewrite path as well.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-06-26T17:49:38Z

+        if self.partition_granularity:
+            if model is None:
+                raise ValueError(
+                    f"Pre-aggregation '{self.name}' is partitioned (partition_granularity="
+                    f"'{self.partition_granularity}'); pass model= to refresh() or call build_partitions() directly."
+                )
+            self.build_partitions(connection, model, lookback=lookback)


Pass the model into partitioned refreshes

Any pre-aggregation with partition_granularity now enters this guard and requires model=, but the CLI refresh flow calls preagg_obj.refresh(...) without passing the model_obj. That makes sidemantic preagg refresh fail for partitioned rollups before it can build partitions, which leaves the new partitioning support unreachable through the primary CLI workflow.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-06-26T17:49:38Z

+        if not self.indexes or dialect not in _INDEX_DDL_DIALECTS:
+            return


Infer DuckDB before skipping index creation

Index creation is gated on dialect, but the normal CLI refresh path only sets dialect for mode == "engine"; for supported DuckDB refreshes via --db, it passes None, so this check silently returns and declared indexes are never created. Since DuckDB is one of the supported index DDL dialects, the refresh path should pass or infer duckdb before this gate.

Useful? React with 👍 / 👎.

chatgpt-codex-connector Bot reviewed Jun 26, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Close non-philosophical Cube pre-aggregation gaps#240

Close non-philosophical Cube pre-aggregation gaps#240
nicosuave wants to merge 1 commit into
mainfrom
claude/charming-brahmagupta-05865b

nicosuave commented Jun 26, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot Jun 26, 2026

Uh oh!

chatgpt-codex-connector Bot Jun 26, 2026

Uh oh!

chatgpt-codex-connector Bot Jun 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

		if not self.indexes or dialect not in _INDEX_DDL_DIALECTS:
		return

Uh oh!

Conversation

nicosuave commented Jun 26, 2026

Implemented

Deliberately not included (need a dependency, not just code)

Note on test status

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Jun 26, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot Jun 26, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot Jun 26, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant