Skip to content

Close non-philosophical Cube pre-aggregation gaps#240

Open
nicosuave wants to merge 1 commit into
mainfrom
claude/charming-brahmagupta-05865b
Open

Close non-philosophical Cube pre-aggregation gaps#240
nicosuave wants to merge 1 commit into
mainfrom
claude/charming-brahmagupta-05865b

Conversation

@nicosuave

Copy link
Copy Markdown
Member

Closes the gaps where Sidemantic's pre-aggregation support fell short of Cube — everything that doesn't require a philosophical change (no Cube Store, no managed refresh worker) or an unavailable dependency.

Implemented

Area Change
original_sql Materializes the cube's base query verbatim (no GROUP BY); matcher excludes it from metric routing
indexes Emits CREATE INDEX on refresh for DuckDB/Postgres; skipped for engines that manage layout via clustering/sort keys
Missing-rollup fallback When a routed rollup table isn't built, falls back to raw tables (correct results); opt-in preagg_strict / rollup-only mode raises PreaggregationStrictError instead
Partitioning build_partitions() materializes one table per time bucket + a covering view; per-partition incremental refresh honoring update_window
update_window Now defaults the incremental refresh lookback window
Ungrouped / drill-to-detail Serves raw rows from a primary-key-carrying rollup
Lambda (rollupLambda) rollups + union_with_source_data fields, Cube round-trip, and a batch-rollup ∪ fresh-source UNION at query time
Validation Warns on config that remains inert (rollup_join/lambda routing, refresh_key.sql, partition build_range, count_distinct_approx degraded to exact)

Lambda-only fields are excluded from the sidemantic-rs YAML serialization (its schema rejects unknown fields) in both the production bridge and the test adapter, and the new Python-only routing tests are registered in the Rust parity gaps.

Deliberately not included (need a dependency, not just code)

  • Approximate distinct / HLL — DuckDB has no mergeable sketch, so it can't be executed or tested on the default engine; it would ship untested dialect-specific SQL and needs a new agg-type concept. Left as a scoped follow-up rather than faked.
  • Multi-tenant pre-aggregations — needs a security-context concept the layer doesn't have.
  • Cube Store serving layer / managed refresh worker — philosophical: Sidemantic is stateless and warehouse-bound by design.

Note on test status

Four pre-existing test_symmetric_aggregates cases in the Rust-parity suite fail in this worktree because sidemantic_rs is not compiled here. They fail identically on main (verified by stashing to baseline) and are unrelated to this change — expected to pass in CI where the Rust extension is built.

Implement the pre-aggregation capabilities where Sidemantic fell short of Cube,
and make still-inert config visible via validation warnings.

- original_sql: materialize the cube's base query verbatim (no GROUP BY); the
  matcher excludes it from metric routing
- indexes: emit CREATE INDEX on refresh for DuckDB/Postgres; skip engines that
  manage layout via clustering/sort keys
- Fallback + strict: when a routed rollup table is missing, fall back to raw
  tables (correct results); opt-in preagg_strict / rollup-only mode raises
  PreaggregationStrictError instead
- Partitioning: build_partitions() materializes one table per time bucket plus a
  covering view, with per-partition incremental refresh honoring update_window
- refresh() defaults the incremental lookback window from refresh_key.update_window
- Ungrouped / drill-to-detail: serve raw rows from a primary-key-carrying rollup
- Lambda (rollupLambda): rollups + union_with_source_data fields, Cube round-trip,
  and a batch-rollup-union-fresh-source query at query time
- Validation warns on config that is still inert (rollup_join/lambda routing,
  refresh_key.sql, partition build_range, count_distinct_approx degraded to exact)

Exclude the lambda-only fields from the sidemantic-rs YAML serialization (its
schema rejects unknown fields) in both the production bridge and the test
adapter, and register the new Python-only routing tests in the Rust parity gaps.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9d04c0c54c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +599 to +600
try:
return self.adapter.execute(sql)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Route SQL queries through the missing-rollup fallback

This retry/fallback only wraps SemanticLayer.query(), but the documented CLI-first path (sidemantic query ... --use-preaggregations) calls layer.sql(sql), and SemanticLayer.sql() still executes the rewritten pre-aggregation SQL directly with no missing-relation retry. As a result, SQL/CLI users still get an error when a matching rollup table has not been built instead of falling back to raw tables, so the fallback needs to be shared with the SQL rewrite path as well.

Useful? React with 👍 / 👎.

Comment on lines +484 to +490
if self.partition_granularity:
if model is None:
raise ValueError(
f"Pre-aggregation '{self.name}' is partitioned (partition_granularity="
f"'{self.partition_granularity}'); pass model= to refresh() or call build_partitions() directly."
)
self.build_partitions(connection, model, lookback=lookback)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Pass the model into partitioned refreshes

Any pre-aggregation with partition_granularity now enters this guard and requires model=, but the CLI refresh flow calls preagg_obj.refresh(...) without passing the model_obj. That makes sidemantic preagg refresh fail for partitioned rollups before it can build partitions, which leaves the new partitioning support unreachable through the primary CLI workflow.

Useful? React with 👍 / 👎.

Comment on lines +564 to +565
if not self.indexes or dialect not in _INDEX_DDL_DIALECTS:
return

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Infer DuckDB before skipping index creation

Index creation is gated on dialect, but the normal CLI refresh path only sets dialect for mode == "engine"; for supported DuckDB refreshes via --db, it passes None, so this check silently returns and declared indexes are never created. Since DuckDB is one of the supported index DDL dialects, the refresh path should pass or infer duckdb before this gate.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant