Fix LookML filter-expression to SQL conversion#241
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 345f8c29cc
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| # before / after <value> (date or numeric bound) | ||
| bm = re.match(r"(?i)^(before|after)\s+(.+)$", v) | ||
| if bm: | ||
| op = "<" if bm.group(1).lower() == "before" else ">" |
There was a problem hiding this comment.
Preserve inclusive after-date filters
For LookML date filters, after 2018-10-05 includes the boundary date per Looker's filter docs (https://docs.cloud.google.com/looker/docs/filter-expressions#date_and_time), but this branch emits > for every after expression. Any imported measure using filters: [created_date: "after 2020-01-01"] will drop rows from 2020-01-01; after should be generated as >= while before remains exclusive.
Useful? React with 👍 / 👎.
| for p in parts: | ||
| cond = single(p) | ||
| is_exclude = (p.startswith("-") and not re.match(r"^-(\d|\.)", p)) or bool(re.match(r"(?i)^not\s", p)) | ||
| (excludes if is_exclude else includes).append(cond) |
There was a problem hiding this comment.
Apply leading NOT to the whole numeric list
Looker's numeric filter syntax treats NOT 66, 99, 4 as excluding all three values (https://docs.cloud.google.com/looker/docs/filter-expressions#number), but this mixed-list path only marks the first NOT 66 token as an exclusion and ORs the remaining plain values as inclusions, producing (f = 99 OR f = 4) AND f != 66. Measures imported from that documented syntax will count only 99/4 instead of everything except 66/99/4; detect a leading NOT with no other NOTs and emit a negated list/range for all parts.
Useful? React with 👍 / 👎.
The LookML adapter treated Looker filter values as opaque strings, so anything beyond simple comparisons produced wrong or invalid SQL: - date/numeric ranges and intervals ([1,10], "5 to 10") became string equality or malformed IN lists (matched zero rows) - "not X" / negation -> literal "= 'not X'" - EMPTY -> omitted the NULL case - single quotes were not escaped (broken SQL / injection risk) Rewrite _convert_lookml_filter_to_sql to implement the representable parts of Looker's filter expression language (numeric ranges, interval brackets, NOT/negation, EMPTY = NULL OR '', wildcard NOT LIKE, mixed lists) and escape quoted string literals. Untranslated date/interval expressions now log a warning instead of silently emitting a zero-row equality.
345f8c2 to
00ff35d
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 00ff35dc91
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| op = "<" if bm.group(1).lower() == "before" else ">=" | ||
| operand = bm.group(2).strip() | ||
| rhs = operand if is_number(operand) else q(operand) | ||
| return f"{col} {op} {rhs}" |
There was a problem hiding this comment.
Guard relative before/after date filters
When a LookML date filter uses documented relative bounds such as before 3 days ago or after Monday, this branch treats the operand as a literal SQL value before the date-expression warning can run, producing conditions like {model}.created_date < '3 days ago'. On dialects that don't parse those English phrases as dates, imported measures either fail or count the wrong rows; detect relative operands and leave them on the warning/fallback path instead of translating them as absolute bounds.
Useful? React with 👍 / 👎.
| cm = re.match(r"^(>=|<=|!=|<>|>|<)\s*(.+)$", v) | ||
| if cm: | ||
| operator, operand = cm.group(1), cm.group(2).strip() |
There was a problem hiding this comment.
Parse numeric AND ranges before single comparisons
When a numeric filter uses Looker's documented AND range syntax inside one condition, for example >1 AND <100, NOT 2, this regex captures 1 AND <100 as the operand for a single > comparison and quotes it because it isn't a number. The mixed-list path then emits a number-to-string comparison instead of field > 1 AND field < 100, so imported filtered measures fail or return nonsense for that supported syntax; split/parse AND subconditions before applying the single-comparison fallback.
Useful? React with 👍 / 👎.
Summary
Part of a series fixing correctness bugs in the LookML/Looker import adapter (
sidemantic/adapters/lookml.py), found by a deep audit._convert_lookml_filter_to_sqltreated Looker filter values as opaque strings, so anything past simple comparisons produced wrong or invalid SQL:last 7 daysf = 'last 7 days'(0 rows)5 to 10f = '5 to 10'f >= 5 AND f <= 10[1,10]f IN ('[1','10]')f >= 1 AND f <= 10NOT 5f = 'NOT 5'f != 5EMPTYf = ''(f IS NULL OR f = '')-%foo%f != '%foo%'f NOT LIKE '%foo%'O'Brienf = 'O'Brien'(broken / injection)f = 'O''Brien'Changes
a to b, open-ended), interval brackets[](),NOT/-negation,EMPTY = NULL OR '', wildcardNOT LIKE, and mixed comma lists (includes OR'd, excludes AND'd).Notes
this month, etc.) is deferred — it needs field-type + dialect handling.