feat: Add unique rule to dy.Column #325

Open
gab23r wants to merge 2 commits into Quantco:main from gab23r:add_is_unique

Conversation

Contributor

@gab23r gab23r commented Apr 13, 2026

Motivation

Closes #313

Changes

Add the new rule using the same logic as primary_key.

Drive-by changes:

  • Allow primary_key for the array dtype, since it now works in polars; I added a test for it.
  • Disallow primary_key for the object dtype, since it never worked. (Technically breaking, but IMHO it shouldn't break anything in practice.)


Copilot AI left a comment


Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds a per-column unique constraint to dy.Column/dy.Schema, aligning validation and SQLAlchemy output, and updates array/object primary key behavior.

Changes:

  • Introduce unique as a first-class column attribute and emit unique=True in SQLAlchemy column definitions.
  • Add schema validation rules for unique columns (and tighten primary key validation via is_unique()).
  • Expand tests for unique constraints and enable primary keys on Array columns while disallowing them for Object columns.
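For readers unfamiliar with the semantics, the per-row mask that an `is_unique()`-style check produces can be sketched in plain Python. This is only an illustrative stand-in and not dataframely's implementation, which this thread does not show; the helper name `is_unique_mask` and the sample data are hypothetical.

```python
from collections import Counter
from collections.abc import Hashable, Sequence


def is_unique_mask(values: Sequence[Hashable]) -> list[bool]:
    """Return True for each position whose value occurs exactly once.

    Hypothetical helper: it mirrors the idea of a per-row uniqueness mask,
    where every member of a duplicated group is flagged, not just the repeats.
    """
    counts = Counter(values)
    return [counts[v] == 1 for v in values]


emails = ["a@x.org", "b@x.org", "a@x.org", "c@x.org"]
mask = is_unique_mask(emails)
print(mask)  # [False, True, False, True]
```

Under this reading, a column marked unique would pass validation only when every entry of the mask is True.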

Reviewed changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| tests/schema/test_validate.py | Adds validation tests for unique columns and Schema.unique_columns(). |
| tests/schema/test_sample.py | Adds sampling tests ensuring generated data respects unique constraints. |
| tests/column_types/test_array.py | Adds coverage for primary keys on Array columns. |
| dataframely/columns/_base.py | Adds unique attribute to Column and passes it to SQLAlchemy columns. |
| dataframely/_base_schema.py | Adds unique rules into schema validation; uses is_unique() for primary keys. |
| dataframely/columns/array.py | Allows primary_key on arrays and threads through unique. |
| dataframely/columns/string.py | Threads unique through to Column. |
| dataframely/columns/integer.py | Threads unique through to Column. |
| dataframely/columns/float.py | Threads unique through to Column. |
| dataframely/columns/decimal.py | Threads unique through to Column. |
| dataframely/columns/datetime.py | Threads unique through to Column for date/time/datetime/timedelta. |
| dataframely/columns/enum.py | Threads unique through to Column. |
| dataframely/columns/categorical.py | Threads unique through to Column. |
| dataframely/columns/list.py | Threads unique through to Column. |
| dataframely/columns/struct.py | Threads unique through to Column. |
| dataframely/columns/object.py | Removes primary_key kwarg from Object column constructor. |

Comment thread tests/schema/test_validate.py Outdated
Comment on lines 22 to 26
self,
*,
nullable: bool = True,
primary_key: bool = False,
check: Check | None = None,
alias: str | None = None,

Copilot AI Apr 13, 2026


Removing the primary_key keyword argument from Object.__init__ is a breaking API change (existing callers using dy.Object(primary_key=...) will now error at call-time with an unexpected kwarg). If the goal is to disallow object primary keys, consider keeping the parameter and raising a clear ValueError when primary_key=True (optionally with a deprecation path), so callers get a more actionable error while minimizing breakage.
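The alternative suggested above (keep the parameter, reject `primary_key=True` with a targeted error) might look roughly like the following sketch. The class `ObjectColumn` is a hypothetical stand-in written for illustration; it is not dataframely's `Object` column, and the error message wording is an assumption.

```python
class ObjectColumn:
    """Hypothetical stand-in for an object-typed column (not dataframely code)."""

    def __init__(self, *, nullable: bool = True, primary_key: bool = False) -> None:
        # Keep accepting the keyword so existing call sites still parse,
        # but fail early with an actionable message when it is set.
        if primary_key:
            raise ValueError(
                "primary_key=True is not supported for object columns"
            )
        self.nullable = nullable


try:
    ObjectColumn(primary_key=True)
except ValueError as exc:
    print(exc)
```

The trade-off is a slightly larger constructor signature in exchange for a clearer failure than an unexpected-keyword `TypeError`.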

Collaborator


Was there a reason to change this?

Contributor Author


This was more of a drive-by change. I considered NOT adding is_unique for object columns, since polars does not allow it, and then I noticed the primary_key argument here. It shouldn't be there, because primary keys on the object dtype are not possible.

Collaborator


You're right, I was not aware of the polars behavior here. TLDR:

  1. You can call df.unique() on a dataframe with an object column and it will work as expected iff your objects are hashable
  2. You cannot call df.select(pl.col("mycol").is_unique()), which errors saying that uniqueness checks are not supported for object columns.

In any case, passing primary_key=True to dy.Object leads to failing validation no matter what, so I think it's reasonable to remove it here.
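Why hashability matters in point 1 can be illustrated with a plain-Python analogy. This is not a description of polars internals, which this thread does not cover; the helper `deduplicate` is hypothetical and only shows that hash-based deduplication needs hashable values.

```python
def deduplicate(values):
    """Hash-based deduplication that preserves first-seen order."""
    seen = set()
    out = []
    for v in values:
        if v not in seen:  # raises TypeError if v is unhashable
            seen.add(v)
            out.append(v)
    return out


# Tuples are hashable, so duplicates are dropped as expected.
print(deduplicate([(1, 2), (1, 2), (3, 4)]))  # [(1, 2), (3, 4)]

# Lists are unhashable, so the same approach fails.
try:
    deduplicate([[1, 2], [1, 2]])
except TypeError as exc:
    print(exc)
```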

Comment on lines 22 to 27
self,
*,
nullable: bool = True,
primary_key: bool = False,
check: Check | None = None,
alias: str | None = None,
metadata: dict[str, Any] | None = None,

Copilot AI Apr 13, 2026


All other column types in this PR accept unique: bool = False, but Object now accepts neither primary_key nor unique, making the API inconsistent. If unique is unsupported for object dtype, it would be helpful to make that explicit (e.g., accept unique and raise a targeted ValueError when True, or document why it’s omitted). If it is supported, add unique: bool = False and pass it through to super().__init__(unique=unique, ...).

Comment thread dataframely/_base_schema.py

codecov bot commented Apr 13, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 100.00%. Comparing base (7c73bb1) to head (40a5366).

Additional details and impacted files
@@            Coverage Diff            @@
##              main      #325   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           56        56           
  Lines         3399      3408    +9     
=========================================
+ Hits          3399      3408    +9     


Collaborator


Thanks, gab23r! This already looks quite nice to me. A few small suggestions.


Comment on lines +274 to +280
@pytest.mark.parametrize("n", [0, 100])
def test_sample_unique_constraint(n: int) -> None:
    df = UniqueSchema.sample(n, generator=Generator(seed=42))
    assert len(df) == n
    UniqueSchema.validate(df)
    # Verify uniqueness
    assert df.get_column("email").n_unique() == n
Collaborator


I think this test would be more meaningful with a smaller dtype here, e.g. UInt8. With a string, the chance of sampling two identical values is very small either way, so it's unclear whether the unique constraint actually had an effect.
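The argument can be made concrete with a birthday-problem estimate. The sketch below assumes independent, uniform draws, which may not match how the sampler actually generates values, and it uses 2**64 purely as a stand-in for a "large" string space; both are assumptions for illustration.

```python
import math


def collision_probability(n_samples: int, n_values: int) -> float:
    """Probability that uniform i.i.d. draws contain at least one duplicate."""
    if n_samples > n_values:
        return 1.0  # pigeonhole: more draws than distinct values
    # log P(all distinct) = sum over i of log(1 - i / n_values);
    # log1p and expm1 keep precision when the probabilities are tiny.
    log_all_distinct = sum(
        math.log1p(-i / n_values) for i in range(n_samples)
    )
    return -math.expm1(log_all_distinct)


p_uint8 = collision_probability(100, 2**8)   # 256 possible values
p_large = collision_probability(100, 2**64)  # stand-in for a large value space
print(f"UInt8-sized space: {p_uint8:.12f}")
print(f"large space:       {p_large:.3e}")
```

Under these assumptions, 100 unconstrained draws from 256 values almost certainly contain a duplicate, so a passing test is strong evidence that the constraint is enforced. With a very large value space the collision probability is on the order of 1e-16, so the same test would pass even if the constraint were ignored.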



class MultiUniqueSchema(dy.Schema):
    a = dy.Int64(unique=True)
Collaborator


Same here.

Oliver Borchert (borchero) changed the title from "feat: Add is_unique rule to dy.Column" to "feat: Add unique rule to dy.Column" on Apr 17, 2026

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Feature Request: unique=True column constraint

3 participants