refactor: bring in types from zarr-metadata #3961

Open
d-v-b wants to merge 40 commits into zarr-developers:main from d-v-b:use-zarr-metadata

Conversation

@d-v-b
Contributor

@d-v-b d-v-b commented May 10, 2026

replaces some of our types with exports from zarr-metadata. I expect a few related PRs, alternating between ones like this (importing types) and ones that add missing types to zarr-metadata.

TODO:

  • Add unit tests and/or doctests in docstrings
  • Add docstrings and API docs for any new/modified user-facing classes and functions
  • New/modified features documented in docs/user-guide/*.md
  • Changes documented as a new file in changes/
  • GitHub Actions have all passed
  • Test coverage is 100% (Codecov passes)

@github-actions github-actions Bot added the `needs release notes` label (automatically applied to PRs which haven't added release notes) May 10, 2026
@d-v-b d-v-b requested a review from ilan-gold May 10, 2026 08:38
@d-v-b
Contributor Author

d-v-b commented May 10, 2026

cc @chuckwondo

@codecov

codecov Bot commented May 10, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 93.38%. Comparing base (5ca1690) to head (5ae442e).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3961      +/-   ##
==========================================
- Coverage   93.39%   93.38%   -0.02%     
==========================================
  Files          88       88              
  Lines       11839    11828      -11     
==========================================
- Hits        11057    11045      -12     
- Misses        782      783       +1     
Files with missing lines Coverage Δ
src/zarr/codecs/blosc.py 95.74% <100.00%> (-0.03%) ⬇️
src/zarr/codecs/cast_value.py 98.58% <100.00%> (-0.02%) ⬇️
src/zarr/core/dtype/npy/structured.py 92.61% <100.00%> (ø)
src/zarr/core/group.py 94.99% <100.00%> (+0.01%) ⬆️
src/zarr/core/metadata/v2.py 88.95% <100.00%> (-0.07%) ⬇️
src/zarr/core/metadata/v3.py 93.69% <100.00%> (-0.17%) ⬇️

... and 1 file with indirect coverage changes

Comment thread pyproject.toml Outdated
@d-v-b d-v-b mentioned this pull request May 11, 2026
d-v-b added 5 commits May 11, 2026 16:39
- set `zarr-metadata` to resolve locally in local development
- add a section to the docs outlining the relationship between zarr and zarr-metadata packages
…e the lower zarr-metadata bound is published
@github-actions github-actions Bot removed the `needs release notes` label (automatically applied to PRs which haven't added release notes) May 13, 2026
@d-v-b
Contributor Author

d-v-b commented May 15, 2026

We should not merge this until #3972 is sorted out. That PR switches our pre-commit mypy check to run in a locked environment, which can include the package-local copy of zarr-metadata. Without it, mypy in pre-commit would run against the version of zarr-metadata on PyPI, which would raise false alarms whenever we push pre-release changes to zarr-metadata and zarr-python together.

Comment thread packages/zarr-metadata/src/zarr_metadata/v3/codec/bytes.py
Comment thread packages/zarr-metadata/src/zarr_metadata/v3/codec/gzip.py Outdated
Comment thread src/zarr/core/group.py
Comment on lines +450 to +451
if self.zarr_format == 2:
    result.pop("node_type", None)
Contributor Author

`node_type` is not a valid field in v2 group metadata, so arguably this was a bug. But we also don't use `to_dict` to create our v2 metadata, so while this change is correct, it doesn't fix any broken behavior.
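
A minimal sketch of the behavior discussed in this thread, assuming a simplified stand-in for the group metadata serializer. The function `group_metadata_to_dict` and its parameters are hypothetical and are not zarr-python's actual API; only the `zarr_format == 2` branch mirrors the two lines under review.

```python
from typing import Any


def group_metadata_to_dict(zarr_format: int, attributes: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical stand-in for a group metadata `to_dict` method."""
    result: dict[str, Any] = {
        "zarr_format": zarr_format,
        "node_type": "group",
        "attributes": dict(attributes),
    }
    # `node_type` only exists in the v3 metadata document, so drop it for v2.
    if zarr_format == 2:
        result.pop("node_type", None)
    return result


v2_doc = group_metadata_to_dict(2, {"title": "demo"})
v3_doc = group_metadata_to_dict(3, {"title": "demo"})
```

Passing `None` as the default to `pop` keeps the call safe even if the key is already absent.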

Comment thread tests/test_array.py
d-v-b and others added 5 commits May 15, 2026 22:44
These constants were added to zarr-metadata in-tree on this branch
(commit 08334d4). zarr-python cannot consume them until they appear
in the zarr-metadata version pinned by the project floor — the
min_deps env caught this when it installed zarr-metadata==0.1.1 from
PyPI and pytest blew up on import.

Reverting the test-side adoption here. The constants will be split
out into a separate PR against main and consumed once released.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…data

After reverting a83537c from this branch, BytesCodecObject in
zarr-metadata 0.1.1 (the project floor) still requires `configuration`.
The bare `{"name": "bytes"}` form is correct per spec and at runtime,
but doesn't type-check against the strict 0.1.1 shape. Drop the
annotation here; the field-relaxation fix has been split out to a
separate PR against main.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
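
To illustrate the strict versus relaxed shapes this commit message describes, here is a hedged sketch using locally defined `TypedDict` classes. The class names and the `endian` field are assumptions made for illustration; they are not the definitions shipped in zarr-metadata.

```python
from typing import Literal, TypedDict


class BytesConfiguration(TypedDict, total=False):
    # Assumed field, for illustration only.
    endian: Literal["little", "big"]


class StrictBytesCodec(TypedDict):
    """Hypothetical strict shape: `configuration` is required."""

    name: Literal["bytes"]
    configuration: BytesConfiguration


class _BytesCodecRequired(TypedDict):
    name: Literal["bytes"]


class RelaxedBytesCodec(_BytesCodecRequired, total=False):
    """Hypothetical relaxed shape: `configuration` may be omitted."""

    configuration: BytesConfiguration


# The bare form type-checks against the relaxed shape but not the strict one.
bare: RelaxedBytesCodec = {"name": "bytes"}
```

Splitting required and optional keys across a base class and a `total=False` subclass avoids depending on `NotRequired`, which is only in `typing` from Python 3.11.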
d-v-b and others added 10 commits May 19, 2026 08:57
zarr-metadata 0.2.0 is published on PyPI and adds partial metadata
document types (ArrayMetadataV2Partial, ArrayMetadataV3Partial,
GroupMetadataV2Partial, GroupMetadataV3Partial). Raise the dependency
range from `>=0.1.1,<0.2` to `>=0.2.0,<0.3` so zarr-python can consume
them, and update the matching min_deps pin and contributing docs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…rtial

The three `array_metadata` fixtures in test_consolidated.py are partial v3
array documents: they intentionally omit `shape` and `chunk_grid`, which are
supplied per-array via spread into `ArrayV3Metadata.from_dict(...)`.
zarr-metadata 0.2.0 ships `ArrayMetadataV3Partial` (the `total=False` form)
to type exactly this kind of fragment, replacing the loose `dict[str, JSON]`
/ `dict[str, Any]` annotations.

The partial type widens field value types to `object`, so spreading it into
the `dict[str, JSON]` literal that `from_dict` consumes needs a per-spread
`# type: ignore[dict-item]`. That tradeoff is deliberate: the fragment values
are now precisely typed, and the suppression is localized to the spread sites
rather than mistyping the actual fixture.

Other survey candidates (full GroupMetadata docs in test_group.py,
extension-field dicts) were left unchanged: they are either complete
documents or extra-key dicts, not partial fragments.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
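
The spread-site tradeoff described above can be sketched as follows. `ArrayFragment` is a hypothetical stand-in for `ArrayMetadataV3Partial`, the `JSON` alias is a simplified assumption rather than zarr-python's own alias, and `full_document` is an illustrative helper, not the test code from this PR.

```python
from typing import Any, TypedDict, Union

# Simplified stand-in for a JSON value type (assumption).
JSON = Union[str, int, float, bool, None, dict[str, Any], list[Any]]


class ArrayFragment(TypedDict, total=False):
    """Hypothetical partial document: every key optional, values typed as `object`."""

    zarr_format: object
    node_type: object
    data_type: object
    fill_value: object


# The fragment intentionally omits `shape`; it is supplied per array below.
fragment: ArrayFragment = {
    "zarr_format": 3,
    "node_type": "array",
    "data_type": "uint8",
    "fill_value": 0,
}


def full_document(shape: tuple[int, ...]) -> dict[str, JSON]:
    # The `object`-typed values do not match `JSON`, so the spread is where
    # the commit message says a localized suppression is needed.
    return {**fragment, "shape": list(shape)}  # type: ignore[dict-item]


doc = full_document((4, 4))
```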
The `zarr_json` fixture in test_getitem_consolidated_empty_leaf_group is a
complete v3 group metadata document carrying an inline consolidated_metadata
extension field. Retype it from `dict[str, JSON]` to `GroupMetadataV3` so the
standard fields (zarr_format, node_type, attributes) are structurally checked.

mypy does not honor PEP 728 `extra_items=`, so the conforming extension key
still needs a single `# type: ignore[typeddict-unknown-key]`, matching the
existing pattern elsewhere in the metadata tests.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
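
A rough sketch of the pattern this commit message describes: a complete group document typed with a `TypedDict`, plus one extension key that needs a suppression. `GroupDoc` is a hypothetical stand-in for `GroupMetadataV3`, and the contents of `consolidated_metadata` are illustrative only.

```python
from typing import Any, Literal, TypedDict


class GroupDoc(TypedDict):
    """Hypothetical stand-in for a complete v3 group metadata document."""

    zarr_format: Literal[3]
    node_type: Literal["group"]
    attributes: dict[str, Any]


zarr_json: GroupDoc = {
    "zarr_format": 3,
    "node_type": "group",
    "attributes": {},
    # Extension key not declared on the TypedDict; per the commit message,
    # mypy needs a suppression here.
    "consolidated_metadata": {"kind": "inline", "metadata": {}},  # type: ignore[typeddict-unknown-key]
}
```

The standard fields stay structurally checked; only the single extension key is exempted.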
Annotate the `valid_json_v3` fixtures in the dtype tests with the matching
zarr-metadata 0.2.0 types so the fixture shapes are structurally checked
(and dtype-name typos in the bare-string fixtures are caught):

- test_int.py: Int8/16/32/64 + Uint8/16/32/64 DataTypeName literals
- test_float.py: Float16/32/64 DataTypeName literals
- test_complex.py: Complex64/128 DataTypeName literals
- test_bool.py: BoolDataTypeName literal
- test_string.py: StringDataTypeName literal (variable-length classes)
- test_time.py: NumpyDatetime64 / NumpyTimedelta64 envelope TypedDicts

Left deliberately untyped (documented inline):
- struct: zarr-metadata's `Struct` models `fields` as a tuple, but
  zarr-python's `Struct.to_json` emits a list and the round-trip test asserts
  equality, so typing it would break runtime behavior.
- fixed_length_utf32 / null_terminated_bytes / raw_bytes / variable_length_bytes:
  zarr-metadata 0.2.0 exports no envelope type for these dtypes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
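
As a small illustration of how a `Literal` annotation catches dtype-name typos in bare-string fixtures, consider the sketch below. `BoolName` is a locally defined stand-in, not the `BoolDataTypeName` exported by zarr-metadata, and `is_bool_name` is a hypothetical helper.

```python
from typing import Literal, get_args

# Hypothetical stand-in for a dtype-name literal type.
BoolName = Literal["bool"]

# A fixture annotated with the literal type: a static checker accepts this...
valid_json_v3: tuple[BoolName, ...] = ("bool",)

# ...and would reject a misspelled entry such as ("boool",) at check time.


def is_bool_name(value: str) -> bool:
    """Runtime counterpart of the static check, for demonstration."""
    return value in get_args(BoolName)
```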
`Struct.to_json(zarr_format=3)` built `configuration.fields` with a list
comprehension, producing a Python list. Nothing requires a list: `json.dumps`
serializes list and tuple identically, the `_from_json` parsers iterate the
field array structurally, and no test asserts list-ness for the canonical
`Struct` class. The list was incidental.

Emit a tuple instead, matching zarr-metadata's `StructConfiguration.fields`
type (`tuple[StructField, ...]`) and the project's principle that a JSON array
is a typed fixed-length container. This lets the v3 round-trip test fixture be
typed as zarr-metadata's `Struct` — the one dtype fixture that previously had
to stay loosely typed because the tuple model collided with the list emission.

The on-disk JSON is unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
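
The claim that the on-disk JSON is unchanged can be checked with the standard library alone. The field layout below is illustrative and is not taken from zarr-python's `Struct` output.

```python
import json

# Illustrative field specs: the same data as a list of lists and a tuple of tuples.
fields_as_list = [["x", "int32"], ["y", "float64"]]
fields_as_tuple = (("x", "int32"), ("y", "float64"))

as_list = json.dumps({"configuration": {"fields": fields_as_list}})
as_tuple = json.dumps({"configuration": {"fields": fields_as_tuple}})

# `json.dumps` writes both containers as JSON arrays, so the text is identical.
same_on_disk = as_list == as_tuple

# In memory, however, a list never compares equal to a tuple, which is why an
# equality-based round-trip test is sensitive to the container type.
same_in_memory = fields_as_list == fields_as_tuple
```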
The Slow Hypothesis job checked out shallowly (no tags), so hatch-vcs could
not find the `zarr_metadata-v*` / `v*` tags and built the in-tree
zarr-metadata as `0.1.dev1`. After bumping zarr-python's floor to
`zarr-metadata>=0.2.0`, that stale version no longer satisfies the
constraint, producing a ResolutionImpossible at dependency-sync time:

    Cannot install zarr-metadata 0.1.dev1 ... and zarr==0.1.dev1 because
    these package versions have conflicting dependencies.

Add `fetch-depth: 0` so the workflow grabs all tags, matching test.yml and
the other package-building workflows. hatch-vcs then derives real versions
(zarr 3.x, zarr-metadata 0.2.x) and the floor resolves.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
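
To see why the fallback version fails the new floor, here is a deliberately simplified sketch of the `>=0.2.0,<0.3` check. It compares only the leading numeric release segment and ignores the ordering of dev, pre, and post releases, so it is not a substitute for a real version-specifier implementation; both function names are hypothetical.

```python
def release_tuple(version: str, width: int = 3) -> tuple[int, ...]:
    """Leading numeric components of a version string, zero-padded to `width`."""
    parts: list[int] = []
    for piece in version.split("."):
        if not piece.isdigit():
            break  # stop at suffixes such as "dev1"
        parts.append(int(piece))
    parts += [0] * (width - len(parts))
    return tuple(parts[:width])


def satisfies_floor(version: str) -> bool:
    """Simplified model of the `>=0.2.0,<0.3` dependency range."""
    return (0, 2, 0) <= release_tuple(version) < (0, 3, 0)


stale = satisfies_floor("0.1.dev1")   # version built without tags
released = satisfies_floor("0.2.0")   # version derived from real tags
```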
Two low-severity findings from the branch review:

- releases.yml: the `verify_pypi_dependency` failure diagnostics wrapped `>=`
  in backticks inside double-quoted echo strings, so bash ran `>=` as command
  substitution — dropping `>=` from the message and creating a stray `=` file.
  Switch to single quotes with the requirement passed as a separate argument.

- core/metadata/v2.py: the "Re-export ... historical name" comment sat above
  an unrelated `parse_separator` import. Move it to the
  `ArrayV2MetadataDict = _ArrayMetadataV2` assignment it actually describes.

The third finding (pyright via uvx possibly not resolving src imports) was
verified as a non-issue: the zarr-metadata pyright CI job passes on the
current branch tip.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@d-v-b
Contributor Author

d-v-b commented May 22, 2026

this is ready for final review!

key notes for reviewers:

  • nothing should change at runtime
  • some tests have changed because they used invalid metadata
  • the struct dtype's JSON metadata type, as declared in zarr-python, used lists instead of tuples. That type now uses tuples, following our broader metadata convention. I do not think this will affect any consumers.
  • There's a fair number of infrastructure / build changes, which try to ensure that we don't ship a version of zarr that depends on unreleased features in zarr-metadata.

3 participants