Migrate to earthkit v1.0 release candidate #162

Open
frazane wants to merge 25 commits into main from feat/earthkit-v1-migration

Conversation

@frazane
Contributor

@frazane frazane commented May 26, 2026

Migrates evalml to the first earthkit v1.0 release candidate. This meant moving data loading off meteodata-lab and onto earthkit-data, adapting to a GRIB encoding change in the new eccodes, and aligning coordinate and dimension names with what the new stack produces. Follow-up PRs will refactor the data I/O and processing code more thoroughly; this one only adds support for earthkit 1.0.

What changed

  • Updated dependencies in pyproject.toml and uv.lock: removed meteodata-lab and the old earthkit-plots, pinned the earthkit family (earthkit-data, earthkit-utils, earthkit-plots, earthkit-meteo, earthkit-geo) to release candidate versions, and bumped eccodes and eccodes-cosmo-resources-python. Also moved snakefmt to v2.0.
  • Reworked GRIB reading in src/data_input on top of earthkit-data, replacing the meteodata-lab decoder. Includes fieldlist-to-xarray conversion driven by an xarray engine profile and de-accumulation handling for TOT_PREC.
  • Standardized coordinate and dimension names across the codebase: latitude/longitude (previously lat/lon), step (previously lead_time), and valid_time (previously time). This touches data loading, spatial mapping in verification, plotting, the plot scripts, and the related tests.
  • Adapted plotting to the new earthkit: the GRIB compatibility loader now goes through the shared data_input loader, and unit conversions and styles use earthkit-meteo and earthkit-plots.
  • Fixed inference configs after a breaking change in the new eccodes: with the COSMO definitions active we can no longer encode both ICON and IFS GRIB files via shortName alone, so a modifiers/patches section now maps each variable to its paramId and shortName in the global inference configs.
  • Removed legacy regional and trimedge inference configs that are no longer used.
  • Added a README section on the migration, including a manual workaround to download and cache the eckit geo grid files for the ICON-CH grids, which earthkit cannot fetch automatically yet.
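For downstream code that still uses the old names, the renames listed above can be summarised as a mapping. The following is a minimal sketch, not code from this PR: `LEGACY_TO_V1` and `build_rename_map` are hypothetical names, and the mapping covers only the four renames mentioned in the description.

```python
# Hypothetical helper, not part of evalml: maps the old coordinate and
# dimension names to the ones listed in the PR description.
LEGACY_TO_V1 = {
    "lat": "latitude",
    "lon": "longitude",
    "lead_time": "step",
    "time": "valid_time",
}


def build_rename_map(names):
    """Return only the renames that apply to the given names."""
    present = set(names)
    return {old: new for old, new in LEGACY_TO_V1.items() if old in present}


# Example: names as they might appear on a dataset written by the old stack.
legacy_names = ["time", "lead_time", "lat", "lon", "TOT_PREC"]
rename_map = build_rename_map(legacy_names)

# Apply the mapping; names without an entry (here TOT_PREC) stay unchanged.
renamed = [rename_map.get(name, name) for name in legacy_names]
print(renamed)  # ['valid_time', 'step', 'latitude', 'longitude', 'TOT_PREC']
```

With xarray, a filtered mapping like this could be passed to `Dataset.rename`; filtering first avoids asking for a rename of a name the dataset does not have.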

Notes

  • earthkit v1.0rc is not final, so some rough edges remain. The ICON-CH grid cache step in the README is a manual workaround until the upstream download is fixed.
  • The GRIB file globbing in plot_forecast_frame.py is a temporary fix (marked with a TODO) for anemoi-inference writing output filenames with unexpected formatting.
  • The plotting and data loading code urgently needs some care. In this PR I focused on the minimal changes to make the code work with the new earthkit, but we need to do a larger refactor.

@frazane frazane marked this pull request as ready for review May 26, 2026 15:55
@frazane
Contributor Author

frazane commented May 26, 2026

Note: we might be able to get rid of the paramId patches in the inference configs. I am working on something here: ecmwf/anemoi-inference@main...feature/use-grib-paramid-encoding. Even if it works, it might take some time to merge because there are likely unwanted side effects on the ECMWF side.

@frazane
Contributor Author

frazane commented May 27, 2026

Tested evalml experiment ... on all example configs. All green.

@frazane frazane requested review from clairemerker and dnerini May 27, 2026 09:43
@frazane
Contributor Author

frazane commented May 27, 2026

Still working on the showcase command.

@frazane frazane requested a review from jonasbhend May 28, 2026 08:56
@frazane frazane mentioned this pull request May 28, 2026
frazane added 2 commits May 28, 2026 14:05
Resolve conflicts:
- inference_extract_requirements.py: keep branch's newer eccodes pins
  (eccodes>=2.44.0,<2.48.0 / eccodes-cosmo-resources-python==2.44.0.1).
- plot_forecast_frame.py / plot_meteogram.py: main refactored these from
  marimo notebooks (.mo.py) into plain scripts with new CLI (regions_json/
  stations/outdir). Took main's structure and re-applied the branch's
  earthkit-v1 data-model renames: step (not lead_time), valid_time (not time)
  for forecasts/baselines, latitude/longitude (not lat/lon) station coords,
  plus the grib-glob workaround for forecast frames.
@frazane
Contributor Author

frazane commented May 28, 2026

showcase command is working

Contributor

@jonasbhend jonasbhend left a comment

Thanks @frazane for putting this together. Looking good. I have a few comments and need to run some examples to make sure everything works. Will approve once I have checked these.

Comment thread config/forecasters-ich1-oper-fixed.yaml Outdated
Comment thread resources/inference/configs/sgm-forecaster-global-disentangled.yaml
Comment thread resources/inference/configs/sgm-forecaster-global-disentangled.yaml Outdated
Comment thread resources/inference/configs/sgm-forecaster-global-ich1-oper.yaml Outdated
Comment thread resources/inference/configs/sgm-forecaster-global-ich1-oper.yaml
Comment thread resources/inference/configs/sgm-forecaster-global-disentangled.yaml
Comment thread src/plotting/compat.py Outdated
Comment thread src/plotting/compat.py
Comment on lines +60 to +74
mask = ~np.isnan(ds[_paramlist_ecmwf[0]].values.squeeze())
global_lons = ds["longitude"].values.flatten()
if np.max(global_lons) > 180:
    global_lons = ((global_lons + 180) % 360) - 180
state["longitudes"] = np.concatenate([state["longitudes"], global_lons[mask]])
state["latitudes"] = np.concatenate(
    [state["latitudes"], ds["latitude"].values.flatten()[mask]]
)
for param in _paramlist_ecmwf:
    if param in ds:
        state["fields"][PARAMS_MAP_INV[param]] = np.concatenate(
            [
                state["fields"][PARAMS_MAP_INV[param]],
                ds[param].values.flatten()[mask],
            ]
Contributor

If I understand correctly, this is 'expanding' the global fields to avoid seams in the plots, right? Could we maybe factor this out in a separate function to clarify what this is for?

Contributor

oh, I didn't quite catch all the lines I guess....

Contributor Author

This code reproduces what the cutout operation from anemoi does: it concatenates regional data with global data, where the global data has a region of missing values (the mask) that is replaced by the regional data. Since this code will likely disappear in the upcoming refactoring, would it be okay to leave it as is for now?
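To make the cutout idea concrete, here is a minimal, self-contained sketch under simplifying assumptions. It is not the code in src/plotting/compat.py: it uses plain Python lists instead of NumPy arrays, the helper names `wrap_longitude` and `append_global_points` are hypothetical, and it wraps every longitude rather than only when the maximum exceeds 180.

```python
import math


def wrap_longitude(lon):
    """Map a longitude in degrees to the range [-180, 180)."""
    return ((lon + 180.0) % 360.0) - 180.0


def append_global_points(regional, global_):
    """Append global points that carry data to the regional points.

    Each argument is a dict with equally long "lon", "lat" and "values" lists.
    Global points whose value is NaN (the hole covered by the regional
    domain) are skipped, so the regional data fills that hole.
    """
    merged = {key: list(regional[key]) for key in ("lon", "lat", "values")}
    for lon, lat, value in zip(global_["lon"], global_["lat"], global_["values"]):
        if math.isnan(value):
            continue  # masked out: this location is covered by regional data
        merged["lon"].append(wrap_longitude(lon))
        merged["lat"].append(lat)
        merged["values"].append(value)
    return merged


# Illustrative data only: two regional points and four global points, one of
# which (8.5E, 46.5N) lies inside the regional domain and is therefore NaN.
regional = {"lon": [8.0, 9.0], "lat": [46.0, 47.0], "values": [1.0, 2.0]}
global_ = {
    "lon": [0.0, 8.5, 190.0, 350.0],
    "lat": [10.0, 46.5, -20.0, 60.0],
    "values": [5.0, float("nan"), 6.0, 7.0],
}

merged = append_global_points(regional, global_)
print(merged["lon"])     # [8.0, 9.0, 0.0, -170.0, -10.0]
print(merged["values"])  # [1.0, 2.0, 5.0, 6.0, 7.0]
```

The result is an unstructured point cloud: regional points first, followed by the global points outside the regional domain, with longitudes on a common [-180, 180) convention.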

Comment thread workflow/scripts/plot_forecast_frame.py Outdated
Comment thread workflow/scripts/verification_metrics.py
@jonasbhend
Contributor

OK, the first issue I stumbled upon is that we now have conflicting pins and requirements in the inference environment:

Using Python 3.12.12 environment at: output/data/runs/interpolator-tmp-569a-on-forecaster-c304-0ee3/.venv
  × No solution found when resolving dependencies:
  ╰─▶ Because you require eccodes==2.39.1 and eccodes>=2.44.0,<2.48.0, we can conclude that your requirements are unsatisfiable.

(using interpolators-ich1.yaml from the branch)... I guess we need to adjust the example configs accordingly @frazane, right?

@frazane
Contributor Author

frazane commented May 28, 2026

Ah yes, I didn't see this; it got into the branch during the merge with main (4a211c2). We don't need those pins anymore.
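The resolver error above follows from simple version arithmetic: the stale pin eccodes==2.39.1 lies below the lower bound of the new range eccodes>=2.44.0,<2.48.0. A minimal sketch of that check, assuming plain numeric `X.Y.Z` versions (real resolvers implement the full PEP 440 rules, and `parse_version` and `satisfies` are hypothetical helper names):

```python
def parse_version(text):
    """Turn a plain 'X.Y.Z' version string into a comparable tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def satisfies(version, lower, upper):
    """Return True if lower <= version < upper."""
    return parse_version(lower) <= parse_version(version) < parse_version(upper)


# The stale pin from the example config versus the range required by the branch.
print(satisfies("2.39.1", "2.44.0", "2.48.0"))  # False: the two cannot both hold
print(satisfies("2.44.0", "2.44.0", "2.48.0"))  # True: inside the new range
```

Because an exact pin admits only one version, the requirements are satisfiable only if that version falls inside the range, which is why dropping the old pin resolves the conflict.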

@frazane frazane requested review from cosunae and jonasbhend May 29, 2026 09:44