Integration of MEC workflow#110

Open
andreaspauling wants to merge 88 commits into main from MRB-534-Implement-rule-to-generate-namelist

@andreaspauling andreaspauling commented Feb 12, 2026

Add the MEC workflow. The new parts are in green in the DAG: snakemake_dag.pdf

For each valid date a MEC case is set up and run. This includes:

  • creating the directory structure
  • adding the observations
  • organizing the model input including past runs depending on the config
  • rendering the MEC namelist
  • executing MEC for all dates that have complete data for all lead times (this excludes the first dates of the period)
  • storing the final feedback file in a separate place.

All MEC cases can be removed once the final feedback file is produced (removal not yet implemented).
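The "complete data for all lead times" condition can be sketched as follows. This is a minimal illustration, not the code in this PR: the function name `complete_valid_times` and the example init times and lead times are assumptions chosen for the example.

```python
from datetime import datetime, timedelta


def complete_valid_times(init_times, leadtimes_h):
    """Return the valid times for which every lead time has a matching init time.

    Hypothetical helper: a valid time is "complete" when, for each lead time,
    a run initialised at (valid time - lead time) is available.
    """
    available = set(init_times)
    candidates = {
        init + timedelta(hours=lt) for init in init_times for lt in leadtimes_h
    }
    return sorted(
        valid
        for valid in candidates
        if all(valid - timedelta(hours=lt) in available for lt in leadtimes_h)
    )


# Assumed example: runs every 12 h over two days, lead times of 12 h and 24 h.
inits = [datetime(2025, 1, 1) + timedelta(hours=12 * i) for i in range(5)]
valid_times = complete_valid_times(inits, [12, 24])
```

In this example the first possible valid time (12 h after the first init) is dropped, because the 24 h lead time would need a run from before the start of the period.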

  • Topics already raised by Francesco:
    • move the mec/ folder to data/mec so that init time and valid time are not mixed up (MEC is valid-time oriented)
    • check the globbing options in the MEC namelist with DWD (they are undocumented; as far as I know only FCR_TIME is supported, not wildcards such as *). The aim is to avoid copying data.

dnerini and others added 30 commits October 7, 2025 14:01
* Distinguish between primary runs ('candidates') and secondary runs

* Docstrings
* Adopt forecast intervals including the end point

* Fix parsing

* Experiments work

* Update config/forecasters.yaml

* Align init times to availabiliy of COE

* run pre-commit

* Change README to COSMO-E availability

---------

Co-authored-by: Jonas Bhend <jonasbhend@users.noreply.github.com>
Co-authored-by: Jonas Bhend <jonas.bhend@meteoswiss.ch>
* draft changes

* rename workspace resources dir

* working for config/forecasters.yaml

* improve logging

* works for interpolators.yaml

* re-add get_leadtime function

* refactor run directives into script
* add region averages

* add regions to config

* Add regions to verification module, scripts, and rules

* add stratification to forecaster config and fix typo

* fix dict indexing

* fix append error

* read lon/lat from obs dataset

* Add inner verification domain

* Add missing dependency

* add plots by region

* Add regions to dashboard

* Fix dashboard

* Add region name and initializations to plot title (and remove header div)

* Add support for multiple regions

* Fix legend
@andreaspauling
Author

Is this really necessary? We are effectively duplicating the entire output data.

Random thought: what if we used a named pipe fed by cat <*.grib> instead of actually creating the large file?
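A minimal sketch of the named-pipe idea, assuming a POSIX system. The file names and payloads are placeholders, and whether MEC can read its input from a pipe (sequential reads only, no seeking) has not been verified here.

```python
import os
import shutil
import tempfile
import threading
from pathlib import Path


def stream_concatenated(parts, fifo_path):
    """Write the files in `parts` back to back into the FIFO at `fifo_path`."""
    # Opening a FIFO for writing blocks until a reader has opened it.
    with open(fifo_path, "wb") as fifo:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, fifo)


with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = Path(tmp)

    # Placeholder input files standing in for the per-step GRIB files.
    parts = []
    for i, payload in enumerate([b"GRIB-a", b"GRIB-b"]):
        part = tmp_dir / f"part{i}.grib"
        part.write_bytes(payload)
        parts.append(part)

    fifo_path = tmp_dir / "combined.grib"
    os.mkfifo(fifo_path)  # the "large file" never exists on disk

    writer = threading.Thread(target=stream_concatenated, args=(parts, fifo_path))
    writer.start()
    with open(fifo_path, "rb") as reader:  # stands in for the consuming program
        combined = reader.read()
    writer.join()
```

The reader sees one continuous byte stream, while the data exists on disk only once, in the original files.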

@andreaspauling
Author

  • FFV2 in this PR as well
  • evalml options --mec --ffv2 added (default: no mec/ffv2)
  • support of lists in config
  • mec running outside forecast run directories
  • support ver-files as source for observations
  • Paths moved to config
  • minor fixes / cleanup

@andreaspauling andreaspauling requested review from dnerini and frazane May 21, 2026 08:35

Contributor

Are these changes to the accumulation logic for total precipitation needed here? If not, I would remove these.

Author

Yes, they are needed: MEC needs precipitation accumulated from the beginning of the run.
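To illustrate what "accumulated from the beginning of the run" means, here is a small sketch. It assumes the model output holds per-interval totals, which this PR does not state, and the function name `accumulate_from_start` is hypothetical.

```python
from itertools import accumulate


def accumulate_from_start(interval_totals_mm):
    """Convert per-interval precipitation totals into totals since run start."""
    return list(accumulate(interval_totals_mm))


# Assumed example values in mm, one per output step.
per_step = [0.0, 1.5, 0.5, 2.0]
since_start = accumulate_from_start(per_step)
```

Each element of `since_start` is the running total up to and including that step, so the last element is the total for the whole run.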

config=Path(OUT_ROOT / "data/runs/{run_id}/{init_time}/config.yaml"),
resources=directory(OUT_ROOT / "data/runs/{run_id}/{init_time}/resources"),
grib_out_dir=directory(OUT_ROOT / "data/runs/{run_id}/{init_time}/grib"),
okfile=touch(

Contributor

Why was this change made?

Author

With Claude's help: The okfile is necessary. inference_execute needs to depend on whichever of the two prepare rules ran, but it can't reference them directly by output path because both produce the same three outputs (config.yaml, resources/, grib/). The _inference_routing_fn function selects the correct prepare rule by model type, but to do so it must reference a path that is unique per rule. The okfile provides that.

That okfile is used in _inference_routing_fn. The routing function returns the okfile path of whichever prepare rule ran (forecaster or interpolator), and inference_execute declares it as its input. This is how Snakemake knows to wait for the correct prepare rule to finish before launching inference.

Sounds plausible to me.

Contributor

What I mean is that touch("/some/file") already automatically generates the file when the rule succeeds.

Author

I could use Snakemake's touch() on line 199 in inference.smk and then remove those three lines from each function in the script. Would that address your point? I could do that and test it.

Author

I tried to use touch(.../ok-file) in inference.smk instead of touching the file in inference_prepare.py, but found no solution that worked. Can we keep the current working solution, or shall we look at it together?

from datetime import timedelta


def _parse_steps(steps: str) -> list[int]:

Contributor

Isn't this a duplicate of

def get_leadtimes(wc):
?

Author

They have different inputs and outputs. It may be possible to merge them, but that would take some time and result in one more complicated function.

Comment thread workflow/rules/verif_obs.smk Outdated
Comment thread workflow/scripts/generate_ffv2_namelist.py Outdated
Comment thread workflow/rules/verif_obs.smk Outdated
Comment thread workflow/rules/verif_obs.smk Outdated
"""


# link_mec_input: create the input_mod dir with symlinks to all fc files from all source inits

Contributor

This rule is not creating symlinks, but copies. Didn't we want to avoid this?

Author

@andreaspauling andreaspauling May 21, 2026

I implemented a version with only symlinks. However, this did not work because all fields needed to calculate precipitation must be in one file. This is a consequence of the basic way MEC works: it reads a GRIB file, does all the processing, and then reads the next file. The current version copies only the data that is really needed, which reduces the amount of data considerably; in the first version all inference output was copied.

If we want to save disk space, we could simply remove the mec directory once this workflow is consolidated. Then no disk space is used unnecessarily at the end of the workflow. The feedback files are stored separately.

If the GRIB output were written to a single file, that would solve this as well.

I added a docstring explaining what this rule does.
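The cleanup mentioned above is not implemented in this PR. A minimal sketch of how it might work, assuming the feedback file is stored outside the MEC case directory; the function name `remove_mec_case` and its arguments are hypothetical:

```python
import shutil
from pathlib import Path


def remove_mec_case(case_dir: Path, feedback_file: Path) -> bool:
    """Delete `case_dir` only if `feedback_file` exists and is non-empty.

    Returns True when the directory was removed, False when it was kept.
    """
    if not feedback_file.is_file() or feedback_file.stat().st_size == 0:
        return False  # keep the case so that MEC can be rerun or debugged
    shutil.rmtree(case_dir)
    return True
```

Guarding the removal on the feedback file keeps the copied input data available for as long as the final product is missing.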

@andreaspauling andreaspauling requested a review from frazane May 21, 2026 14:37
5 participants