Integration of MEC workflow #110
… we want to factor it out of the rule
* Distinguish between primary runs ('candidates') and secondary runs
* Docstrings
* Adopt forecast intervals including the end point
* Fix parsing
* Experiments work
* Update config/forecasters.yaml
* Align init times to availability of COE
* run pre-commit
* Change README to COSMO-E availability
---------
Co-authored-by: Jonas Bhend <jonasbhend@users.noreply.github.com>
Co-authored-by: Jonas Bhend <jonas.bhend@meteoswiss.ch>
* draft changes
* rename workspace resources dir
* working for config/forecasters.yaml
* improve logging
* works for interpolators.yaml
* re-add get_leadtime function
* refactor run directives into script
* add region averages
* add regions to config
* Add regions to verification module, scripts, and rules
* add stratification to forecaster config and fix typo
* fix dict indexing
* fix append error
* read lon/lat from obs dataset
* Add inner verification domain
* Add missing dependency
* add plots by region
* Add regions to dashboard
* Fix dashboard
* Add region name and initializations to plot title (and remove header div)
* Add support for multiple regions
* Fix legend
…e-to-generate-namelist
…ule-to-generate-namelist' into MRB-536-for-review
Are these changes to the accumulation logic for total precipitation needed here? If not, I would remove these.
Exactly. MEC needs precip accumulated from the beginning of the run
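For illustration, the difference between per-interval totals and totals accumulated from the beginning of the run can be sketched as follows. This is a minimal sketch with hypothetical function names and plain lists of made-up values; it is not the accumulation code in this PR.

```python
from itertools import accumulate


def accumulate_from_run_start(interval_totals: list[float]) -> list[float]:
    """Turn per-interval precipitation totals into totals since the start of the run."""
    return list(accumulate(interval_totals))


def deaccumulate(run_totals: list[float]) -> list[float]:
    """Inverse operation: recover per-interval totals from run-accumulated totals."""
    previous = [0.0] + run_totals[:-1]
    return [current - before for current, before in zip(run_totals, previous)]


# Hypothetical hourly totals in mm, one value per lead time.
hourly = [0.0, 1.5, 0.5, 2.0]
run_totals = accumulate_from_run_start(hourly)
```

The two conventions carry the same information, so converting in either direction is lossless as long as every lead time is present.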
config=Path(OUT_ROOT / "data/runs/{run_id}/{init_time}/config.yaml"),
resources=directory(OUT_ROOT / "data/runs/{run_id}/{init_time}/resources"),
grib_out_dir=directory(OUT_ROOT / "data/runs/{run_id}/{init_time}/grib"),
okfile=touch(
With Claude's help: The okfile is necessary. inference_execute needs to depend on whichever of the two prepare rules ran, but it can't reference them directly by output path because both produce the same three outputs (config.yaml, resources/, grib/). The _inference_routing_fn function selects the correct prepare rule by model type, but to do so it must reference a path that is unique per rule. The okfile provides that.
The okfile is used in _inference_routing_fn. The routing function returns the okfile path of whichever prepare rule ran (forecaster or interpolator), and inference_execute declares it as its input. This is how Snakemake knows to wait for the correct prepare rule to finish before launching inference.
Sounds plausible to me.
What I mean is that touch("/some/file") already automatically generates the file when the rule succeeds.
I could use Snakemake's touch() on line 199 in inference.smk and then remove those three lines from the script in each function. Would that address your point? I could do that and test it.
I tried to use touch(.../ok-file) in inference.smk instead of touching the file in inference_prepare.py, but found no solution that worked. Can we keep the current working solution, or have a look at it together?
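For reference, the routing idea discussed in this thread can be sketched in plain Python. This is a simplified sketch under stated assumptions: the function name route_to_okfile, the marker file names, and the OUT_ROOT value are hypothetical, and the real _inference_routing_fn may take Snakemake wildcards rather than plain arguments.

```python
from pathlib import Path

# Assumption: placeholder for the workflow's output root.
OUT_ROOT = Path("output")

# Assumption: hypothetical marker names. The point is only that each
# prepare rule owns one path that no other rule produces.
OKFILES = {
    "forecaster": "prepare_forecaster.ok",
    "interpolator": "prepare_interpolator.ok",
}


def route_to_okfile(model_type: str, run_id: str, init_time: str) -> Path:
    """Return the marker path of the prepare rule that matches the model type."""
    if model_type not in OKFILES:
        raise ValueError(f"unknown model type: {model_type!r}")
    return OUT_ROOT / "data/runs" / run_id / init_time / OKFILES[model_type]
```

Because config.yaml, resources/, and grib/ have identical paths for both prepare rules, only a rule-specific marker like this lets the downstream rule name exactly one producer.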
from datetime import timedelta

def _parse_steps(steps: str) -> list[int]:
Isn't this a duplicate of evalml/workflow/rules/plot.smk, line 119 in e4af0a6?
They have different inputs and outputs. It may be possible to merge them, but that would take some time and result in one more complicated function.
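To make the discussion concrete, a step parser of this kind might look like the sketch below. The "start/end/step" string format and the name parse_steps_sketch are assumptions made for illustration; the body of _parse_steps is not shown in this thread. The end point is included, in line with the "forecast intervals including the end point" commit.

```python
def parse_steps_sketch(steps: str) -> list[int]:
    """Parse 'start/end/step' (hours) into lead times, including the end point.

    Assumption: the slash-separated format is hypothetical and may differ
    from what the workflow config actually uses.
    """
    parts = steps.split("/")
    if len(parts) != 3:
        raise ValueError(f"expected 'start/end/step', got {steps!r}")
    start, end, step = (int(part) for part in parts)
    if step <= 0:
        raise ValueError("step must be positive")
    # end + 1 makes the range inclusive of the end point when it is on the grid.
    return list(range(start, end + 1, step))
```

If the end point does not fall on the step grid, this sketch stops at the last step before it rather than raising an error.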
"""

# link_mec_input: create the input_mod dir with symlinks to all fc files from all source inits
This rule is not creating symlinks, but copies. Didn't we want to avoid this?
I implemented a version with only symlinks. However, this did not work because all fields needed to calculate precipitation must be in one file. This is a consequence of the basic way MEC works: it reads a GRIB file, does all the processing, and then reads the next file. The current version copies only the data that is really needed, which reduces the amount of data considerably; in the first version all inference output was copied.
If we want to save disk space, we could simply remove the mec directory. This could be done once this workflow is consolidated, so that no disk space is used unnecessarily at the end of the workflow. The feedback files are stored separately.
If the GRIB output were written to a single file, that would solve this as well.
I added a docstring explaining what this rule does.
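As a rough sketch of why a copy is needed instead of symlinks: a GRIB file is a sequence of self-contained messages, so several files can be combined by concatenating their bytes. The helper name merge_into_one_file and the file names below are hypothetical, and the demo uses placeholder bytes, not real GRIB data. Selecting which fields are "really needed" is assumed to happen before this step.

```python
import shutil
import tempfile
from pathlib import Path


def merge_into_one_file(sources: list[Path], target: Path) -> None:
    """Concatenate the source files, in order, into a single target file."""
    target.parent.mkdir(parents=True, exist_ok=True)
    with target.open("wb") as out:
        for source in sources:
            with source.open("rb") as src:
                shutil.copyfileobj(src, out)


# Demo with placeholder content standing in for two per-field files.
with tempfile.TemporaryDirectory() as tmp:
    tmp_path = Path(tmp)
    part_a = tmp_path / "field_a.grib"
    part_b = tmp_path / "field_b.grib"
    part_a.write_bytes(b"GRIB-a7777")
    part_b.write_bytes(b"GRIB-b7777")
    merged = tmp_path / "input_mod" / "merged.grib"
    merge_into_one_file([part_a, part_b], merged)
    merged_bytes = merged.read_bytes()
```

A symlink can only point at one existing file, which is why it cannot provide the "all fields in one file" layout described above.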
Add the MEC workflow. The new parts are in green in the DAG: snakemake_dag.pdf
For each valid date a MEC case is set up and run. This includes:
All MEC cases can be removed once the final feedback file is produced (removal not yet implemented).
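Since the removal is not implemented yet, the following is only one possible shape for it: delete the case directories, but only after the final feedback file exists and is non-empty. The function name remove_mec_cases and all paths in the demo are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def remove_mec_cases(case_dirs: list[Path], feedback_file: Path) -> list[Path]:
    """Remove MEC case directories once the final feedback file is in place.

    Returns the directories that were actually removed. Nothing is deleted
    if the feedback file is missing or empty.
    """
    if not feedback_file.is_file() or feedback_file.stat().st_size == 0:
        return []
    removed = []
    for case_dir in case_dirs:
        if case_dir.is_dir():
            shutil.rmtree(case_dir)
            removed.append(case_dir)
    return removed


# Demo in a temporary directory with made-up case names.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    cases = [root / "mec" / "2024010100", root / "mec" / "2024010106"]
    for case in cases:
        case.mkdir(parents=True)
    feedback = root / "feedback.nc"
    removed_before = remove_mec_cases(cases, feedback)  # feedback file missing
    feedback.write_bytes(b"placeholder")
    removed_after = remove_mec_cases(cases, feedback)
    remaining = [case for case in cases if case.exists()]
```

Guarding on the feedback file keeps a failed or interrupted run from losing the inputs needed to retry it.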