Unify regions stations #154
Co-authored-by: Francesco Zanetta <62377868+frazane@users.noreply.github.com>
Co-authored-by: Michele Cattaneo <44707621+MicheleCattaneo@users.noreply.github.com>
Co-authored-by: Hugues de Laroussilhe <hugues.delaroussilhe@meteoswiss.ch>
…d (as in baseline)
Co-authored-by: Daniele Nerini <daniele.nerini@meteoswiss.ch>
Co-authored-by: Daniele Nerini <daniele.nerini@meteoswiss.ch>
Hi @cosunae, thanks for pushing this. Could you maybe rearrange the commits so that git(hub) understands that the files have been renamed? Otherwise it's impossible to identify the changes made to the scripts. I could give you a hand if need be, just let me know (I don't want to interfere with your work, as the fix will rewrite history).
I will try, yes, but I don't think it will help with the review, because the files are very different (inherently so, since a marimo file and a Python file are structured differently).
3d9465f to c3775ad
c3775ad to c1f758b
Done. After rewriting the git history with a git mv and a force push, the similarity is still very small, so the PR will refuse to show it as a diff.
Thanks, and sorry for putting you on this. I didn't think about the Python script / marimo issue.
Anything missing here, @frazane @jonasbhend? Could you take another look?
jonasbhend
left a comment
Hi @cosunae, thanks for tackling this. It is looking good from what I can tell. However, when trying to run the interpolator config with meteograms enabled, I get an error. This is not coming from your PR, as the error is already in main (see below). Is this something we want to fix as part of this PR?
evalml showcase config/interpolators-ich1.yaml -n
host: balfrin-ln002
Building DAG of jobs...
InputFunctionException in rule plot_meteogram in file "/scratch/mch/bhendj/evalml-opr/workflow/rules/plot.smk", line 28:
Error:
ValueError: No baseline zarr found for init time 202503010000
Wildcards:
showcase=20260527_interpolators-ich1_7de8
run_id=interpolator-tmp-d5aa-on-forecaster-c304-1e7e/7d5c
init_time=202503010000
param=T_2M
Traceback:
File "/scratch/mch/bhendj/evalml-opr/workflow/rules/plot.smk", line 53, in <lambda>
File "/scratch/mch/bhendj/evalml-opr/workflow/rules/plot.smk", line 24, in _get_available_baselines (rule plot_meteogram, line 30, /scratch/mch/bhendj/evalml-opr/workflow/rules/plot.smk)
I'll try to fix this.
jonasbhend
left a comment
We still create thousands of jobs when running the showcase (animations) use case. Does this make sense, given the potentially substantial setup cost of initializing Python and loading modules for each job? Alternatively, we could process all lead times and all parameters at once. Splitting by lead time only pays off as long as the per-job setup cost stays manageable.
This proposal is just meant to keep the number of jobs manageable (and hopefully easier on the scheduler).
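To make the scaling concrete, here is a rough back-of-the-envelope sketch. All dimensions and the setup cost are made-up placeholders (only T_2M appears in this thread), and the per-frame layout is my reading of how jobs are currently split, not something taken from the workflow code.

```python
from itertools import product

# All values below are hypothetical and only illustrate how the job count scales.
init_times = ["202503010000", "202503011200", "202503020000", "202503021200"]
params = ["T_2M", "param_b", "param_c", "param_d"]
regions = ["region_a", "region_b", "region_c"]
lead_times = list(range(0, 121, 6))  # lead times in hours, 0 to 120 in 6 h steps
setup_seconds = 30  # assumed cost of starting Python and importing modules

# Per-frame layout: one job per (init time, parameter, region, lead time).
jobs_per_frame = len(list(product(init_times, params, regions, lead_times)))

# Batched layout: one job per (init time, region); each job loops over
# all parameters and all lead times internally.
jobs_batched = len(list(product(init_times, regions)))

# Total time spent on setup alone, summed over all jobs.
setup_per_frame = jobs_per_frame * setup_seconds
setup_batched = jobs_batched * setup_seconds

print(f"per-frame: {jobs_per_frame} jobs, {setup_per_frame / 3600:.1f} h of setup")
print(f"batched:   {jobs_batched} jobs, {setup_batched / 60:.1f} min of setup")
```

The batched variant gives up some parallelism, so whether it wins in wall-clock time depends on how many jobs the scheduler actually runs concurrently.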
whaaaat, plotting a single forecast frame (one parameter, one domain) takes 2 minutes! Why is this so slow @frazane, @cosunae?
OK, I now realize that this would be a fairly large undertaking, as the data loading for the forecast frame differs from the I/O in meteograms and experiments. So I leave it to you to decide whether we close the PR now and follow up with further harmonization of the I/O routines in a later PR. I strongly suggest we tackle this though; it seems such an obvious source of friction and redundancy.
The entire showcase component of the workflow needs some refactoring in future PRs. The code has grown organically since the start and we've never put much care into it. There's a lot of performance optimization on the table. @clairemerker is going to look into this once #162 is merged.
Then I suggest we merge this.
When the showcase is called with a number of regions (for the plots) and locations (for the meteograms), we currently launch one task per region and per location. While this maximizes parallelism, it is inefficient: these tasks are mostly I/O-bound, and each one loads the same data (the entire horizontal plane) again, only to select its region or location afterwards.
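The idea can be illustrated with a minimal, self-contained sketch: load the full field once, then crop it for every region in the same task. This is not the code from this PR; `load_field`, `crop`, `crop_all_regions`, the region names and the bounding boxes are all hypothetical stand-ins.

```python
from typing import Dict, List, Tuple

BBox = Tuple[float, float, float, float]  # (lat_min, lat_max, lon_min, lon_max)

# Hypothetical regions, for illustration only.
REGIONS: Dict[str, BBox] = {
    "north": (46.5, 48.0, 5.0, 12.0),
    "south_west": (45.0, 46.0, 5.0, 8.0),
}

load_calls = 0  # counts how often the "expensive" load is performed


def load_field() -> Tuple[List[float], List[float], List[List[float]]]:
    """Stand-in for the expensive read of the entire horizontal plane."""
    global load_calls
    load_calls += 1
    lats = [45.0 + 0.5 * i for i in range(7)]  # 45.0 .. 48.0
    lons = [5.0 + 1.0 * j for j in range(8)]   # 5.0 .. 12.0
    values = [[lat + lon for lon in lons] for lat in lats]
    return lats, lons, values


def crop(lats, lons, values, bbox: BBox) -> List[List[float]]:
    """Select the grid points that fall inside the bounding box."""
    lat_min, lat_max, lon_min, lon_max = bbox
    rows = [i for i, lat in enumerate(lats) if lat_min <= lat <= lat_max]
    cols = [j for j, lon in enumerate(lons) if lon_min <= lon <= lon_max]
    return [[values[i][j] for j in cols] for i in rows]


def crop_all_regions(regions: Dict[str, BBox]) -> Dict[str, List[List[float]]]:
    """Load the field once, then crop it for every region."""
    lats, lons, values = load_field()
    return {name: crop(lats, lons, values, bbox) for name, bbox in regions.items()}


crops = crop_all_regions(REGIONS)
```

With one task per region, `load_field` would instead run once per region, so the read cost grows with the number of regions and locations rather than staying constant.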
In this PR we modify the following: