Sorry for being nosy - I am curious about all the columnar technologies in an actual analysis context like Hgg :)
I saw that pickle is mentioned/used in several places. You might try parquet instead (df.to_parquet(), pd.read_parquet()). It's essentially the industry version of ROOT, so it should be much faster at serializing/deserializing than compressed pickle. Uncompressed pickle is probably still faster, but once you compress, e.g. df.to_pickle("blah.pkl.gz"), parquet will be faster. Of course this all depends on how big your files are.
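If it helps, here is a rough sketch of how you could check this on your own data. The dataframe below is a made-up stand-in (the column names and sizes are my assumptions, not taken from your code), and the timings will vary with file size, dtypes, and which parquet engine (pyarrow or fastparquet) you have installed, so treat the numbers as indicative only.

```python
import os
import tempfile
import time

import numpy as np
import pandas as pd


def time_roundtrip(write, read, path):
    """Return (write_seconds, read_seconds, loaded_frame) for one format."""
    t0 = time.perf_counter()
    write(path)
    t1 = time.perf_counter()
    loaded = read(path)
    t2 = time.perf_counter()
    return t1 - t0, t2 - t1, loaded


# Hypothetical stand-in for an analysis dataframe; the columns are made up.
rng = np.random.default_rng(0)
n_rows = 100_000
df = pd.DataFrame({
    "mass": rng.normal(125.0, 2.0, n_rows),
    "pt": rng.exponential(40.0, n_rows),
    "category": rng.integers(0, 4, n_rows),
})

# name -> (writer, reader, file name); pandas infers gzip from the ".gz" suffix.
formats = {
    "pickle": (df.to_pickle, pd.read_pickle, "blah.pkl"),
    "pickle+gzip": (df.to_pickle, pd.read_pickle, "blah.pkl.gz"),
    "parquet": (df.to_parquet, pd.read_parquet, "blah.parquet"),
}

results = {}
with tempfile.TemporaryDirectory() as tmp:
    for name, (write, read, fname) in formats.items():
        path = os.path.join(tmp, fname)
        try:
            w, r, loaded = time_roundtrip(write, read, path)
        except ImportError:
            # to_parquet/read_parquet need pyarrow or fastparquet installed.
            print(f"{name}: skipped (no parquet engine available)")
            continue
        size_mb = os.path.getsize(path) / 1e6
        # Store timings, size on disk, and whether the round trip was lossless.
        results[name] = (w, r, size_mb, loaded.equals(df))
        print(f"{name:12s} write {w:.3f}s  read {r:.3f}s  {size_mb:.1f} MB")
```

Swapping in one of your real dataframes for `df` should tell you quickly whether the switch is worth it for your file sizes.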