Improve performance for large loads with dask #912
Labels: enhancement (New feature or request), help wanted (Extra attention is needed), standards / conventions (Suggestions on ways forward)
The transparent use of dask within xarray is really nice, but it has the side effect of creating a dask task for every operation. In the case of xclim, with all those `rolling` and `resample` calls, the number of tasks created is ridiculously high. With large datasets, the scheduler is sometimes so overloaded that it never even begins the computation. Sometimes it crashes, sometimes it just hangs. Most of the time we get several "WARNING - full garbage collections ..." messages, among other warnings.
Rechunking to larger chunks sometimes helps, and sometimes it is insufficient.
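For example, something like the following, where the file name and chunk sizes are only placeholders:

```python
import xarray as xr

# Fewer, larger chunks mean fewer tasks; `time: -1` keeps the whole
# time axis in one chunk, which rolling/resample operations need anyway.
ds = xr.open_dataset("tasmax.nc", chunks={"time": -1, "lat": 50, "lon": 50})

# Or rechunk an already-opened object:
ds = ds.chunk({"lat": 100, "lon": 100})
```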
We had this problem in sdba, and it was solved by wrapping every "unit" operation with `map_blocks`. This way, we combine many small operations into a single dask task. However, this solution has a lot of drawbacks: it's complicated to maintain, it has many bugs with "auxiliary" coords, and it's just hard to read.
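For reference, the pattern looks roughly like this (a minimal sketch with made-up names, not the actual sdba code):

```python
import xarray as xr

def _unit_op(block: xr.DataArray) -> xr.DataArray:
    # Several small xarray operations run eagerly on one in-memory block,
    # so dask sees a single task per chunk instead of one per operation.
    return block.rolling(time=5, min_periods=1).mean()

def fused(da: xr.DataArray) -> xr.DataArray:
    # `time` must be a single chunk, otherwise the rolling window
    # can't see across block boundaries.
    da = da.chunk({"time": -1})
    return xr.map_blocks(_unit_op, da, template=da)
```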
The idea is there, though. Could we implement something that would wrap the `compute` into a single dask task, from within the indicator's `__call__`? I guess it should be controlled by an option, and maybe only applied to indicators performing resampling, since those are the main victims of task proliferation.
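To make that concrete, here is a very rough sketch of what such a wrapper could look like; everything in it is hypothetical (the helper name, the explicit `template` argument, and the way the compute function is passed in):

```python
import xarray as xr

def call_in_one_task(compute, da: xr.DataArray, template: xr.DataArray, **params):
    """Run an indicator's compute as a single dask task per chunk.

    `template` must describe the expected output (dims, coords, dtype,
    chunks) so that map_blocks can build the result lazily.
    """
    # resample and rolling need the whole time axis inside each block
    da = da.chunk({"time": -1})
    return xr.map_blocks(lambda block: compute(block, **params), da, template=template)
```

Building `template` automatically is probably the hard part, since `resample` changes the length of the time axis, and it is presumably where the "auxiliary" coords bugs in the sdba version came from.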