# Processing Documentation

This document records how long each processing step took and how much memory and compute (CPU & GPU) it needed.

| Grid  | ArcticDEM | Era5 | AlphaEarth | Darts |
| ----- | --------- | ---- | ---------- | ----- |
| Hex3  | [ ]       | [/]  | [ ]        | [/]   |
| Hex4  | [ ]       | [/]  | [ ]        | [/]   |
| Hex5  | [ ]       | [/]  | [ ]        | [/]   |
| Hex6  | [ ]       | [ ]  | [ ]        | [ ]   |
| Hpx6  | [x]       | [/]  | [ ]        | [/]   |
| Hpx7  | [ ]       | [/]  | [ ]        | [/]   |
| Hpx8  | [ ]       | [/]  | [ ]        | [/]   |
| Hpx9  | [ ]       | [/]  | [ ]        | [/]   |
| Hpx10 | [ ]       | [ ]  | [ ]        | [ ]   |

## Grid creation

Creating the grids did not take any significant amount of memory or compute. Creating a single grid took between a few seconds for the smaller levels and a few minutes for the higher levels.

## DARTS

As with grid creation, no significant amount of memory, compute, or time was needed.

## ArcticDEM

The download took around 8 h with a memory usage of about 10 GB and no strong limitations on compute. The resulting icechunk zarr datacube was approx. 160 GB on disk, which corresponds to approx. 270 GB in memory when loaded.

The enrichment took around 2 h on a single A100 GPU node (40 GB) with a local dask cluster of 7 processes, each using 2 threads and 30 GB of memory, for a total of 210 GB of memory. These settings can easily be changed to consume less memory by reducing the number of processes or threads. More processes or threads could not be used, to ensure that the GPU does not run out of memory.

### Spatial aggregations into grids

All spatial aggregations relied heavily on CPU compute: CuPy lacks support for nanquantile, and for the higher-resolution grids the number of pixels to reduce was too small to overcome the data-movement overhead of using a GPU. The aggregations scale with the number of concurrent processes (specified by `--concurrent_partitions`), consuming linearly more memory with higher parallelism.
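A minimal sketch of the kind of per-cell reduction described above: because CuPy has no `nanquantile`, the NaN-aware statistics stay on the CPU with NumPy. The function name and the exact set of statistics are illustrative assumptions, not the actual pipeline code.

```python
import numpy as np


def aggregate_cell(pixels: np.ndarray) -> dict:
    """Reduce all pixels falling into one grid cell to NaN-aware summary
    statistics. np.nanquantile has no CuPy equivalent, which is one reason
    these reductions run on the CPU."""
    return {
        "mean": float(np.nanmean(pixels)),
        "median": float(np.nanmedian(pixels)),
        "p05": float(np.nanquantile(pixels, 0.05)),
        "p95": float(np.nanquantile(pixels, 0.95)),
    }


# Example: NaN pixels are ignored in every statistic.
stats = aggregate_cell(np.array([1.0, 2.0, np.nan, 4.0]))
```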
All spatial aggregations into the different grids took around 30 min each, with a total memory peak of ~300 GB partitioned over 40 processes.

## Alpha Earth

The download was heavily limited by the scale of the input data, which is ~10 m in the original dataset. Aggregating at a 10 m scale was not computationally feasible for the Google Earth Engine servers, so each grid and level used a different scale to aggregate and download the data. Each scale was chosen so that each grid cell had around 10000 px to estimate the aggregations from.

| grid  | time    | scale |
| ----- | ------- | ----- |
| Hex3  |         | 1600  |
| Hex4  |         | 600   |
| Hex5  |         | 240   |
| Hex6  |         | 90    |
| Hpx6  | 58 min  | 1600  |
| Hpx7  | 3:16 h  | 800   |
| Hpx8  | 13:19 h | 400   |
| Hpx9  |         | 200   |
| Hpx10 |         | 100   |

## Era5

### Spatial aggregations into grids

As for ArcticDEM, all spatial aggregations relied heavily on CPU compute: CuPy lacks support for nanquantile, and for the higher-resolution grids the number of pixels to reduce was too small to overcome the data-movement overhead of using a GPU. The aggregations scale with the number of concurrent processes (specified by `--concurrent_partitions`), consuming linearly more memory with higher parallelism.

Since the spatial resolution of the ERA5 dataset is coarser than that of the higher-resolution grids, different aggregation methods were used for different grid levels:

- Common aggregations (mean, min, max, std, median, p01, p05, p25, p75, p95, p99) for low-resolution grids
- Only the mean aggregation for medium-resolution grids
- Linear interpolation for high-resolution grids

Geometries crossing the antimeridian are corrected before aggregation.

| grid  | method      |
| ----- | ----------- |
| Hex3  | Common      |
| Hex4  | Common      |
| Hex5  | Mean        |
| Hex6  | Interpolate |
| Hpx6  | Common      |
| Hpx7  | Common      |
| Hpx8  | Common      |
| Hpx9  | Mean        |
| Hpx10 | Interpolate |
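The per-level choice between common aggregations, mean only, and interpolation can be sketched as a rule on the grid-cell size relative to the ~0.25° (~28 km at the equator) ERA5 grid spacing. The function name and the exact thresholds below are illustrative assumptions; only the three-way split itself comes from the table above.

```python
# Approximate ERA5 grid spacing at the equator, in km (0.25 degrees).
ERA5_RES_KM = 28.0


def pick_method(cell_size_km: float) -> str:
    """Hypothetical rule for choosing the Era5 aggregation method per grid
    level. Thresholds are illustrative, not taken from the pipeline."""
    if cell_size_km >= 4 * ERA5_RES_KM:
        # Many ERA5 pixels per cell: robust statistics are meaningful.
        return "common"
    if cell_size_km >= ERA5_RES_KM:
        # Only a handful of pixels per cell: keep just the mean.
        return "mean"
    # Cells smaller than one ERA5 pixel: interpolate instead of aggregating.
    return "interpolate"
```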