
Processing Documentation

This document records how long each processing step took and how much memory and compute (CPU & GPU) it required.

| Grid  | ArcticDEM | Era5 | AlphaEarth | DARTS |
|-------|-----------|------|------------|-------|
| Hex3  | [ ]       | [/]  | [ ]        | [/]   |
| Hex4  | [ ]       | [/]  | [ ]        | [/]   |
| Hex5  | [ ]       | [/]  | [ ]        | [/]   |
| Hex6  | [ ]       | [ ]  | [ ]        | [ ]   |
| Hpx6  | [x]       | [/]  | [ ]        | [/]   |
| Hpx7  | [ ]       | [/]  | [ ]        | [/]   |
| Hpx8  | [ ]       | [/]  | [ ]        | [/]   |
| Hpx9  | [ ]       | [/]  | [ ]        | [/]   |
| Hpx10 | [ ]       | [ ]  | [ ]        | [ ]   |

Grid creation

Creating the grids did not take up any significant amount of memory or compute. The time to create a grid ranged from a few seconds for the lower levels up to a few minutes for the higher levels.
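The growth in creation time with level follows directly from the cell counts. For the HEALPix grids this is fixed by the scheme itself (not specific to this pipeline): the number of cells quadruples per order.

```python
# Number of HEALPix cells at a given order: npix = 12 * nside**2, nside = 2**order.
def healpix_npix(order: int) -> int:
    nside = 2 ** order
    return 12 * nside ** 2

for order in range(6, 11):
    print(f"Hpx{order}: {healpix_npix(order):,} cells")
# Hpx6 has 49,152 cells; Hpx10 already has 12,582,912.
```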

DARTS

Similar to grid creation, no significant amount of memory, compute, or time was needed.

ArcticDEM

The download took around 8 h with a memory usage of about 10 GB and no strong limits on compute. The resulting icechunk Zarr datacube was approx. 160 GB on disk, which corresponds to approx. 270 GB in memory when fully loaded.

The enrichment took around 2 h on a single A100 GPU node (40 GB) with a local Dask cluster of 7 processes, each using 2 threads and 30 GB of memory, for a total of 210 GB of memory. These settings can easily be changed to consume less memory by reducing the number of processes or threads. More processes or threads could not be used, to ensure that the GPU does not run out of memory.
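As a sketch of the worker settings described above (the keyword names mirror `dask.distributed.LocalCluster`; the actual launch code in the pipeline may differ), shown as plain data so the memory arithmetic is explicit:

```python
# Worker settings matching the enrichment run described above.
# They would be passed as dask.distributed.LocalCluster(**cluster_kwargs).
cluster_kwargs = {
    "n_workers": 7,          # worker processes; lower this to save memory
    "threads_per_worker": 2,
    "memory_limit": "30GB",  # per worker process
}

# Total memory grows linearly with the worker count: 7 x 30 GB.
total_gb = cluster_kwargs["n_workers"] * 30
print(total_gb)  # 210
```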

Spatial aggregations into grids

All spatial aggregations relied heavily on CPU compute: CuPy lacks support for nanquantile, and for the higher-resolution grids the number of pixels to reduce per cell was too small to overcome the data-movement overhead of using a GPU.
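The CPU fallback is plain NumPy; a minimal example of the kind of nan-aware quantile reduction that CuPy does not provide:

```python
import numpy as np

# Pixels falling into one grid cell, with missing values encoded as NaN.
pixels = np.array([1.0, 2.0, np.nan, 4.0, 3.0])

# nan-aware median: the NaN is ignored, so this reduces [1, 2, 3, 4].
median = np.nanquantile(pixels, 0.5)
print(median)  # 2.5
```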

The aggregations scale with the number of concurrent processes (specified by --concurrent_partitions), with memory usage growing linearly as the parallelism increases.

| Grid  | Time   | Memory  | Processes |
|-------|--------|---------|-----------|
| Hex3  |        |         |           |
| Hex4  |        |         |           |
| Hex5  |        |         |           |
| Hex6  |        |         |           |
| Hpx6  | 37 min | ~300 GB | 40        |
| Hpx7  |        |         |           |
| Hpx8  |        |         |           |
| Hpx9  | 25 min | ~300 GB | 40        |
| Hpx10 | 34 min | ~300 GB | 40        |

Alpha Earth

The download was heavily limited by the scale of the input data, which is ~10 m in the original dataset. A scale of 10 m was not computationally feasible for the Google Earth Engine servers, so each grid and level used a different scale for aggregating and downloading the data. Each scale was chosen so that every grid cell contained around 10,000 px from which to estimate the aggregations.

| Grid  | Time    | Scale (m) |
|-------|---------|-----------|
| Hex3  |         | 1600      |
| Hex4  |         | 600       |
| Hex5  |         | 240       |
| Hex6  |         | 90        |
| Hpx6  | 58 min  | 1600      |
| Hpx7  | 3:16 h  | 800       |
| Hpx8  | 13:19 h | 400       |
| Hpx9  |         | 200       |
| Hpx10 |         | 100       |
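The scales above follow a simple pattern: HEALPix cell areas quarter per level, so the scale halves; if the Hex grids are H3 (an assumption), cell areas shrink by roughly a factor of 7 per level, so the scale shrinks by about sqrt(7) and is then rounded. A sketch of the HEALPix progression:

```python
# The scale halves per HEALPix level because the cell area quarters,
# which keeps the pixel count per cell roughly constant.
def healpix_scales(base: float, levels: int) -> list[float]:
    return [base / 2 ** i for i in range(levels)]

print(healpix_scales(1600, 5))  # [1600.0, 800.0, 400.0, 200.0, 100.0]
```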

Era5

???