# eSPA for RTS
The goal of this project is to use the entropy-optimal Scalable Probabilistic Approximations algorithm (eSPA) to build a model that estimates the density of Retrogressive Thaw Slumps (RTS) across the globe at different levels of detail.
The hope is that a successfully trained model will also yield new knowledge about RTS proxies.
## Setup
```sh
uv sync
```
## Project Plan
1. Create global hexagon grids with h3
2. Enrich the grids with data from various sources and with labels from DARTS v2
3. Use eSPA for simple classification: hex has [many slumps / some slumps / few slumps / no slumps]
4. Use SPARTAn for regression: one model for slump density (area) and one for the total number of slumps
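
The four classes from step 3 could be derived from a per-hexagon slump count roughly as follows. This is only a sketch: the function name `slump_class` and the thresholds `few_max` and `some_max` are hypothetical placeholders, since this README does not specify where the class boundaries lie.

```python
def slump_class(rts_count: int, few_max: int = 5, some_max: int = 50) -> str:
    """Map a hexagon's RTS count to one of the four classes from step 3.

    few_max and some_max are placeholder thresholds, not project values.
    """
    if rts_count < 0:
        raise ValueError("rts_count must be non-negative")
    if rts_count == 0:
        return "no slumps"
    if rts_count <= few_max:
        return "few slumps"
    if rts_count <= some_max:
        return "some slumps"
    return "many slumps"
```

Binning on `"rts_density"` instead of the count would work the same way with float thresholds.
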
### Data Sources and Engineering
- Labels
  - `"year"`: Year of observation
  - `"area"`: Total land area of the hexagon
  - `"rts_density"`: Area of RTS divided by total land area
  - `"rts_count"`: Number of individual RTS instances
- ERA5 (starting 40 years before `"year"`)
  - `"temp_yearXXXX_qY"`: Y-th quantile temperature of year XXXX. Used to feed the temperature distribution into the model.
  - `"thawing_days_yearXXXX"`: Number of thawing days in year XXXX.
  - `"precip_yearXXXX_qY"`: Y-th quantile precipitation of year XXXX. Analogous to the temperature quantiles.
  - `"temp_5year_diff_XXXXtoXXXX_qY"`: Difference of the Y-th quantile temperature between year XXXX and year XXXX, always 5 years apart.
  - `"temp_10year_diff_XXXXtoXXXX_qY"`: Difference of the Y-th quantile temperature between year XXXX and year XXXX, always 10 years apart.
  - `"temp_diff_qY"`: Difference of the Y-th quantile temperature between year XXXX and year XXXX, always 10 years apart.
- ArcticDEM
  - `"dissection_index"`: Dissection index, (max - min) / max
  - `"max_elevation"`: Maximum elevation
  - `"elevationX_density"`: Area where the elevation is larger than X, divided by the total land area
- TCVIS
  - ???
- Wildfire???
- Permafrost???
- GroundIceContent???
- Biome
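
A few of the features above could be computed per hexagon along these lines. This is a minimal sketch in plain Python, not the project's pipeline: the helper names, the quantile levels, the `q10`/`q50`/`q90` suffix format, and the definition of a thawing day as a day with a temperature above 0 °C are all assumptions made for illustration.

```python
def quantile(values, q):
    """Linear-interpolation quantile for 0 <= q <= 1."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def temperature_features(daily_temps_c, year, quantiles=(0.1, 0.5, 0.9)):
    """Build temp_yearXXXX_qY and thawing_days_yearXXXX for one year."""
    feats = {
        f"temp_year{year}_q{round(q * 100)}": quantile(daily_temps_c, q)
        for q in quantiles
    }
    # Assumption: a thawing day is a day with temperature above 0 degrees C.
    feats[f"thawing_days_year{year}"] = sum(1 for t in daily_temps_c if t > 0.0)
    return feats


def dissection_index(elevations):
    """(max - min) / max, as defined in the feature list.

    Undefined when the maximum elevation is 0.
    """
    top = max(elevations)
    return (top - min(elevations)) / top


def elevation_density(cell_elevations, cell_areas, threshold):
    """Share of land area whose elevation exceeds the threshold."""
    above = sum(a for e, a in zip(cell_elevations, cell_areas) if e > threshold)
    return above / sum(cell_areas)
```

For example, `temperature_features([-10, -5, 0, 5, 10], 2020)` yields the keys `temp_year2020_q10`, `temp_year2020_q50`, `temp_year2020_q90` and `thawing_days_year2020`.
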

**About temporals:** Every label has its own year. All time-dependent features, e.g. `"temp_5year_diff_XXXXtoXXXX_qY"`, are calculated relative to that year.
The number of years taken from a dataset is always the same: for ERA5, the data for an observation from 2024 starts in 1984, and for an observation from 2023 it starts in 1983.
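
The year arithmetic can be sketched as below. The function names are hypothetical, and two points are assumptions rather than project decisions: that the difference features use consecutive, non-overlapping year pairs, and that the last pair ends in the label year itself.

```python
def window_start(label_year: int, n_years: int = 40) -> int:
    """First year of data used for a label observed in label_year."""
    return label_year - n_years


def diff_feature_names(label_year, step=5, n_years=40, q="50"):
    """Names of the temp_{step}year_diff_* features relative to label_year.

    Assumption: consecutive, non-overlapping pairs ending at label_year.
    """
    start = window_start(label_year, n_years)
    return [
        f"temp_{step}year_diff_{a}to{a + step}_q{q}"
        for a in range(start, label_year, step)
    ]
```

With these defaults, `window_start(2024)` gives 1984 and `window_start(2023)` gives 1983, matching the ERA5 example above.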