Entropice

Geospatial Machine Learning for Arctic Permafrost Degradation. Entropice is a geospatial machine learning system for predicting Retrogressive Thaw Slump (RTS) density across the Arctic using entropy-optimal Scalable Probabilistic Approximations (eSPA). The system integrates multi-source geospatial data (climate, terrain, satellite imagery) into discrete global grids and trains probabilistic classifiers to estimate RTS occurrence patterns at multiple spatial resolutions.

Scientific Background

Retrogressive Thaw Slumps

Retrogressive Thaw Slumps are Arctic landslides caused by permafrost degradation. As ice-rich permafrost thaws, ground collapses create distinctive bowl-shaped features that retreat upslope over time. RTS are:

  • Climate indicators: Sensitive to warming temperatures and changing precipitation
  • Ecological disruptors: Release sediment, nutrients, and greenhouse gases into Arctic waterways
  • Infrastructure hazards: Threaten communities and industrial facilities in permafrost regions
  • Feedback mechanisms: Accelerate local warming through albedo changes and carbon release

Understanding RTS distribution patterns is critical for predicting permafrost stability under climate change.

The Challenge

Current remote sensing approaches first map a specific landscape feature and then extract spatio-temporal statistics from the resulting dataset.

Traditional RTS mapping relies on manual digitization from satellite imagery (e.g., the DARTS v2 training dataset), which is:

  • Labor-intensive and limited in spatial/temporal coverage
  • Challenging due to cloud cover and seasonal visibility
  • Insufficient for pan-Arctic prediction at decision-relevant scales

Modern mapping approaches use machine learning to create segmentation labels from satellite imagery (e.g., the DARTS dataset), which comes with its own problems:

  • Large data transfers between satellite imagery providers and the HPC systems where the models run
  • High energy consumption for both data transfer and inference
  • Uncertainty about the quality of the results
  • Potential compute waste from running inference on regions where the target landscape feature clearly does not exist

Our Approach

Instead of mapping globally and then computing spatio-temporal statistics, Entropice learns spatio-temporal patterns from a small labeled subset, using a large variety of data features, to produce an educated estimate of the spatio-temporal statistics of a landscape feature.

Entropice addresses this by:

  1. Spatial Discretization Across Scales: Representing the Arctic with discrete global grid systems (H3 hexagonal grids, HEALPix) at several low-to-mid resolutions (levels)
  2. Multi-Source Integration: Aggregating climate (ERA5), terrain (ArcticDEM), and satellite embeddings (AlphaEarth) into feature-rich datasets to obtain environmental proxies across spatio-temporal scales
  3. Probabilistic Modeling: Training eSPA classifiers to predict RTS density classes based on environmental proxies
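To get a feel for the scales involved in step 1: the global cell counts of both grid systems follow closed-form formulas. H3 has 2 + 120·7^r cells at resolution r (hexagons plus 12 pentagons), and HEALPix has 12·N_side² pixels with N_side = 2^k at order k. The Python sketch below uses these formulas to compare average cell sizes across the levels listed under Key Features. It assumes that the HEALPix "level" means the order k and approximates the Earth as a sphere with a 6371 km radius; real H3 cell areas vary around the average, and Entropice only uses the Arctic subset of cells.

```python
import math

# Spherical Earth approximation (mean radius in km).
EARTH_RADIUS_KM = 6371.0
EARTH_AREA_KM2 = 4.0 * math.pi * EARTH_RADIUS_KM**2


def h3_cell_count(resolution: int) -> int:
    """Global number of H3 cells at a resolution (hexagons plus 12 pentagons)."""
    return 2 + 120 * 7**resolution


def healpix_cell_count(order: int) -> int:
    """Global number of HEALPix pixels at a given order (nside = 2**order)."""
    nside = 2**order
    return 12 * nside**2


def mean_cell_area_km2(cell_count: int) -> float:
    """Average cell area if the sphere were divided evenly among all cells."""
    return EARTH_AREA_KM2 / cell_count


if __name__ == "__main__":
    rows = [("H3", r, h3_cell_count(r)) for r in range(3, 7)]
    rows += [("HEALPix", k, healpix_cell_count(k)) for k in range(6, 11)]
    for grid, level, count in rows:
        area = mean_cell_area_km2(count)
        # sqrt(area) is only a rough linear size for comparing scales.
        print(
            f"{grid:<8} level {level:>2}: {count:>11,} cells, "
            f"mean area {area:>9.1f} km^2, ~{math.sqrt(area):5.1f} km across"
        )
```

Each H3 step divides the mean cell area by roughly 7 and each HEALPix step by exactly 4, so the two grid families interleave in scale rather than lining up level by level.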

We hope this approach will lead to the following advances in permafrost research:

  • Better understanding of RTS occurrence
    • Potential proxy for ice-rich permafrost
  • Less compute waste in image segmentation pipelines
  • Improved modelling through better starting conditions

Entropy-Optimal Scalable Probabilistic Approximations (eSPA)

eSPA is a probabilistic classification framework that:

  • Provides calibrated probability estimates (not just point predictions)
  • Handles imbalanced datasets common in geospatial phenomena
  • Captures uncertainty in predictions across poorly sampled regions
  • Enables interpretable feature importance analysis

This approach aims to discover which environmental variables best predict RTS occurrence, potentially revealing new proxies for permafrost vulnerability.

Key Features

  • Modular Data Pipeline: Sequential processing stages from raw data to trained models
  • Multiple Grid Systems: H3 (resolutions 3-6) and HEALPix (resolutions 6-10)
  • GPU-Accelerated: RAPIDS (CuPy, cuML) and PyTorch for large-scale computation
  • Interactive Dashboard: Streamlit-based visualization of training data, results, and predictions
  • Reproducible Workflows: Configuration-as-code with TOML files and CLI tools
  • Extensible Architecture: Support for alternative models (XGBoost, Random Forest, KNN) and data sources

Quick Start

Installation

Requires Python 3.13 and a CUDA 12-compatible GPU.

pixi install

This sets up the complete environment including RAPIDS, PyTorch, and geospatial libraries.

Running the Pipeline

Execute the numbered scripts to process data and train models:

scripts/00grids.sh      # Generate spatial grids
scripts/01darts.sh      # Extract RTS labels from DARTS v2
scripts/02alphaearth.sh # Extract satellite embeddings
scripts/03era5.sh       # Process climate data
scripts/04arcticdem.sh  # Compute terrain features
scripts/05train.sh      # Train models
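If you prefer to drive the numbered scripts from Python, for example to resume from a later stage, a small wrapper can discover them by their two-digit prefix and run them in order, stopping at the first failure (`subprocess.run(..., check=True)` raises `CalledProcessError` on a non-zero exit code). This is a hypothetical helper, not part of the repository; it assumes the scripts can be run with `bash` from an already activated pixi environment and only considers file names matching the `NN*.sh` pattern.

```python
import subprocess
from pathlib import Path


def discover_steps(scripts_dir: Path) -> list[Path]:
    """Numbered pipeline scripts (00grids.sh, 01darts.sh, ...) sorted into run order."""
    return sorted(scripts_dir.glob("[0-9][0-9]*.sh"))


def run_pipeline(scripts_dir: Path, start: int = 0, dry_run: bool = False) -> list[list[str]]:
    """Run every step whose two-digit prefix is >= start; stop on the first failure.

    Returns the commands that were (or, with dry_run=True, would have been) executed.
    """
    commands: list[list[str]] = []
    for script in discover_steps(scripts_dir):
        if int(script.name[:2]) < start:
            continue
        cmd = ["bash", str(script)]
        commands.append(cmd)
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit code,
            # so later stages never run on incomplete inputs.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    for cmd in run_pipeline(Path("scripts"), dry_run=True):
        print(" ".join(cmd))
```

With this helper, `run_pipeline(Path("scripts"), start=3)` would rerun only the ERA5, ArcticDEM and training stages.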

Visualizing Results

Launch the interactive dashboard:

pixi run dashboard

Explore training data distributions, cross-validation results, feature importance, and spatial predictions.

Data Sources

  • DARTS v2: RTS labels (polygons with year, area, count)
  • ERA5: Climate reanalysis (40-year history, Arctic-aligned years)
  • ArcticDEM: 32 m resolution terrain elevation
  • AlphaEarth: 64-dimensional satellite image embeddings

Project Structure

  • src/entropice/: Core modules (grids, data processors, training, inference)
  • src/entropice/dashboard/: Streamlit visualization application
  • scripts/: Data processing pipeline automation
  • notebooks/: Exploratory analysis (not version-controlled)

Documentation

Research Goals

  1. Predictive Modeling: Estimate RTS density at unobserved locations
  2. Proxy Discovery: Identify environmental variables most predictive of RTS occurrence
  3. Multi-Scale Analysis: Compare model performance across spatial resolutions
  4. Uncertainty Quantification: Provide calibrated probabilities for decision-making

License

TODO

Citation

If you use Entropice in your research, please cite:

TODO