Entropice
Geospatial Machine Learning for Arctic Permafrost Degradation. Entropice is a geospatial machine learning system for predicting Retrogressive Thaw Slump (RTS) density across the Arctic using entropy-optimal Scalable Probabilistic Approximations (eSPA). The system integrates multi-source geospatial data (climate, terrain, satellite imagery) into discrete global grids and trains probabilistic classifiers to estimate RTS occurrence patterns at multiple spatial resolutions.
Scientific Background
Retrogressive Thaw Slumps
Retrogressive Thaw Slumps are Arctic landslides caused by permafrost degradation. As ice-rich permafrost thaws, ground collapses create distinctive bowl-shaped features that retreat upslope over time. RTS are:
- Climate indicators: Sensitive to warming temperatures and changing precipitation
- Ecological disruptors: Release sediment, nutrients, and greenhouse gases into Arctic waterways
- Infrastructure hazards: Threaten communities and industrial facilities in permafrost regions
- Feedback mechanisms: Accelerate local warming through albedo changes and carbon release
Understanding RTS distribution patterns is critical for predicting permafrost stability under climate change.
The Challenge
Current remote sensing approaches first map a specific landscape feature and then derive spatio-temporal statistics from the resulting dataset.
Traditional RTS mapping relies on manual digitization from satellite imagery (e.g., the DARTS v2 training dataset), which is:
- Labor-intensive and limited in spatial/temporal coverage
- Challenging due to cloud cover and seasonal visibility
- Insufficient for pan-Arctic prediction at decision-relevant scales
Modern mapping approaches use machine learning to create segmented labels from satellite imagery (e.g., the DARTS dataset), which comes with its own problems:
- Large data transfers between satellite imagery providers and the HPC systems where the models run
- High energy consumption in both data transfer and inference
- Uncertainty about the quality of the results
- Potential compute waste when running inference on regions where the target landscape feature clearly does not exist
Our Approach
Instead of mapping globally and then computing spatio-temporal statistics, Entropice tries to learn spatio-temporal patterns from a small subset, using a wide variety of data features, to produce an educated guess about the spatio-temporal statistics of a landscape feature.
Entropice addresses this by:
- Spatial Discretization across scales: Representing the Arctic using discrete global grid systems (H3 hexagonal grids, HEALPix) at several low to mid resolutions (levels)
- Multi-Source Integration: Aggregating climate (ERA5), terrain (ArcticDEM), and satellite embeddings (AlphaEarth) into feature-rich datasets to obtain environmental proxies across spatio-temporal scales
- Probabilistic Modeling: Training eSPA classifiers to predict RTS density classes based on environmental proxies
The approach is hoped to advance permafrost research through:
- A better understanding of RTS occurrence
- A potential proxy for ice-rich permafrost
- Less compute waste in image segmentation pipelines
- Better modelling through better starting conditions
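The discretization and aggregation steps described above can be illustrated with a minimal sketch. It is not Entropice's implementation: `toy_cell` is a hypothetical stand-in that bins coordinates into plain latitude/longitude boxes instead of H3 or HEALPix cells, and `aggregate_by_cell` and the sample values are invented for illustration. The point is only the pattern: assign each observation to a grid cell, then summarize per cell.

```python
import math
from collections import defaultdict
from statistics import mean
from typing import Callable, Hashable, Iterable


def toy_cell(lat: float, lon: float, size_deg: float = 1.0) -> tuple[int, int]:
    """Hypothetical cell lookup: a lat/lon box index, NOT an H3 or HEALPix cell."""
    return (math.floor(lat / size_deg), math.floor(lon / size_deg))


def aggregate_by_cell(
    samples: Iterable[tuple[float, float, float]],
    cell_of: Callable[[float, float], Hashable] = toy_cell,
) -> dict[Hashable, float]:
    """Group (lat, lon, value) samples by grid cell and return the mean value per cell."""
    buckets: dict[Hashable, list[float]] = defaultdict(list)
    for lat, lon, value in samples:
        buckets[cell_of(lat, lon)].append(value)
    return {cell: mean(values) for cell, values in buckets.items()}


# Made-up observations: (latitude, longitude, feature value).
samples = [
    (70.2, -148.6, 2.0),
    (70.7, -148.1, 4.0),
    (71.3, -148.9, 10.0),
]
cell_means = aggregate_by_cell(samples)
print(cell_means)
```

Because `cell_of` is a parameter, a real grid lookup from an H3 or HEALPix library could be passed in without changing the aggregation logic.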
Entropy-Optimal Scalable Probabilistic Approximations (eSPA)
eSPA is a probabilistic classification framework that:
- Provides calibrated probability estimates (not just point predictions)
- Handles imbalanced datasets common in geospatial phenomena
- Captures uncertainty in predictions across poorly-sampled regions
- Enables interpretable feature importance analysis
This approach aims to discover which environmental variables best predict RTS occurrence, potentially revealing new proxies for permafrost vulnerability.
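One generic way to summarize how uncertain a probabilistic classifier is about a single grid cell is the Shannon entropy of its predicted class probabilities. The sketch below is an illustration of that idea only; it is not the eSPA algorithm and not necessarily how Entropice reports uncertainty. The function name `predictive_entropy` and the example probabilities are assumptions.

```python
import math
from typing import Sequence


def predictive_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in bits) of one predicted class distribution.

    0 bits means the classifier is certain; log2(n_classes) bits means
    the prediction is spread evenly over all classes.
    """
    if any(p < 0 for p in probs) or not math.isclose(sum(probs), 1.0, abs_tol=1e-9):
        raise ValueError("probs must be non-negative and sum to 1")
    # Terms with p == 0 contribute nothing (the limit of p * log p is 0).
    return -sum(p * math.log2(p) for p in probs if p > 0)


# Hypothetical predictions over four RTS density classes.
confident = [0.97, 0.01, 0.01, 0.01]
undecided = [0.25, 0.25, 0.25, 0.25]
print(predictive_entropy(confident))
print(predictive_entropy(undecided))
```

Cells with high entropy would be natural candidates for closer inspection, for example in poorly sampled regions.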
Key Features
- Modular Data Pipeline: Sequential processing stages from raw data to trained models
- Multiple Grid Systems: H3 (resolutions 3-6) and HEALPix (resolutions 6-10)
- GPU-Accelerated: RAPIDS (CuPy, cuML) and PyTorch for large-scale computation
- Interactive Dashboard: Streamlit-based visualization of training data, results, and predictions
- Reproducible Workflows: Configuration-as-code with TOML files and CLI tools
- Extensible Architecture: Support for alternative models (XGBoost, Random Forest, KNN) and data sources
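To give a sense of scale for the grid resolutions listed above, the following sketch computes global cell counts from the standard formulas: an H3 grid has 2 + 120 · 7^r cells at resolution r, and a HEALPix grid has 12 · nside² cells. It assumes that the HEALPix "resolutions" in this README are levels with nside = 2^level, which the README does not state. The counts cover the whole globe; the Arctic subset used by Entropice would be smaller.

```python
def h3_cell_count(resolution: int) -> int:
    """Number of H3 cells covering the globe at a given resolution (0-15)."""
    if not 0 <= resolution <= 15:
        raise ValueError("H3 resolutions range from 0 to 15")
    return 2 + 120 * 7**resolution


def healpix_cell_count(level: int) -> int:
    """Number of HEALPix pixels covering the globe, assuming nside = 2**level."""
    if level < 0:
        raise ValueError("level must be non-negative")
    nside = 2**level
    return 12 * nside**2


for res in range(3, 7):
    print(f"H3 resolution {res}: {h3_cell_count(res):,} cells")
for level in range(6, 11):
    print(f"HEALPix level {level}: {healpix_cell_count(level):,} cells")
```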
Quick Start
Installation
Requires Python 3.13 and a CUDA 12 compatible GPU.
pixi install
This sets up the complete environment including RAPIDS, PyTorch, and geospatial libraries.
Running the Pipeline
Execute the numbered scripts to process data and train models:
scripts/00grids.sh # Generate spatial grids
scripts/01darts.sh # Extract RTS labels from DARTS v2
scripts/02alphaearth.sh # Extract satellite embeddings
scripts/03era5.sh # Process climate data
scripts/04arcticdem.sh # Compute terrain features
scripts/05train.sh # Train models
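The stages depend on each other, so it can be convenient to run them in order and stop at the first failure. The sketch below is one possible way to do that, not a tool shipped with the repository: `run_pipeline` is a hypothetical helper, and it assumes the scripts are invoked with `bash` from the repository root inside the installed environment.

```python
import subprocess
from typing import Callable, Sequence

# Script paths as listed above, in execution order.
PIPELINE = [
    "scripts/00grids.sh",
    "scripts/01darts.sh",
    "scripts/02alphaearth.sh",
    "scripts/03era5.sh",
    "scripts/04arcticdem.sh",
    "scripts/05train.sh",
]


def run_pipeline(
    scripts: Sequence[str] = PIPELINE,
    run: Callable = subprocess.run,
) -> list[str]:
    """Run each script in order; raise on the first non-zero exit code.

    Returns the list of scripts that completed successfully.
    """
    completed: list[str] = []
    for script in scripts:
        result = run(["bash", script], check=False)
        if result.returncode != 0:
            raise RuntimeError(
                f"{script} failed with exit code {result.returncode}; "
                f"completed so far: {completed}"
            )
        completed.append(script)
    return completed
```

Calling `run_pipeline()` executes all six stages; passing a shorter list, such as `run_pipeline(PIPELINE[3:])`, resumes from a later stage after a failure has been fixed.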
Visualizing Results
Launch the interactive dashboard:
pixi run dashboard
Explore training data distributions, cross-validation results, feature importance, and spatial predictions.
Data Sources
- DARTS v2: RTS labels (polygons with year, area, count)
- ERA5: Climate reanalysis (40-year history, Arctic-aligned years)
- ArcticDEM: 32m resolution terrain elevation
- AlphaEarth: 64-dimensional satellite image embeddings
Project Structure
- src/entropice/: Core modules (grids, data processors, training, inference)
- src/entropice/dashboard/: Streamlit visualization application
- scripts/: Data processing pipeline automation
- notebooks/: Exploratory analysis (not version-controlled)
Documentation
- ARCHITECTURE.md: System design, components, and data flow
- CONTRIBUTING.md: Development guidelines and standards
Research Goals
- Predictive Modeling: Estimate RTS density at unobserved locations
- Proxy Discovery: Identify environmental variables most predictive of RTS occurrence
- Multi-Scale Analysis: Compare model performance across spatial resolutions
- Uncertainty Quantification: Provide calibrated probabilities for decision-making
License
TODO
Citation
If you use Entropice in your research, please cite:
TODO