Refactor non-dashboard modules

parent 907e907856
commit 45bc61e49e
22 changed files with 122 additions and 186 deletions

.github/copilot-instructions.md (vendored, 45 changes)
@@ -42,7 +42,7 @@ For project goals and setup, see [README.md](../README.md).
 - Follow the numbered script sequence: `00grids.sh` → `01darts.sh` → `02alphaearth.sh` → `03era5.sh` → `04arcticdem.sh` → `05train.sh`
 - Each pipeline stage should produce **reproducible intermediate outputs**
-- Use `src/entropice/paths.py` for consistent path management
+- Use `src/entropice/utils/paths.py` for consistent path management
 - Environment variable `FAST_DATA_DIR` controls data directory location (default: `./data`)

 ### Storage Hierarchy
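The `FAST_DATA_DIR` convention above can be sketched as a minimal helper (illustrative only; the actual function names in `src/entropice/utils/paths.py` are assumptions, as the module body is not shown in this diff):

```python
import os
from pathlib import Path


def data_dir() -> Path:
    """Resolve the data root, honoring FAST_DATA_DIR (default: ./data)."""
    return Path(os.environ.get("FAST_DATA_DIR", "./data"))


def labels_dir() -> Path:
    # Hypothetical subdirectory helper built on the same root.
    return data_dir() / "labels"
```

Centralizing the environment lookup in one module is what lets every pipeline stage agree on the storage hierarchy below.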
@@ -64,16 +64,37 @@ DATA_DIR/
 ### Core Modules (`src/entropice/`)

-- **`grids.py`**: H3/HEALPix spatial grid generation with watermask
+The codebase is organized into four main packages:
+
+- **`entropice.ingest`**: Data ingestion from external sources
+- **`entropice.spatial`**: Spatial operations and grid management
+- **`entropice.ml`**: Machine learning workflows
+- **`entropice.utils`**: Common utilities
+
+#### Data Ingestion (`src/entropice/ingest/`)
+
 - **`darts.py`**: RTS label extraction from DARTS v2 dataset
 - **`era5.py`**: Climate data processing from ERA5 (Arctic-aligned years: Oct 1 - Sep 30)
 - **`arcticdem.py`**: Terrain analysis from 32m Arctic elevation data
 - **`alphaearth.py`**: Satellite image embeddings via Google Earth Engine

+#### Spatial Operations (`src/entropice/spatial/`)
+
+- **`grids.py`**: H3/HEALPix spatial grid generation with watermask
 - **`aggregators.py`**: Raster-to-vector spatial aggregation engine
+- **`watermask.py`**: Ocean masking utilities
+- **`xvec.py`**: Extended vector operations for xarray
+
+#### Machine Learning (`src/entropice/ml/`)
+
 - **`dataset.py`**: Multi-source data integration and feature engineering
 - **`training.py`**: Model training with eSPA, XGBoost, Random Forest, KNN
 - **`inference.py`**: Batch prediction pipeline for trained models

+#### Utilities (`src/entropice/utils/`)
+
 - **`paths.py`**: Centralized path management
+- **`codecs.py`**: Custom codecs for data serialization

 ### Dashboard (`src/entropice/dashboard/`)
@@ -110,10 +131,10 @@ pixi run pytest
 ### Common Tasks

-- **Generate grids**: Use `grids.py` CLI
-- **Process labels**: Use `darts.py` CLI
-- **Train models**: Use `training.py` CLI with TOML config
-- **Run inference**: Use `inference.py` CLI
+- **Generate grids**: Use `spatial/grids.py` CLI
+- **Process labels**: Use `ingest/darts.py` CLI
+- **Train models**: Use `ml/training.py` CLI with TOML config
+- **Run inference**: Use `ml/inference.py` CLI
 - **View results**: `pixi run dashboard`

 ## Key Design Patterns
@@ -162,10 +183,10 @@ Training features:
 To extend Entropice:

-- **New data source**: Follow patterns in `era5.py` or `arcticdem.py`
-- **Custom aggregations**: Add to `_Aggregations` dataclass in `aggregators.py`
-- **Alternative labels**: Implement extractor following `darts.py` pattern
-- **New models**: Add scikit-learn compatible estimators to `training.py`
+- **New data source**: Follow patterns in `ingest/era5.py` or `ingest/arcticdem.py`
+- **Custom aggregations**: Add to `_Aggregations` dataclass in `spatial/aggregators.py`
+- **Alternative labels**: Implement extractor following `ingest/darts.py` pattern
+- **New models**: Add scikit-learn compatible estimators to `ml/training.py`
 - **Dashboard pages**: Add Streamlit pages to `dashboard/` module

 ## Important Notes
@@ -175,13 +196,13 @@ To extend Entropice:
 - Handle **antimeridian crossing** in polar regions
 - Use **batch processing** for GPU memory management
 - Notebooks are for exploration only - **keep production code in `src/`**
-- Always use **absolute paths** or paths from `paths.py`
+- Always use **absolute paths** or paths from `utils/paths.py`

 ## Common Issues

 - **Memory**: Use batch processing and Dask chunking for large datasets
 - **GPU OOM**: Reduce batch size in inference or training
-- **Antimeridian**: Use proper handling in `aggregators.py` for polar grids
+- **Antimeridian**: Use proper handling in `spatial/aggregators.py` for polar grids
 - **Temporal alignment**: ERA5 uses Arctic-aligned years (Oct-Sep)
 - **CRS**: Compute in EPSG:3413, visualize in EPSG:4326

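The antimeridian issue can be illustrated with a minimal longitude-unwrapping sketch (illustrative only; the project's actual handling lives in `aggregators.py` and is not shown in this diff):

```python
def unwrap_lons(lons):
    """Shift longitudes so a ring crossing the antimeridian stays contiguous.

    Each vertex is moved by +/-360 degrees until it lies within 180 degrees
    of the first vertex, so e.g. [179.5, -179.5] becomes [179.5, 180.5]
    instead of a segment that appears to span the whole globe.
    """
    ref = lons[0]
    out = []
    for lon in lons:
        while lon - ref > 180:
            lon -= 360
        while lon - ref < -180:
            lon += 360
        out.append(lon)
    return out
```

Without a step like this, polar grid cells near ±180° longitude produce degenerate geometries when aggregated in a geographic CRS.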
@@ -27,7 +27,14 @@ The pipeline follows a sequential processing approach where each stage produces
 ### System Components

-#### 1. Spatial Grid System (`grids.py`)
+The codebase is organized into four main packages:
+
+- **`entropice.ingest`**: Data ingestion from external sources (DARTS, ERA5, ArcticDEM, AlphaEarth)
+- **`entropice.spatial`**: Spatial operations and grid management
+- **`entropice.ml`**: Machine learning workflows (dataset, training, inference)
+- **`entropice.utils`**: Common utilities (paths, codecs)
+
+#### 1. Spatial Grid System (`spatial/grids.py`)

 - **Purpose**: Creates global tessellations (discrete global grid systems) for spatial aggregation
 - **Grid Types & Levels**: H3 hexagonal grids and HEALPix grids
@@ -39,7 +46,7 @@ The pipeline follows a sequential processing approach where each stage produces
 - Provides spatial indexing for efficient data aggregation
 - **Output**: GeoDataFrames with cell IDs, geometries, and land areas

-#### 2. Label Management (`darts.py`)
+#### 2. Label Management (`ingest/darts.py`)

 - **Purpose**: Extracts RTS labels from DARTS v2 dataset
 - **Processing**:
@@ -51,7 +58,7 @@ The pipeline follows a sequential processing approach where each stage produces
 #### 3. Feature Extractors

-**ERA5 Climate Data (`era5.py`)**
+**ERA5 Climate Data (`ingest/era5.py`)**

 - Downloads hourly climate variables from Copernicus Climate Data Store
 - Computes daily aggregates: temperature extrema, precipitation, snow metrics
@@ -59,7 +66,7 @@ The pipeline follows a sequential processing approach where each stage produces
 - Temporal aggregations: yearly, seasonal, shoulder seasons
 - Uses Arctic-aligned years (October 1 - September 30)

-**ArcticDEM Terrain (`arcticdem.py`)**
+**ArcticDEM Terrain (`ingest/arcticdem.py`)**

 - Processes 32m resolution Arctic elevation data
 - Computes terrain derivatives: slope, aspect, curvature
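The Arctic-aligned year (October 1 - September 30) can be computed with a one-line helper. The labeling convention below (each window labeled by the calendar year in which it ends) is an assumption for illustration; `era5.py` may label differently:

```python
from datetime import date


def arctic_year(d: date) -> int:
    # An Arctic-aligned year runs Oct 1 through Sep 30; label it by the
    # calendar year in which it ends (assumed convention, see lead-in).
    return d.year + 1 if d.month >= 10 else d.year
```

So November 2020 and March 2021 fall in the same Arctic year, which keeps a full freeze-thaw season inside one aggregation window.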
@@ -67,14 +74,14 @@ The pipeline follows a sequential processing approach where each stage produces
 - Applies watermask clipping and GPU-accelerated convolutions
 - Aggregates terrain statistics per grid cell

-**AlphaEarth Embeddings (`alphaearth.py`)**
+**AlphaEarth Embeddings (`ingest/alphaearth.py`)**

 - Extracts 64-dimensional satellite image embeddings via Google Earth Engine
 - Uses foundation models to capture visual patterns
 - Partitions large grids using KMeans clustering
 - Temporal sampling across multiple years

-#### 4. Spatial Aggregation Framework (`aggregators.py`)
+#### 4. Spatial Aggregation Framework (`spatial/aggregators.py`)

 - **Core Capability**: Raster-to-vector aggregation engine
 - **Methods**:
@@ -86,7 +93,7 @@ The pipeline follows a sequential processing approach where each stage produces
 - Parallel processing with worker pools
 - GPU acceleration via CuPy/CuML where applicable

-#### 5. Dataset Assembly (`dataset.py`)
+#### 5. Dataset Assembly (`ml/dataset.py`)

 - **DatasetEnsemble Class**: Orchestrates multi-source data integration
 - **L2 Datasets**: Standardized XDGGS Xarray datasets per data source
@@ -98,7 +105,7 @@ The pipeline follows a sequential processing approach where each stage produces
 - GPU-accelerated data loading (PyTorch/CuPy)
 - **Output**: Tabular feature matrices ready for scikit-learn API

-#### 6. Model Training (`training.py`)
+#### 6. Model Training (`ml/training.py`)

 - **Supported Models**:
   - **eSPA**: Entropy-optimal probabilistic classifier (primary)
@@ -112,7 +119,7 @@ The pipeline follows a sequential processing approach where each stage produces
 - **Configuration**: TOML-based configuration with Cyclopts CLI
 - **Output**: Pickled models, CV results, feature importance

-#### 7. Inference (`inference.py`)
+#### 7. Inference (`ml/inference.py`)

 - Batch prediction pipeline for trained classifiers
 - GPU memory management with configurable batch sizes
@@ -170,7 +177,7 @@ scripts/05train.sh # Model training
 - Dataclasses for typed configuration
 - TOML files for training hyperparameters
-- Environment-based path management (`paths.py`)
+- Environment-based path management (`utils/paths.py`)

 ## Data Storage Hierarchy
@@ -209,9 +216,9 @@ DATA_DIR/
 The architecture supports extension through:

-- **New Data Sources**: Implement feature extractor following ERA5/ArcticDEM patterns
-- **Custom Aggregations**: Add methods to `_Aggregations` dataclass
-- **Alternative Targets**: Implement label extractor following DARTS pattern
-- **Alternative Models**: Extend training CLI with new scikit-learn compatible estimators
+- **New Data Sources**: Implement feature extractor in `ingest/` following ERA5/ArcticDEM patterns
+- **Custom Aggregations**: Add methods to `_Aggregations` dataclass in `spatial/aggregators.py`
+- **Alternative Targets**: Implement label extractor in `ingest/` following DARTS pattern
+- **Alternative Models**: Extend training CLI in `ml/training.py` with new scikit-learn compatible estimators
 - **Dashboard Pages**: Add Streamlit pages to `dashboard/` module
-- **Grid Systems**: Support additional DGGS via xdggs integration
+- **Grid Systems**: Support additional DGGS in `spatial/grids.py` via xdggs integration

@@ -22,7 +22,10 @@ This will set up the complete environment including RAPIDS, PyTorch, and all geo
 ### Code Organization

-- **`src/entropice/`**: Core modules (grids, data sources, training, inference)
+- **`src/entropice/ingest/`**: Data ingestion modules (darts, era5, arcticdem, alphaearth)
+- **`src/entropice/spatial/`**: Spatial operations (grids, aggregators, watermask, xvec)
+- **`src/entropice/ml/`**: Machine learning components (dataset, training, inference)
+- **`src/entropice/utils/`**: Utilities (paths, codecs)
 - **`src/entropice/dashboard/`**: Streamlit visualization dashboard
 - **`scripts/`**: Data processing pipeline scripts (numbered 00-05)
 - **`notebooks/`**: Exploratory analysis and validation notebooks
@@ -30,11 +33,12 @@ This will set up the complete environment including RAPIDS, PyTorch, and all geo
 ### Key Modules

-- `grids.py`: H3/HEALPix spatial grid systems
-- `darts.py`, `era5.py`, `arcticdem.py`, `alphaearth.py`: Data source processors
-- `dataset.py`: Dataset assembly and feature engineering
-- `training.py`: Model training with eSPA, XGBoost, Random Forest, KNN
-- `inference.py`: Prediction generation
+- `spatial/grids.py`: H3/HEALPix spatial grid systems
+- `ingest/darts.py`, `ingest/era5.py`, `ingest/arcticdem.py`, `ingest/alphaearth.py`: Data source processors
+- `ml/dataset.py`: Dataset assembly and feature engineering
+- `ml/training.py`: Model training with eSPA, XGBoost, Random Forest, KNN
+- `ml/inference.py`: Prediction generation
+- `utils/paths.py`: Centralized path management

 ## Coding Standards
@@ -58,7 +62,7 @@ This will set up the complete environment including RAPIDS, PyTorch, and all geo
 - Follow the numbered script sequence: `00grids.sh` → `01darts.sh` → ... → `05train.sh`
 - Each stage should produce reproducible intermediate outputs
 - Document data dependencies in module docstrings
-- Use `paths.py` for consistent path management
+- Use `utils/paths.py` for consistent path management

 ## Testing

@@ -70,12 +70,12 @@ dependencies = [
 ]

 [project.scripts]
-create-grid = "entropice.grids:main"
-darts = "entropice.darts:cli"
-alpha-earth = "entropice.alphaearth:main"
-era5 = "entropice.era5:cli"
-arcticdem = "entropice.arcticdem:cli"
-train = "entropice.training:cli"
+create-grid = "entropice.spatial.grids:main"
+darts = "entropice.ingest.darts:cli"
+alpha-earth = "entropice.ingest.alphaearth:main"
+era5 = "entropice.ingest.era5:cli"
+arcticdem = "entropice.ingest.arcticdem:cli"
+train = "entropice.ml.training:cli"

 [build-system]
 requires = ["hatchling"]
src/entropice/ingest/__init__.py (new file, 14 lines)

@@ -0,0 +1,14 @@
+"""Data ingestion modules for external data sources.
+
+This package contains modules for processing and ingesting data from various
+external sources into the Entropice system:
+
+- darts: Retrogressive Thaw Slump (RTS) labels from DARTS v2 dataset
+- era5: Climate data from ERA5 reanalysis
+- arcticdem: Terrain data from ArcticDEM
+- alphaearth: Satellite image embeddings from AlphaEarth/Google Earth Engine
+"""
+
+from . import alphaearth, arcticdem, darts, era5
+
+__all__ = ["alphaearth", "arcticdem", "darts", "era5"]
src/entropice/ml/__init__.py (new file, 12 lines)

@@ -0,0 +1,12 @@
+"""Machine learning components for model training and inference.
+
+This package contains modules for machine learning workflows:
+
+- dataset: Multi-source dataset assembly and feature engineering
+- training: Model training with eSPA, XGBoost, Random Forest, KNN
+- inference: Batch prediction pipeline for trained classifiers
+"""
+
+from . import dataset, inference, training
+
+__all__ = ["dataset", "inference", "training"]
src/entropice/spatial/__init__.py (new file, 13 lines)

@@ -0,0 +1,13 @@
+"""Spatial operations and grid management modules.
+
+This package contains modules for spatial data processing and grid-based operations:
+
+- grids: Discrete global grid system (H3/HEALPix) generation and management
+- aggregators: Raster-to-vector spatial aggregation framework
+- watermask: Ocean masking utilities
+- xvec: Extended vector operations for xarray
+"""
+
+from . import aggregators, grids, watermask, xvec
+
+__all__ = ["aggregators", "grids", "watermask", "xvec"]
src/entropice/utils/__init__.py (new file, 11 lines)

@@ -0,0 +1,11 @@
+"""Utility modules for common functionality.
+
+This package contains utility modules used across the Entropice system:
+
+- paths: Centralized path management and configuration
+- codecs: Custom codecs for data serialization
+"""
+
+from . import codecs, paths
+
+__all__ = ["codecs", "paths"]
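The `from . import ...` plus `__all__` pattern shared by these new `__init__.py` files can be demonstrated in isolation with stand-in module objects (not the real package, which is not importable from this diff alone):

```python
import types

# Build a stand-in for a package module such as entropice.utils.__init__.
utils = types.ModuleType("utils")
utils.paths = types.ModuleType("utils.paths")
utils.codecs = types.ModuleType("utils.codecs")
utils.__all__ = ["codecs", "paths"]

# `from utils import *` binds exactly the names listed in __all__:
exported = {name: getattr(utils, name) for name in utils.__all__}
```

Eagerly importing the submodules in `__init__.py` means `import entropice.utils` is enough to reach `entropice.utils.paths`, and `__all__` documents the public surface of each package.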
@ -1,146 +0,0 @@
|
||||||
"""Test script to verify feature extraction works correctly."""
|
|
||||||
|
|
||||||
import numpy as np
|
|
||||||
import xarray as xr
|
|
||||||
|
|
||||||
# Create a mock model state with various feature types
|
|
||||||
features = [
|
|
||||||
# Embedding features: embedding_{agg}_{band}_{year}
|
|
||||||
"embedding_mean_B02_2020",
|
|
||||||
"embedding_std_B03_2021",
|
|
||||||
"embedding_max_B04_2022",
|
|
||||||
# ERA5 features without aggregations: era5_{variable}_{time}
|
|
||||||
"era5_temperature_2020_summer",
|
|
||||||
"era5_precipitation_2021_winter",
|
|
||||||
# ERA5 features with aggregations: era5_{variable}_{time}_{agg}
|
|
||||||
"era5_temperature_2020_summer_mean",
|
|
||||||
"era5_precipitation_2021_winter_std",
|
|
||||||
# ArcticDEM features: arcticdem_{variable}_{agg}
|
|
||||||
"arcticdem_elevation_mean",
|
|
||||||
"arcticdem_slope_std",
|
|
||||||
"arcticdem_aspect_max",
|
|
||||||
# Common features
|
|
||||||
"cell_area",
|
|
||||||
"water_area",
|
|
||||||
"land_area",
|
|
||||||
"land_ratio",
|
|
||||||
"lon",
|
|
||||||
"lat",
|
|
||||||
]
|
|
||||||
|
|
||||||
# Create mock importance values
|
|
||||||
importance_values = np.random.rand(len(features))
|
|
||||||
|
|
||||||
# Create a mock model state for ESPA
|
|
||||||
model_state_espa = xr.Dataset(
|
|
||||||
{
|
|
||||||
"feature_weights": xr.DataArray(
|
|
||||||
importance_values,
|
|
||||||
dims=["feature"],
|
|
||||||
coords={"feature": features},
|
|
||||||
)
|
|
||||||
}
|
|
||||||
)
|
|
||||||
|
|
||||||
# Create a mock model state for XGBoost
|
|
||||||
model_state_xgb = xr.Dataset(
|
|
||||||
{
|
|
||||||
"feature_importance_gain": xr.DataArray(
|
|
||||||
importance_values,
|
|
||||||
dims=["feature"],
|
|
||||||
coords={"feature": features},
|
|
||||||
),
|
|
||||||
"feature_importance_weight": xr.DataArray(
|
|
||||||
importance_values * 0.8,
|
|
||||||
dims=["feature"],
|
|
||||||
coords={"feature": features},
|
|
||||||
),
|
|
||||||
}
|
|
||||||
)
|
|
||||||
|
|
||||||
# Create a mock model state for Random Forest
|
|
||||||
model_state_rf = xr.Dataset(
|
|
||||||
{
|
|
||||||
"feature_importance": xr.DataArray(
|
|
||||||
importance_values,
|
|
||||||
dims=["feature"],
|
|
||||||
coords={"feature": features},
|
|
||||||
)
|
|
||||||
}
|
|
||||||
)
|
|
||||||
|
|
||||||
# Test extraction functions
|
|
||||||
from entropice.dashboard.utils.data import (
|
|
||||||
extract_arcticdem_features,
|
|
||||||
extract_common_features,
|
|
||||||
extract_embedding_features,
|
|
||||||
extract_era5_features,
|
|
||||||
)
|
|
||||||
|
|
||||||
print("=" * 80)
|
|
||||||
print("Testing ESPA model state")
|
|
||||||
print("=" * 80)
|
|
||||||
|
|
||||||
embedding_array = extract_embedding_features(model_state_espa)
|
|
||||||
print(f"\nEmbedding features extracted: {embedding_array is not None}")
|
|
||||||
if embedding_array is not None:
|
|
||||||
print(f" Dimensions: {embedding_array.dims}")
|
|
||||||
print(f" Shape: {embedding_array.shape}")
|
|
||||||
print(f" Coordinates: {list(embedding_array.coords)}")
|
|
||||||
|
|
||||||
era5_array = extract_era5_features(model_state_espa)
|
|
||||||
print(f"\nERA5 features extracted: {era5_array is not None}")
|
|
||||||
if era5_array is not None:
|
|
||||||
print(f" Dimensions: {era5_array.dims}")
|
|
||||||
print(f" Shape: {era5_array.shape}")
|
|
||||||
print(f" Coordinates: {list(era5_array.coords)}")
|
|
||||||
|
|
||||||
arcticdem_array = extract_arcticdem_features(model_state_espa)
|
|
||||||
print(f"\nArcticDEM features extracted: {arcticdem_array is not None}")
|
|
||||||
if arcticdem_array is not None:
|
|
||||||
print(f" Dimensions: {arcticdem_array.dims}")
|
|
||||||
print(f" Shape: {arcticdem_array.shape}")
|
|
||||||
print(f" Coordinates: {list(arcticdem_array.coords)}")
|
|
||||||
|
|
||||||
common_array = extract_common_features(model_state_espa)
|
|
||||||
print(f"\nCommon features extracted: {common_array is not None}")
|
|
||||||
if common_array is not None:
|
|
||||||
print(f" Dimensions: {common_array.dims}")
|
|
||||||
print(f" Shape: {common_array.shape}")
|
|
||||||
print(f" Size: {common_array.size}")
|
|
||||||
|
|
||||||
print("\n" + "=" * 80)
|
|
||||||
print("Testing XGBoost model state")
|
|
||||||
print("=" * 80)
|
|
||||||
|
|
||||||
embedding_array_xgb = extract_embedding_features(model_state_xgb, importance_type="feature_importance_gain")
|
|
||||||
print(f"\nEmbedding features (gain) extracted: {embedding_array_xgb is not None}")
|
|
||||||
if embedding_array_xgb is not None:
|
|
||||||
print(f" Dimensions: {embedding_array_xgb.dims}")
|
|
||||||
print(f" Shape: {embedding_array_xgb.shape}")
|
|
||||||
|
|
||||||
era5_array_xgb = extract_era5_features(model_state_xgb, importance_type="feature_importance_weight")
|
|
||||||
print(f"\nERA5 features (weight) extracted: {era5_array_xgb is not None}")
|
|
||||||
if era5_array_xgb is not None:
|
|
||||||
print(f" Dimensions: {era5_array_xgb.dims}")
|
|
||||||
print(f" Shape: {era5_array_xgb.shape}")
|
|
||||||
|
|
||||||
print("\n" + "=" * 80)
|
|
||||||
print("Testing Random Forest model state")
|
|
||||||
print("=" * 80)
|
|
||||||
|
|
||||||
embedding_array_rf = extract_embedding_features(model_state_rf, importance_type="feature_importance")
|
|
||||||
print(f"\nEmbedding features extracted: {embedding_array_rf is not None}")
|
|
||||||
if embedding_array_rf is not None:
|
|
||||||
print(f" Dimensions: {embedding_array_rf.dims}")
|
|
||||||
print(f" Shape: {embedding_array_rf.shape}")
|
|
||||||
|
|
||||||
arcticdem_array_rf = extract_arcticdem_features(model_state_rf, importance_type="feature_importance")
|
|
||||||
print(f"\nArcticDEM features extracted: {arcticdem_array_rf is not None}")
|
|
||||||
if arcticdem_array_rf is not None:
|
|
||||||
print(f" Dimensions: {arcticdem_array_rf.dims}")
|
|
||||||
print(f" Shape: {arcticdem_array_rf.shape}")
|
|
||||||
|
|
||||||
print("\n" + "=" * 80)
|
|
||||||
print("All tests completed successfully!")
|
|
||||||
print("=" * 80)
|
|
||||||
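The feature-naming scheme documented in the removed script's comments can be split with a small helper (illustrative; `parse_feature` is not part of the codebase):

```python
def parse_feature(name: str) -> dict:
    """Split a feature name per the naming conventions noted above.

    Handles embedding_{agg}_{band}_{year} and arcticdem_{variable}_{agg};
    other prefixes (era5 variants, common features) are returned unparsed.
    """
    parts = name.split("_")
    if parts[0] == "embedding" and len(parts) == 4:
        return {"source": "embedding", "agg": parts[1],
                "band": parts[2], "year": int(parts[3])}
    if parts[0] == "arcticdem" and len(parts) == 3:
        return {"source": "arcticdem", "variable": parts[1], "agg": parts[2]}
    return {"source": parts[0], "parts": parts[1:]}
```

A prefix-based scheme like this is what lets the dashboard's `extract_*_features` helpers route each feature importance to the right panel.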