Refactor non-dashboard modules
parent 907e907856
commit 45bc61e49e
22 changed files with 122 additions and 186 deletions
.github/copilot-instructions.md (vendored): 45 lines changed
@@ -42,7 +42,7 @@ For project goals and setup, see [README.md](../README.md).
- Follow the numbered script sequence: `00grids.sh` → `01darts.sh` → `02alphaearth.sh` → `03era5.sh` → `04arcticdem.sh` → `05train.sh`
- Each pipeline stage should produce **reproducible intermediate outputs**
- Use `src/entropice/paths.py` for consistent path management
- Use `src/entropice/utils/paths.py` for consistent path management
- Environment variable `FAST_DATA_DIR` controls data directory location (default: `./data`)

### Storage Hierarchy
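The `FAST_DATA_DIR` guideline in the hunk above can be illustrated with a short sketch. It assumes the variable is read from the environment at call time and resolved to an absolute path (in line with the "always use absolute paths" note later in the file); `resolve_data_dir` is a hypothetical helper, not the contents of `utils/paths.py`.

```python
import os
from pathlib import Path


def resolve_data_dir() -> Path:
    """Return the data directory as an absolute path (hypothetical helper).

    Reads FAST_DATA_DIR and falls back to ./data when it is unset.
    """
    raw = os.environ.get("FAST_DATA_DIR", "./data")
    # expanduser() handles values like "~/data"; resolve() makes the path absolute.
    return Path(raw).expanduser().resolve()


if __name__ == "__main__":
    print(resolve_data_dir())
```

With `FAST_DATA_DIR` unset this yields `<current working directory>/data`; setting the variable overrides the default.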
@@ -64,16 +64,37 @@ DATA_DIR/
### Core Modules (`src/entropice/`)

- **`grids.py`**: H3/HEALPix spatial grid generation with watermask
The codebase is organized into four main packages:

- **`entropice.ingest`**: Data ingestion from external sources
- **`entropice.spatial`**: Spatial operations and grid management
- **`entropice.ml`**: Machine learning workflows
- **`entropice.utils`**: Common utilities

#### Data Ingestion (`src/entropice/ingest/`)

- **`darts.py`**: RTS label extraction from DARTS v2 dataset
- **`era5.py`**: Climate data processing from ERA5 (Arctic-aligned years: Oct 1 - Sep 30)
- **`arcticdem.py`**: Terrain analysis from 32m Arctic elevation data
- **`alphaearth.py`**: Satellite image embeddings via Google Earth Engine

#### Spatial Operations (`src/entropice/spatial/`)

- **`grids.py`**: H3/HEALPix spatial grid generation with watermask
- **`aggregators.py`**: Raster-to-vector spatial aggregation engine
- **`watermask.py`**: Ocean masking utilities
- **`xvec.py`**: Extended vector operations for xarray

#### Machine Learning (`src/entropice/ml/`)

- **`dataset.py`**: Multi-source data integration and feature engineering
- **`training.py`**: Model training with eSPA, XGBoost, Random Forest, KNN
- **`inference.py`**: Batch prediction pipeline for trained models

#### Utilities (`src/entropice/utils/`)

- **`paths.py`**: Centralized path management
- **`codecs.py`**: Custom codecs for data serialization

### Dashboard (`src/entropice/dashboard/`)
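The hunk above moves flat modules into the `ingest`, `spatial`, `ml`, and `utils` subpackages. Downstream scripts that still reference the old dotted names could be migrated with a small rewrite table like the one below. This is a sketch under two assumptions: that the old modules lived directly under `src/entropice/` (as the pre-commit path `src/entropice/paths.py` suggests), and that only modules whose old names appear in this diff need rewriting. `MODULE_MOVES` and `migrate_import` are hypothetical names.

```python
import re

# Hypothetical mapping from pre-refactor flat modules to their new subpackage.
# Only modules whose old, unprefixed names appear in this diff are listed.
MODULE_MOVES: dict[str, str] = {
    "paths": "utils",
    "grids": "spatial",
    "aggregators": "spatial",
    "darts": "ingest",
    "era5": "ingest",
    "arcticdem": "ingest",
    "training": "ml",
    "inference": "ml",
}

# Matches "entropice.<name>" and captures <name>.
_OLD_REF = re.compile(r"\bentropice\.(\w+)\b")


def migrate_import(line: str) -> str:
    """Rewrite `entropice.<module>` to `entropice.<package>.<module>`."""

    def _rewrite(match: re.Match) -> str:
        module = match.group(1)
        package = MODULE_MOVES.get(module)
        if package is None:
            # Unknown or already-migrated name (e.g. "spatial"): leave untouched.
            return match.group(0)
        return f"entropice.{package}.{module}"

    return _OLD_REF.sub(_rewrite, line)
```

Already-migrated references pass through unchanged, so the rewrite is safe to run twice. It only handles dotted references such as `entropice.grids`; a form like `from entropice import grids` would need a separate rule.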
@@ -110,10 +131,10 @@ pixi run pytest
### Common Tasks

- **Generate grids**: Use `grids.py` CLI
- **Process labels**: Use `darts.py` CLI
- **Train models**: Use `training.py` CLI with TOML config
- **Run inference**: Use `inference.py` CLI
- **Generate grids**: Use `spatial/grids.py` CLI
- **Process labels**: Use `ingest/darts.py` CLI
- **Train models**: Use `ml/training.py` CLI with TOML config
- **Run inference**: Use `ml/inference.py` CLI
- **View results**: `pixi run dashboard`

## Key Design Patterns
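The common tasks above include batch inference, and the notes later in the file recommend reducing the batch size on GPU OOM. The underlying pattern can be sketched framework-agnostically as below, assuming a prediction callable that takes a list of rows and returns one result per row; `batched` and `predict_in_batches` are illustrative names, not the project's API.

```python
from collections.abc import Callable, Iterable, Iterator, Sequence
from itertools import islice
from typing import TypeVar

T = TypeVar("T")
R = TypeVar("R")


def batched(items: Iterable[T], batch_size: int) -> Iterator[list[T]]:
    """Yield consecutive lists of at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    it = iter(items)
    while chunk := list(islice(it, batch_size)):
        yield chunk


def predict_in_batches(
    predict: Callable[[list[T]], Sequence[R]],
    rows: Iterable[T],
    batch_size: int = 1024,
) -> list[R]:
    """Call `predict` on one chunk at a time and concatenate the results.

    Only one chunk is handed to the model per call, so lowering batch_size
    lowers the per-call memory footprint (e.g. on a GPU).
    """
    results: list[R] = []
    for chunk in batched(rows, batch_size):
        results.extend(predict(chunk))
    return results
```

On Python 3.12 and later, `itertools.batched` provides the chunking step directly (yielding tuples instead of lists).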
@@ -162,10 +183,10 @@ Training features:
To extend Entropice:

- **New data source**: Follow patterns in `era5.py` or `arcticdem.py`
- **Custom aggregations**: Add to `_Aggregations` dataclass in `aggregators.py`
- **Alternative labels**: Implement extractor following `darts.py` pattern
- **New models**: Add scikit-learn compatible estimators to `training.py`
- **New data source**: Follow patterns in `ingest/era5.py` or `ingest/arcticdem.py`
- **Custom aggregations**: Add to `_Aggregations` dataclass in `spatial/aggregators.py`
- **Alternative labels**: Implement extractor following `ingest/darts.py` pattern
- **New models**: Add scikit-learn compatible estimators to `ml/training.py`
- **Dashboard pages**: Add Streamlit pages to `dashboard/` module

## Important Notes
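The "New models" bullet asks for scikit-learn compatible estimators. The toy model below sketches the main conventions from scikit-learn's estimator guidelines without depending on scikit-learn: `__init__` only stores hyperparameters, `fit` returns `self`, learned attributes end with an underscore, and `get_params`/`set_params` expose the hyperparameters. `MajorityClassifier` is hypothetical; a real estimator would normally subclass `sklearn.base.BaseEstimator` (and `ClassifierMixin` for classifiers) rather than hand-write `get_params`/`set_params`.

```python
from collections import Counter
from collections.abc import Hashable, Sequence
from typing import Any, Optional


class MajorityClassifier:
    """Hypothetical toy model: predicts the (optionally weighted) most frequent label."""

    def __init__(self, class_weight: Optional[dict[Hashable, float]] = None) -> None:
        # Convention: store hyperparameters unchanged; no validation or logic here.
        self.class_weight = class_weight

    def get_params(self, deep: bool = True) -> dict[str, Any]:
        return {"class_weight": self.class_weight}

    def set_params(self, **params: Any) -> "MajorityClassifier":
        valid = self.get_params()
        for name, value in params.items():
            if name not in valid:
                raise ValueError(f"Invalid parameter {name!r}")
            setattr(self, name, value)
        return self

    def fit(self, X: Sequence[Any], y: Sequence[Hashable]) -> "MajorityClassifier":
        if len(X) != len(y) or len(y) == 0:
            raise ValueError("X and y must be non-empty and of equal length")
        weights = self.class_weight or {}
        counts = Counter(y)
        # Learned attributes end with an underscore.
        self.classes_ = list(counts)
        # Ties go to the label seen first, because max() keeps the first maximum.
        self.majority_ = max(self.classes_, key=lambda c: counts[c] * weights.get(c, 1.0))
        return self

    def predict(self, X: Sequence[Any]) -> list[Hashable]:
        if not hasattr(self, "majority_"):
            raise RuntimeError("Call fit() before predict()")
        return [self.majority_] * len(X)
```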
@@ -175,13 +196,13 @@ To extend Entropice:
- Handle **antimeridian crossing** in polar regions
- Use **batch processing** for GPU memory management
- Notebooks are for exploration only - **keep production code in `src/`**
- Always use **absolute paths** or paths from `paths.py`
- Always use **absolute paths** or paths from `utils/paths.py`

## Common Issues

- **Memory**: Use batch processing and Dask chunking for large datasets
- **GPU OOM**: Reduce batch size in inference or training
- **Antimeridian**: Use proper handling in `aggregators.py` for polar grids
- **Antimeridian**: Use proper handling in `spatial/aggregators.py` for polar grids
- **Temporal alignment**: ERA5 uses Arctic-aligned years (Oct-Sep)
- **CRS**: Compute in EPSG:3413, visualize in EPSG:4326
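The temporal-alignment note says ERA5 uses Arctic-aligned years running from October 1 to September 30. A minimal sketch of the date-to-year mapping follows. The diff does not say how such a year is labeled, so the code assumes it is named after the calendar year in which it ends (October 2020 to September 2021 is 2021); `arctic_year` and `arctic_year_bounds` are hypothetical names.

```python
from datetime import date


def arctic_year(d: date) -> int:
    """Return the Arctic-aligned year (Oct 1 - Sep 30) that contains `d`.

    Assumption: a year is labeled by the calendar year in which it ends.
    """
    # October-December belong to the year that ends the following September.
    return d.year + 1 if d.month >= 10 else d.year


def arctic_year_bounds(year: int) -> tuple[date, date]:
    """Inclusive first and last day of the Arctic-aligned `year`."""
    return date(year - 1, 10, 1), date(year, 9, 30)
```

If the project instead labels years by their starting calendar year, only the `+ 1` offset and the bounds change.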
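For the antimeridian notes, one common heuristic (not necessarily the one in `spatial/aggregators.py`) treats a ring as crossing the antimeridian when any edge spans more than 180° of longitude, then adds 360° to negative longitudes so the ring becomes contiguous. The sketch below works on closed rings of `(lon, lat)` tuples in EPSG:4326 degrees; the function names are hypothetical, and rings that enclose a pole need separate handling.

```python
# (longitude, latitude) in degrees, EPSG:4326, longitudes in [-180, 180].
Coord = tuple[float, float]


def crosses_antimeridian(ring: list[Coord]) -> bool:
    """Heuristic: True if any edge of the closed ring spans more than 180 degrees of longitude."""
    return any(
        abs(lon2 - lon1) > 180.0
        for (lon1, _), (lon2, _) in zip(ring, ring[1:])
    )


def unwrap_ring(ring: list[Coord]) -> list[Coord]:
    """Shift negative longitudes by +360 so a crossing ring becomes contiguous.

    Non-crossing rings are returned unchanged (as a new list).
    """
    if not crosses_antimeridian(ring):
        return list(ring)
    return [(lon + 360.0 if lon < 0 else lon, lat) for lon, lat in ring]
```

Applied to a ring spanning 179°E to 179°W, the western vertices move to 181°, so area and centroid computations no longer wrap around the globe.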