# Contributing to Entropice

Thank you for your interest in contributing to Entropice! This document provides guidelines for contributing to the project.

## Getting Started

### Prerequisites

- Python 3.13
- CUDA 12 compatible GPU (for full functionality)
- [Pixi package manager](https://pixi.sh/)

### Setup

```bash
pixi install
```

This sets up the complete environment, including RAPIDS, PyTorch, and all geospatial dependencies.

## Development Workflow

### Code Organization

- **`src/entropice/`**: Core modules (grids, data sources, training, inference)
- **`src/entropice/dashboard/`**: Streamlit visualization dashboard
- **`scripts/`**: Data processing pipeline scripts (numbered 00-05)
- **`notebooks/`**: Exploratory analysis and validation notebooks
- **`tests/`**: Unit tests

### Key Modules

- `grids.py`: H3/HEALPix spatial grid systems
- `darts.py`, `era5.py`, `arcticdem.py`, `alphaearth.py`: Data source processors
- `dataset.py`: Dataset assembly and feature engineering
- `training.py`: Model training with eSPA, XGBoost, Random Forest, KNN
- `inference.py`: Prediction generation

## Coding Standards

### Python Style

- Follow PEP 8 conventions
- Use type hints for function signatures
- Prefer numpy-style docstrings for public functions
- Keep functions focused and modular

### Geospatial Best Practices

- Use **xarray** with XDGGS for gridded data storage
- Store intermediate results as **Parquet** (tabular) or **Zarr** (arrays)
- Leverage **Dask** for lazy evaluation of large datasets
- Use **GeoPandas** for vector operations
- Use EPSG:3413 (Arctic Stereographic) as the coordinate reference system (CRS) for all computations on the data, and EPSG:4326 (WGS84) for visualization and for compatibility with libraries that require it

### Data Pipeline

- Follow the numbered script sequence: `00grids.sh` → `01darts.sh` → ... → `05train.sh`
- Each stage should produce reproducible intermediate outputs
- Document data dependencies in module docstrings
- Use `paths.py` for consistent path management

## Testing

Run the test suite, or pass a path to `pytest` to run the tests for a specific module:

```bash
pixi run pytest
```

When adding features, include tests that verify:

- Correct handling of geospatial coordinates and projections
- Proper aggregation to grid cells
- Data integrity through pipeline stages

## Submitting Changes

### Pull Request Process

1. **Branch**: Create a feature branch from `main`
2. **Commit**: Write clear, descriptive commit messages
3. **Test**: Verify your changes don't break existing functionality
4. **Document**: Update relevant docstrings and documentation
5. **PR**: Submit a pull request with a clear description of changes

### Commit Messages

- Use present tense: "Add feature" not "Added feature"
- Reference issues when applicable: "Fix #123: Correct grid aggregation"
- Keep the first line under 72 characters

## Working with Data

### Local Development

- Set the `FAST_DATA_DIR` environment variable to choose the data directory (default: `./data`)

### Notebooks

- Notebooks in `notebooks/` are for exploration and validation; they are not committed to git
- Keep production code in `src/entropice/`

## Dashboard Development

Run the dashboard locally:

```bash
pixi run dashboard
```

Dashboard code is in `src/entropice/dashboard/` with modular pages and plotting utilities.

## Questions?

For questions about the architecture, see `ARCHITECTURE.md`. For scientific background, see `README.md`.