Contributing to Entropice
Thank you for your interest in contributing to Entropice! This document provides guidelines for contributing to the project.
Getting Started
Prerequisites
- Python 3.13
- CUDA 12 compatible GPU (for full functionality)
- Pixi package manager
Setup
pixi install
This will set up the complete environment including RAPIDS, PyTorch, and all geospatial dependencies.
Development Workflow
Code Organization
- src/entropice/: Core modules (grids, data sources, training, inference)
- src/entropice/dashboard/: Streamlit visualization dashboard
- scripts/: Data processing pipeline scripts (numbered 00-05)
- notebooks/: Exploratory analysis and validation notebooks
- tests/: Unit tests
Key Modules
- grids.py: H3/HEALPix spatial grid systems
- darts.py, era5.py, arcticdem.py, alphaearth.py: Data source processors
- dataset.py: Dataset assembly and feature engineering
- training.py: Model training with eSPA, XGBoost, Random Forest, KNN
- inference.py: Prediction generation
Coding Standards
Python Style
- Follow PEP 8 conventions
- Use type hints for function signatures
- Prefer numpy-style docstrings for public functions
- Keep functions focused and modular
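As an illustration of the points above, here is a minimal sketch of a small, focused function with type hints and a numpy-style docstring. The function `mean_per_cell` and its arguments are hypothetical and are not part of Entropice.

```python
from collections import defaultdict
from collections.abc import Iterable


def mean_per_cell(cell_ids: Iterable[int], values: Iterable[float]) -> dict[int, float]:
    """Average the values that fall into the same grid cell.

    Parameters
    ----------
    cell_ids : Iterable[int]
        Grid cell identifier of each observation.
    values : Iterable[float]
        Value of each observation, aligned with ``cell_ids``.

    Returns
    -------
    dict[int, float]
        Mean value per cell identifier.
    """
    sums: dict[int, float] = defaultdict(float)
    counts: dict[int, int] = defaultdict(int)
    for cell, value in zip(cell_ids, values):
        sums[cell] += value
        counts[cell] += 1
    return {cell: sums[cell] / counts[cell] for cell in sums}
```

The docstring sections (Parameters, Returns) follow the numpy convention, and the signature alone tells a reader what goes in and what comes out.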
Geospatial Best Practices
- Use xarray with XDGGS for gridded data storage
- Store intermediate results as Parquet (tabular) or Zarr (arrays)
- Leverage Dask for lazy evaluation of large datasets
- Use GeoPandas for vector operations
- Use EPSG:3413 (Arctic Stereographic) as the coordinate reference system (CRS) for any computation on the data, and EPSG:4326 (WGS84) for data visualization and compatibility with some libraries
Data Pipeline
- Follow the numbered script sequence: 00grids.sh → 01darts.sh → ... → 05train.sh
- Each stage should produce reproducible intermediate outputs
- Document data dependencies in module docstrings
- Use paths.py for consistent path management
Testing
Run the test suite:
pixi run pytest
When adding features, include tests that verify:
- Correct handling of geospatial coordinates and projections
- Proper aggregation to grid cells
- Data integrity through pipeline stages
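The checks listed above could look roughly like the following pytest-style tests. This is only a sketch: `assign_cell` and `count_per_cell` are hypothetical stand-ins that bin points into a regular longitude/latitude grid, not the H3/HEALPix grids the project uses.

```python
import math
from collections import Counter


def assign_cell(lon: float, lat: float, size_deg: float = 1.0) -> tuple[int, int]:
    """Hypothetical grid lookup: bin a point into a regular lon/lat cell."""
    return (math.floor(lon / size_deg), math.floor(lat / size_deg))


def count_per_cell(points: list[tuple[float, float]]) -> Counter:
    """Count how many points fall into each cell."""
    return Counter(assign_cell(lon, lat) for lon, lat in points)


def test_points_in_same_cell_are_grouped() -> None:
    counts = count_per_cell([(10.2, 70.1), (10.8, 70.9), (11.1, 70.5)])
    assert counts[(10, 70)] == 2
    assert counts[(11, 70)] == 1


def test_negative_longitudes_round_down() -> None:
    # floor, not truncation: -0.5 belongs to cell -1, not cell 0
    assert assign_cell(-0.5, 80.0) == (-1, 80)


def test_aggregation_preserves_total_count() -> None:
    # Data integrity: no point is lost or duplicated by the aggregation
    points = [(-45.0, 75.0), (-44.5, 75.5), (20.0, 68.0)]
    assert sum(count_per_cell(points).values()) == len(points)
```

Each test targets one of the listed concerns: coordinate handling, aggregation to cells, and integrity of the data across a processing step.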
Submitting Changes
Pull Request Process
- Branch: Create a feature branch from main
- Commit: Write clear, descriptive commit messages
- Test: Verify your changes don't break existing functionality
- Document: Update relevant docstrings and documentation
- PR: Submit a pull request with a clear description of changes
Commit Messages
- Use present tense: "Add feature" not "Added feature"
- Reference issues when applicable: "Fix #123: Correct grid aggregation"
- Keep first line under 72 characters
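If you want to check a message against these rules before committing, a small script along the following lines may help. It is a sketch only: `check_commit_subject` is a hypothetical helper, not a tool shipped with the project, and the tense check is a crude heuristic.

```python
MAX_SUBJECT_LENGTH = 72


def check_commit_subject(message: str) -> list[str]:
    """Return the guideline violations found in a commit message's first line."""
    lines = message.splitlines()
    subject = lines[0].strip() if lines else ""
    if not subject:
        return ["first line is empty"]
    problems: list[str] = []
    if len(subject) >= MAX_SUBJECT_LENGTH:
        problems.append(
            f"first line has {len(subject)} characters, "
            f"expected fewer than {MAX_SUBJECT_LENGTH}"
        )
    # Crude heuristic (an assumption, not a project rule): a first word ending
    # in "ed" usually signals past tense, e.g. "Added" instead of "Add".
    first_word = subject.split()[0]
    if first_word.lower().endswith("ed"):
        problems.append(f"use present tense instead of '{first_word}'")
    return problems
```

An empty list means the first line passed both checks. The heuristic will misfire on verbs such as "Embed" or "Seed", so treat its output as a hint rather than a verdict.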
Working with Data
Local Development
- Set the FAST_DATA_DIR environment variable to choose the data directory (default: ./data)
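In code, resolving this setting might look like the sketch below. `get_data_dir` is a hypothetical helper shown for illustration; the project's own paths.py may resolve the directory differently.

```python
import os
from pathlib import Path


def get_data_dir() -> Path:
    """Return the data directory, falling back to ./data when FAST_DATA_DIR is unset."""
    # Hypothetical helper: reads the variable on every call so that changes
    # to the environment are picked up without re-importing the module.
    return Path(os.environ.get("FAST_DATA_DIR", "./data"))
```

Reading the variable in one place keeps the rest of the code free of hard-coded paths.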
Notebooks
- Notebooks in notebooks/ are for exploration and validation; they are not committed to git
- Keep production code in src/entropice/
Dashboard Development
Run the dashboard locally:
pixi run dashboard
Dashboard code is in src/entropice/dashboard/ with modular pages and plotting utilities.
Questions?
For questions about the architecture, see ARCHITECTURE.md. For scientific background, see README.md.