Contributing to Entropice

Thank you for your interest in contributing to Entropice! This document provides guidelines for contributing to the project.

Getting Started

Prerequisites

Setup

pixi install

This will set up the complete environment including RAPIDS, PyTorch, and all geospatial dependencies.

Development Workflow

Code Organization

  • src/entropice/: Core modules (grids, data sources, training, inference)
  • src/entropice/dashboard/: Streamlit visualization dashboard
  • scripts/: Data processing pipeline scripts (numbered 00-05)
  • notebooks/: Exploratory analysis and validation notebooks
  • tests/: Unit tests

Key Modules

  • grids.py: H3/HEALPix spatial grid systems
  • darts.py, era5.py, arcticdem.py, alphaearth.py: Data source processors
  • dataset.py: Dataset assembly and feature engineering
  • training.py: Model training with eSPA/SPARTAn
  • inference.py: Prediction generation

Coding Standards

Python Style

  • Follow PEP 8 conventions
  • Use type hints for function signatures
  • Prefer numpy-style docstrings for public functions
  • Keep functions focused and modular
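
As an illustration of these conventions, here is a minimal sketch of a small public function with type hints and a numpy-style docstring. The function `normalize_counts` and its parameters are hypothetical and are not part of Entropice; only the layout is the point.

```python
from __future__ import annotations

from collections.abc import Sequence


def normalize_counts(counts: Sequence[float], total: float | None = None) -> list[float]:
    """Scale counts so that they are expressed as fractions of a total.

    Parameters
    ----------
    counts : Sequence[float]
        Non-negative values, for example per-cell feature counts.
    total : float, optional
        Value to divide by. Defaults to ``sum(counts)``.

    Returns
    -------
    list[float]
        One fraction per input value, in the same order.

    Raises
    ------
    ValueError
        If the divisor is zero.
    """
    # Hypothetical example function, used only to show the docstring layout.
    divisor = sum(counts) if total is None else total
    if divisor == 0:
        raise ValueError("cannot normalize: divisor is zero")
    return [value / divisor for value in counts]
```

The function does one thing, states its types in the signature, and documents parameters, return value, and errors in separate docstring sections.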

Geospatial Best Practices

  • Use xarray with XDGGS for gridded data storage
  • Store intermediate results as Parquet (tabular) or Zarr (arrays)
  • Leverage Dask for lazy evaluation of large datasets
  • Use GeoPandas for vector operations
  • Use the EPSG:3413 (Arctic Stereographic) coordinate reference system (CRS) for all computations on the data, and EPSG:4326 (WGS84) for visualization and for compatibility with libraries that expect geographic coordinates

Data Pipeline

  • Follow the numbered script sequence: 00grids.sh → 01darts.sh → ... → 05train.sh
  • Each stage should produce reproducible intermediate outputs
  • Document data dependencies in module docstrings
  • Use paths.py for consistent path management

Testing

Run the test suite:

pixi run pytest

When adding features, include tests that verify:

  • Correct handling of geospatial coordinates and projections
  • Proper aggregation to grid cells
  • Data integrity through pipeline stages
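
A test for grid aggregation might look like the following sketch. It is written in pytest style, and it assumes a hypothetical helper `aggregate_to_cells` that bins projected x/y coordinates into square cells. That helper is a deliberately simple stand-in for the H3/HEALPix grids in grids.py, not the project's actual API.

```python
from __future__ import annotations

import math
from collections import defaultdict


def aggregate_to_cells(
    points: list[tuple[float, float, float]], cell_size: float
) -> dict[tuple[int, int], float]:
    """Sum point values per square cell (hypothetical stand-in for a real grid)."""
    totals: dict[tuple[int, int], float] = defaultdict(float)
    for x, y, value in points:
        # floor() keeps negative coordinates in the correct cell.
        cell = (math.floor(x / cell_size), math.floor(y / cell_size))
        totals[cell] += value
    return dict(totals)


# Each point is (x, y, value); the coordinates are illustrative only.
POINTS = [(0.5, 0.5, 1.0), (0.9, 0.1, 2.0), (-0.5, 1.5, 4.0)]


def test_points_land_in_expected_cells() -> None:
    result = aggregate_to_cells(POINTS, cell_size=1.0)
    assert result == {(0, 0): 3.0, (-1, 1): 4.0}


def test_aggregation_preserves_total() -> None:
    # Data integrity: aggregation must not drop or duplicate values.
    result = aggregate_to_cells(POINTS, cell_size=1.0)
    assert sum(result.values()) == sum(value for _, _, value in POINTS)
```

The first test checks cell assignment, including a negative coordinate. The second checks that the total is conserved, a cheap invariant that applies to most aggregation stages.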

Submitting Changes

Pull Request Process

  1. Branch: Create a feature branch from main
  2. Commit: Write clear, descriptive commit messages
  3. Test: Verify your changes don't break existing functionality
  4. Document: Update relevant docstrings and documentation
  5. PR: Submit a pull request with a clear description of changes

Commit Messages

  • Use present tense: "Add feature" not "Added feature"
  • Reference issues when applicable: "Fix #123: Correct grid aggregation"
  • Keep first line under 72 characters

Working with Data

Local Development

  • Set the FAST_DATA_DIR environment variable to point to the data directory (default: ./data)
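
A minimal sketch of how a path helper could resolve this variable is shown below. The function names `get_data_dir` and `grid_dir`, and the `grids/` subdirectory layout, are assumptions made for illustration; the actual paths.py may be organized differently.

```python
from __future__ import annotations

import os
from collections.abc import Mapping
from pathlib import Path


def get_data_dir(env: Mapping[str, str] | None = None) -> Path:
    """Return the data directory from FAST_DATA_DIR, falling back to ./data."""
    environ = os.environ if env is None else env
    # An unset or empty variable falls back to the documented default.
    return Path(environ.get("FAST_DATA_DIR") or "./data")


def grid_dir(grid: str, level: int, env: Mapping[str, str] | None = None) -> Path:
    """Hypothetical helper: one subdirectory per grid type and resolution."""
    return get_data_dir(env) / "grids" / f"{grid}_{level}"
```

Accepting the environment as an optional argument keeps the helper easy to test without modifying the real process environment.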

Notebooks

  • Notebooks in notebooks/ are for exploration and validation; they are not committed to git
  • Keep production code in src/entropice/

Dashboard Development

Run the dashboard locally:

pixi run dashboard

Dashboard code is in src/entropice/dashboard/ with modular pages and plotting utilities.

Questions?

For questions about the architecture, see ARCHITECTURE.md. For scientific background, see README.md.