# Contributing to Entropice

Thank you for your interest in contributing to Entropice! This document explains how to set up a development environment, which conventions the project follows, and how to submit changes.
## Getting Started

### Prerequisites

- Python 3.13
- CUDA 12-compatible GPU (for full functionality)
- [Pixi package manager](https://pixi.sh/)
### Setup

```bash
pixi install
```

This sets up the complete environment, including RAPIDS, PyTorch, and all geospatial dependencies.
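
After `pixi install` completes, a short script can confirm that key packages are importable and that PyTorch can see a GPU. This is a minimal sketch and is not shipped with the repository. The file name `check_env.py` and the modules it probes (`torch`, RAPIDS' `cudf`, `geopandas`, `xarray`) are assumptions you can adapt.

```python
"""check_env.py: hypothetical sanity check for a freshly installed pixi environment."""

import importlib.util
import sys

# Assumption: a representative selection of packages from the environment.
MODULES = ("torch", "cudf", "geopandas", "xarray")


def check_environment(modules: tuple[str, ...] = MODULES) -> dict[str, bool]:
    """Report which requirements are satisfied in the current interpreter."""
    status = {"python>=3.13": sys.version_info >= (3, 13)}
    for name in modules:
        # find_spec returns None for a missing top-level module without importing it.
        status[name] = importlib.util.find_spec(name) is not None
    if status.get("torch"):
        import torch

        status["cuda"] = torch.cuda.is_available()
    return status


if __name__ == "__main__":
    for requirement, ok in check_environment().items():
        print(f"{'OK  ' if ok else 'MISS'} {requirement}")
```

Run it with `pixi run python check_env.py`. A `MISS` line points to a missing package or, for `cuda`, to a driver or GPU problem.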
## Development Workflow

### Code Organization

- **`src/entropice/`**: Core modules (grids, data sources, training, inference)
- **`src/entropice/dashboard/`**: Streamlit visualization dashboard
- **`scripts/`**: Data processing pipeline scripts (numbered 00-05)
- **`notebooks/`**: Exploratory analysis and validation notebooks
- **`tests/`**: Unit tests
### Key Modules

- `grids.py`: H3/HEALPix spatial grid systems
- `darts.py`, `era5.py`, `arcticdem.py`, `alphaearth.py`: Data source processors
- `dataset.py`: Dataset assembly and feature engineering
- `training.py`: Model training with eSPA/SPARTAn
- `inference.py`: Prediction generation
## Coding Standards

### Python Style

- Follow PEP 8 conventions
- Use type hints for function signatures
- Prefer NumPy-style docstrings for public functions
- Keep functions focused and modular
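
The hypothetical helper below shows these conventions together: a typed signature, a NumPy-style docstring, and a single, focused responsibility. The function name and purpose are invented for illustration and are not part of the Entropice API.

```python
def coverage_fraction(covered_area: float, cell_area: float) -> float:
    """Compute the fraction of a grid cell covered by a feature.

    Parameters
    ----------
    covered_area : float
        Area of the feature inside the cell, in square metres.
    cell_area : float
        Total area of the cell, in square metres. Must be positive.

    Returns
    -------
    float
        Covered fraction, clipped to the range [0, 1].

    Raises
    ------
    ValueError
        If ``cell_area`` is not positive.
    """
    if cell_area <= 0:
        raise ValueError(f"cell_area must be positive, got {cell_area}")
    # Clip to guard against small overshoots caused by geometry rounding.
    return min(max(covered_area / cell_area, 0.0), 1.0)
```

For example, `coverage_fraction(2.5e6, 1.0e7)` returns `0.25`.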
### Geospatial Best Practices

- Use **xarray** with XDGGS for gridded data storage
- Store intermediate results as **Parquet** (tabular) or **Zarr** (arrays)
- Leverage **Dask** for lazy evaluation of large datasets
- Use **GeoPandas** for vector operations
- Use EPSG:3413 (Arctic Stereographic) as the coordinate reference system (CRS) for all computations on the data, and EPSG:4326 (WGS84) for visualization and for compatibility with libraries that expect it
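
The following minimal sketch shows a round trip between the two CRSs. It assumes `pyproj` is available in the environment, which GeoPandas requires. For whole GeoDataFrames, `GeoDataFrame.to_crs("EPSG:3413")` does the same job. The helper names below are hypothetical.

```python
from pyproj import Transformer

# Build each transformer once and reuse it rather than recreating it per point.
# always_xy=True fixes the axis order to (longitude, latitude) for EPSG:4326
# instead of the authority-defined (latitude, longitude) order.
TO_EPSG3413 = Transformer.from_crs("EPSG:4326", "EPSG:3413", always_xy=True)
TO_EPSG4326 = Transformer.from_crs("EPSG:3413", "EPSG:4326", always_xy=True)


def to_projected(lon: float, lat: float) -> tuple[float, float]:
    """Convert WGS84 degrees to EPSG:3413 coordinates in metres (for computation)."""
    return TO_EPSG3413.transform(lon, lat)


def to_geographic(x: float, y: float) -> tuple[float, float]:
    """Convert EPSG:3413 metres back to WGS84 (lon, lat) degrees (for visualization)."""
    return TO_EPSG4326.transform(x, y)


if __name__ == "__main__":
    x, y = to_projected(-150.0, 70.0)
    print(f"EPSG:3413: x={x:.1f} m, y={y:.1f} m")
    print("EPSG:4326:", to_geographic(x, y))
```

Keeping the projected coordinates for all intermediate work and converting back only at the plotting step avoids mixing degree- and metre-based values in one computation.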
### Data Pipeline

- Follow the numbered script sequence: `00grids.sh` → `01darts.sh` → ... → `05train.sh`
- Each stage should produce reproducible intermediate outputs
- Document data dependencies in module docstrings
- Use `paths.py` for consistent path management
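
Because the stage scripts carry a two-digit prefix, sorting their file names is enough to recover the execution order. The sketch below is a hypothetical helper and is not part of the repository. It assumes the scripts live in `scripts/`, match `NN*.sh`, and can be run with `bash`. By default it only lists the plan.

```python
import subprocess
from pathlib import Path


def pipeline_stages(script_dir: Path) -> list[Path]:
    """Return the numbered stage scripts (e.g. 00grids.sh) in execution order."""
    # The two-digit prefix makes lexicographic order match stage order.
    return sorted(script_dir.glob("[0-9][0-9]*.sh"))


def run_pipeline(script_dir: Path = Path("scripts"), dry_run: bool = True) -> list[str]:
    """Run (or, with dry_run, only list) each stage, stopping at the first failure."""
    executed = []
    for script in pipeline_stages(script_dir):
        if not dry_run:
            # check=True raises CalledProcessError, so later stages never see bad inputs.
            subprocess.run(["bash", str(script)], check=True)
        executed.append(script.name)
    return executed
```

Calling `run_pipeline(dry_run=False)` from the repository root would execute the stages in order. Stopping at the first failure keeps a broken stage from silently feeding stale intermediates into the next one.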
## Testing

Run the full test suite with the first command. To run only the tests whose names match an expression (for example, the tests for a specific module), pass pytest's `-k` option as in the second command:

```bash
pixi run pytest
pixi run pytest -k grids
```

When adding features, include tests that verify:

- Correct handling of geospatial coordinates and projections
- Proper aggregation to grid cells
- Data integrity through pipeline stages
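
The pytest-style sketch below illustrates the second point. The aggregation function is a simplified, hypothetical stand-in (mean of point values per cell ID), not Entropice's actual implementation. In a real test you would import the function under test from `entropice` instead of defining it inline.

```python
import math
from collections import defaultdict


def aggregate_to_cells(samples: list[tuple[str, float]]) -> dict[str, float]:
    """Hypothetical stand-in: average point values per grid-cell ID."""
    sums: dict[str, float] = defaultdict(float)
    counts: dict[str, int] = defaultdict(int)
    for cell_id, value in samples:
        sums[cell_id] += value
        counts[cell_id] += 1
    return {cell_id: sums[cell_id] / counts[cell_id] for cell_id in sums}


def test_points_in_same_cell_are_averaged() -> None:
    result = aggregate_to_cells([("cell_a", 1.0), ("cell_a", 3.0), ("cell_b", 5.0)])
    assert math.isclose(result["cell_a"], 2.0)
    assert math.isclose(result["cell_b"], 5.0)


def test_every_input_cell_appears_once() -> None:
    samples = [("cell_a", 1.0), ("cell_b", 2.0), ("cell_a", 4.0)]
    assert set(aggregate_to_cells(samples)) == {"cell_a", "cell_b"}


def test_empty_input_yields_no_cells() -> None:
    assert aggregate_to_cells([]) == {}
```

Saved in a `tests/test_*.py` file, these `test_*` functions are collected automatically by `pixi run pytest`.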
## Submitting Changes

### Pull Request Process

1. **Branch**: Create a feature branch from `main`
2. **Commit**: Write clear, descriptive commit messages
3. **Test**: Verify that your changes don't break existing functionality (for example, by running `pixi run pytest`)
4. **Document**: Update relevant docstrings and documentation
5. **PR**: Submit a pull request with a clear description of the changes
### Commit Messages

- Use the imperative, present-tense form: "Add feature", not "Added feature"
- Reference issues when applicable: "Fix #123: Correct grid aggregation"
- Keep the first line under 72 characters
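
Two of these rules are easy to check mechanically. The sketch below is a hypothetical helper, for example for a local Git `commit-msg` hook. It enforces the length limit and uses a rough heuristic for past tense, which will occasionally flag valid words that end in "ed".

```python
import sys

MAX_SUBJECT_LENGTH = 72  # the first line must stay *under* this many characters


def check_commit_message(message: str) -> list[str]:
    """Return the problems found in a commit message (an empty list means it passes)."""
    lines = message.splitlines()
    subject = lines[0].strip() if lines else ""
    problems: list[str] = []
    if not subject:
        problems.append("first line is empty")
    if len(subject) >= MAX_SUBJECT_LENGTH:
        problems.append(
            f"first line has {len(subject)} characters; keep it under {MAX_SUBJECT_LENGTH}"
        )
    words = subject.split()
    first_word = words[0] if words else ""
    # Rough heuristic: a leading word ending in "ed" ("Added", "Fixed") suggests past tense.
    # It can misfire on words such as "Embed"; `git commit --no-verify` skips the hook then.
    if len(first_word) > 3 and first_word.lower().endswith("ed"):
        problems.append(f"'{first_word}' looks like past tense; prefer e.g. 'Add' over 'Added'")
    return problems


def main(message_file: str) -> int:
    """Hook entry point: print problems to stderr and return non-zero if there are any."""
    with open(message_file, encoding="utf-8") as handle:
        problems = check_commit_message(handle.read())
    for problem in problems:
        print(f"commit-msg: {problem}", file=sys.stderr)
    return 1 if problems else 0


if __name__ == "__main__" and len(sys.argv) > 1:
    # Git invokes commit-msg hooks with the path of the message file as the only argument.
    sys.exit(main(sys.argv[1]))
```

To use it as a hook, save it as `.git/hooks/commit-msg`, add a `#!/usr/bin/env python3` shebang line, and make the file executable.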
## Working with Data

### Local Development

- Set the `FAST_DATA_DIR` environment variable to choose the data directory (default: `./data`)
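
The following minimal sketch shows how a path helper can honour `FAST_DATA_DIR` with the documented `./data` fallback. The function names and the per-stage directory layout are hypothetical. Check `paths.py` for the project's actual layout.

```python
import os
from pathlib import Path


def data_dir() -> Path:
    """Return the data root: $FAST_DATA_DIR if set and non-empty, else ./data."""
    return Path(os.environ.get("FAST_DATA_DIR") or "data")


def stage_output(stage: str, filename: str) -> Path:
    """Hypothetical layout: one subdirectory per pipeline stage under the data root."""
    path = data_dir() / stage / filename
    # Create the stage directory on first use so each stage writes to a predictable place.
    path.parent.mkdir(parents=True, exist_ok=True)
    return path
```

With this approach, pointing the whole pipeline at faster storage only requires setting the variable, for example `export FAST_DATA_DIR=/path/to/fast/storage`, before running it.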
### Notebooks

- Notebooks in `notebooks/` are for exploration and validation; they are not committed to git
- Keep production code in `src/entropice/`
## Dashboard Development

Run the dashboard locally:

```bash
pixi run dashboard
```

Dashboard code is in `src/entropice/dashboard/` with modular pages and plotting utilities.
## Questions?

For questions about the architecture, see `ARCHITECTURE.md`. For scientific background, see `README.md`.