Add some docs for copilot

Tobias Hölzer 2025-12-28 20:11:11 +01:00
parent 1ee3d532fc
commit f8df10f687
9 changed files with 908 additions and 2027 deletions

CONTRIBUTING.md Normal file

@ -0,0 +1,116 @@
# Contributing to Entropice
Thank you for your interest in contributing to Entropice! This document provides guidelines for contributing to the project.
## Getting Started
### Prerequisites
- Python 3.13
- CUDA 12 compatible GPU (for full functionality)
- [Pixi package manager](https://pixi.sh/)
### Setup
```bash
pixi install
```
This will set up the complete environment including RAPIDS, PyTorch, and all geospatial dependencies.
## Development Workflow
### Code Organization
- **`src/entropice/`**: Core modules (grids, data sources, training, inference)
- **`src/entropice/dashboard/`**: Streamlit visualization dashboard
- **`scripts/`**: Data processing pipeline scripts (numbered 00-05)
- **`notebooks/`**: Exploratory analysis and validation notebooks
- **`tests/`**: Unit tests
### Key Modules
- `grids.py`: H3/HEALPix spatial grid systems
- `darts.py`, `era5.py`, `arcticdem.py`, `alphaearth.py`: Data source processors
- `dataset.py`: Dataset assembly and feature engineering
- `training.py`: Model training with eSPA/SPARTAn
- `inference.py`: Prediction generation
## Coding Standards
### Python Style
- Follow PEP 8 conventions
- Use type hints for function signatures
- Prefer numpy-style docstrings for public functions
- Keep functions focused and modular
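As an illustration of these conventions, the sketch below shows one way a small public function might combine type hints with a numpy-style docstring. The function `mean_per_cell` and its argument are hypothetical and not part of Entropice.

```python
from statistics import fmean


def mean_per_cell(values_by_cell: dict[str, list[float]]) -> dict[str, float]:
    """Average the values assigned to each grid cell.

    Hypothetical example, not part of the Entropice code base.

    Parameters
    ----------
    values_by_cell : dict[str, list[float]]
        Mapping from a grid-cell identifier to the values that fall in it.

    Returns
    -------
    dict[str, float]
        Mapping from each non-empty cell identifier to its mean value.
    """
    # Skip empty cells instead of raising on an empty list.
    return {cell: fmean(vals) for cell, vals in values_by_cell.items() if vals}
```

The function does one thing and leaves I/O to the caller, which keeps it easy to test in isolation.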
### Geospatial Best Practices
- Use **xarray** with XDGGS for gridded data storage
- Store intermediate results as **Parquet** (tabular) or **Zarr** (arrays)
- Leverage **Dask** for lazy evaluation of large datasets
- Use **GeoPandas** for vector operations
- Use EPSG:3413 (Arctic Stereographic) as the coordinate reference system (CRS) for any computation on the data, and EPSG:4326 (WGS84) for visualization and compatibility with some libraries
### Data Pipeline
- Follow the numbered script sequence: `00grids.sh` → `01darts.sh` → ... → `05train.sh`
- Each stage should produce reproducible intermediate outputs
- Document data dependencies in module docstrings
- Use `paths.py` for consistent path management
## Testing
Run the test suite:
```bash
pixi run pytest
```
When adding features, include tests that verify:
- Correct handling of geospatial coordinates and projections
- Proper aggregation to grid cells
- Data integrity through pipeline stages
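A test for grid aggregation could look roughly like the sketch below. It is only an assumption about how such a test might be shaped: `assign_cell` and `aggregate_counts` are hypothetical stand-ins that use a regular 1-degree longitude/latitude grid rather than the project's H3/HEALPix grids.

```python
import math
from collections import defaultdict


def assign_cell(lon: float, lat: float, size_deg: float = 1.0) -> tuple[int, int]:
    """Return the (column, row) index of a regular lon/lat cell (hypothetical stand-in)."""
    return (math.floor(lon / size_deg), math.floor(lat / size_deg))


def aggregate_counts(points: list[tuple[float, float]]) -> dict[tuple[int, int], int]:
    """Count the points that fall into each grid cell."""
    counts: dict[tuple[int, int], int] = defaultdict(int)
    for lon, lat in points:
        counts[assign_cell(lon, lat)] += 1
    return dict(counts)


def test_aggregation_preserves_point_count() -> None:
    points = [(10.2, 70.1), (10.8, 70.9), (-45.5, 80.3)]
    counts = aggregate_counts(points)
    # Every input point must land in exactly one cell.
    assert sum(counts.values()) == len(points)
    # The first two points share a cell; the third sits in a separate one.
    assert counts[(10, 70)] == 2
    assert counts[(-46, 80)] == 1
```

Checking that the total count is preserved is a cheap way to catch points that are dropped or assigned twice during aggregation.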
## Submitting Changes
### Pull Request Process
1. **Branch**: Create a feature branch from `main`
2. **Commit**: Write clear, descriptive commit messages
3. **Test**: Verify your changes don't break existing functionality
4. **Document**: Update relevant docstrings and documentation
5. **PR**: Submit a pull request with a clear description of changes
### Commit Messages
- Use present tense: "Add feature" not "Added feature"
- Reference issues when applicable: "Fix #123: Correct grid aggregation"
- Keep first line under 72 characters
## Working with Data
### Local Development
- Set the `FAST_DATA_DIR` environment variable to choose the data directory (default: `./data`)
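For illustration, resolving the data directory from this variable might look like the sketch below. The helper name `get_data_dir` is hypothetical, and the actual logic in `paths.py` may differ.

```python
import os
from pathlib import Path


def get_data_dir() -> Path:
    """Return the data directory, falling back to ./data (hypothetical helper)."""
    # FAST_DATA_DIR and the ./data default are taken from the guideline above;
    # an unset or empty variable falls back to the default.
    return Path(os.environ.get("FAST_DATA_DIR") or "./data")
```

Reading the variable in a single place keeps scripts and notebooks pointing at the same data.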
### Notebooks
- Notebooks in `notebooks/` are for exploration and validation; they are not committed to git
- Keep production code in `src/entropice/`
## Dashboard Development
Run the dashboard locally:
```bash
pixi run dashboard
```
Dashboard code is in `src/entropice/dashboard/` with modular pages and plotting utilities.
## Questions?
For questions about the architecture, see `ARCHITECTURE.md`. For scientific background, see `README.md`.