Update docs, instructions and format code
parent fca232da91
commit 4260b492ab
29 changed files with 987 additions and 467 deletions
@@ -6,7 +6,6 @@ Thank you for your interest in contributing to Entropice! This document provides
### Prerequisites

- Python 3.13
- CUDA 12 compatible GPU (for full functionality)
- [Pixi package manager](https://pixi.sh/)

@@ -16,55 +15,36 @@ Thank you for your interest in contributing to Entropice! This document provides
pixi install
```

This will set up the complete environment including RAPIDS, PyTorch, and all geospatial dependencies.
This will set up the complete environment including Python, RAPIDS, PyTorch, and all geospatial dependencies.

## Development Workflow

**Important**: Always use `pixi run` to execute Python commands and scripts to ensure you're using the correct environment with all dependencies.
> Read the [Architecture Guide](ARCHITECTURE.md) for the code organisation and key modules

### Code Organization
**Important**: Always use `pixi run` to execute Python commands and scripts to ensure you're using the correct environment with all dependencies:

- **`src/entropice/ingest/`**: Data ingestion modules (darts, era5, arcticdem, alphaearth)
- **`src/entropice/spatial/`**: Spatial operations (grids, aggregators, watermask, xvec)
- **`src/entropice/ml/`**: Machine learning components (dataset, training, inference)
- **`src/entropice/utils/`**: Utilities (paths, codecs)
- **`src/entropice/dashboard/`**: Streamlit visualization dashboard
- **`scripts/`**: Data processing pipeline scripts (numbered 00-05)
- **`notebooks/`**: Exploratory analysis and validation notebooks
- **`tests/`**: Unit tests
```bash
pixi run python script.py
pixi run python -c "import entropice"
```

### Key Modules

- `spatial/grids.py`: H3/HEALPix spatial grid systems
- `ingest/darts.py`, `ingest/era5.py`, `ingest/arcticdem.py`, `ingest/alphaearth.py`: Data source processors
- `ml/dataset.py`: Dataset assembly and feature engineering
- `ml/training.py`: Model training with eSPA, XGBoost, Random Forest, KNN
- `ml/inference.py`: Prediction generation
- `utils/paths.py`: Centralized path management

## Coding Standards

### Python Style
### Python Style and Formatting

- Follow PEP 8 conventions
- Use type hints for function signatures
- Prefer numpy-style docstrings for public functions
- Prefer google-style docstrings for public functions
- Keep functions focused and modular

### Geospatial Best Practices
`ty` and `ruff` are used for typing, linting and formatting.
Make sure to check for any warnings from both of these:

- Use **xarray** with XDGGS for gridded data storage
- Store intermediate results as **Parquet** (tabular) or **Zarr** (arrays)
- Leverage **Dask** for lazy evaluation of large datasets
- Use **GeoPandas** for vector operations
- Use EPSG:3413 (Arctic Stereographic) coordinate reference system (CRS) for any computation on the data and EPSG:4326 (WGS84) for data visualization and compatibility with some libraries
```sh
pixi run ty check # For type checks
pixi run ruff check # For linting
pixi run ruff format # For formatting
```

### Data Pipeline

- Follow the numbered script sequence: `00grids.sh` → `01darts.sh` → ... → `05train.sh`
- Each stage should produce reproducible intermediate outputs
- Document data dependencies in module docstrings
- Use `utils/paths.py` for consistent path management
Single files can be specified by just adding them to the command, e.g. `pixi run ty check src/entropice/dashboard/app.py`

## Testing

@@ -74,13 +54,6 @@ Run tests for specific modules:
pixi run pytest
```

When running Python scripts or commands, always use `pixi run`:

```bash
pixi run python script.py
pixi run python -c "import entropice"
```

When adding features, include tests that verify:

- Correct handling of geospatial coordinates and projections
@@ -100,15 +73,8 @@ When adding features, include tests that verify:
### Commit Messages

- Use present tense: "Add feature" not "Added feature"
- Reference issues when applicable: "Fix #123: Correct grid aggregation"
- Keep first line under 72 characters

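The commit message conventions listed here can be checked mechanically. The following is a minimal sketch and not part of Entropice: the function name `check_commit_subject` and the short list of past-tense verbs are assumptions made for illustration, and the tense check is only a rough heuristic.

```python
import re

# "Keep first line under 72 characters"
MAX_SUBJECT_LENGTH = 72

# Illustrative, non-exhaustive list of past-tense verbs (an assumption).
PAST_TENSE_STARTS = {"added", "fixed", "updated", "removed", "changed"}


def check_commit_subject(message: str) -> list[str]:
    """Return a list of guideline violations for a commit message."""
    lines = message.splitlines()
    subject = lines[0] if lines else ""
    if not subject.strip():
        return ["subject line is empty"]

    problems = []
    if len(subject) >= MAX_SUBJECT_LENGTH:
        problems.append(
            f"subject has {len(subject)} characters, expected fewer than {MAX_SUBJECT_LENGTH}"
        )

    # Look at the first alphabetic word only, e.g. "Added" in "Added feature".
    words = re.findall(r"[A-Za-z]+", subject)
    if words and words[0].lower() in PAST_TENSE_STARTS:
        problems.append(f"use present tense instead of '{words[0]}'")
    return problems


if __name__ == "__main__":
    for msg in ("Add feature", "Added feature", "Fix #123: Correct grid aggregation"):
        print(repr(msg), check_commit_subject(msg))
```

An empty list means the subject line passed all checks. A function like this could be called from a local commit hook, but that wiring is left out here.
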
## Working with Data

### Local Development

- Set `FAST_DATA_DIR` environment variable for data directory (default: `./data`)

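As a rough illustration of the `FAST_DATA_DIR` behaviour described here, the sketch below reads the variable and falls back to `./data`. The helper name `resolve_data_dir` is hypothetical, and the actual logic in `utils/paths.py` may differ.

```python
import os
from pathlib import Path

# Default taken from the guideline text: "./data"
DEFAULT_DATA_DIR = "./data"


def resolve_data_dir(env=None) -> Path:
    """Return the data directory from FAST_DATA_DIR, or the default.

    `env` is any mapping of environment variables; it defaults to
    os.environ and exists only to make the function easy to test.
    """
    env = os.environ if env is None else env
    return Path(env.get("FAST_DATA_DIR", DEFAULT_DATA_DIR))


if __name__ == "__main__":
    print(resolve_data_dir())
```

Passing the environment as a parameter keeps the lookup free of global state, so it can be exercised without modifying the real environment.
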
### Notebooks

- Notebooks in `notebooks/` are for exploration and validation; they are not committed to git