Update docs, instructions and format code

Tobias Hölzer 2026-01-04 17:19:02 +01:00
parent fca232da91
commit 4260b492ab
29 changed files with 987 additions and 467 deletions

@@ -1,56 +1,54 @@
---
description: Develop and refactor Streamlit dashboard pages and visualizations
name: Dashboard
argument-hint: Describe dashboard features, pages, or visualizations to add or modify
tools: ['vscode', 'execute', 'read', 'edit', 'search', 'web', 'agent', 'ms-python.python/getPythonEnvironmentInfo', 'ms-python.python/getPythonExecutableCommand', 'ms-python.python/installPythonPackage', 'ms-python.python/configurePythonEnvironment', 'todo']
model: Claude Sonnet 4.5
infer: true
---
# Dashboard Development Agent
You specialize in developing and refactoring the **Entropice Streamlit Dashboard** for geospatial machine learning analysis.
## Scope
**You can edit:** Files in `src/entropice/dashboard/` only
**You cannot edit:** Data pipeline scripts, training code, or configuration files
**Primary reference:** Always consult `views/overview_page.py` for current code patterns
## Responsibilities
### ✅ What You Do
- Create/refactor dashboard pages in `views/`
- Build visualizations using Plotly, Matplotlib, Seaborn, PyDeck, Altair
- Fix dashboard bugs and improve UI/UX
- Create utility functions in `utils/` and `plots/`
- Read (but never edit) data pipeline code to understand data structures
- Use #tool:web to fetch library documentation:
- Streamlit: https://docs.streamlit.io/
- Plotly: https://plotly.com/python/
- PyDeck: https://deckgl.readthedocs.io/
- Xarray: https://docs.xarray.dev/
- GeoPandas: https://geopandas.org/
### ❌ What You Don't Do
- Edit files outside `src/entropice/dashboard/`
- Modify data pipeline (`grids.py`, `darts.py`, `era5.py`, `arcticdem.py`, `alphaearth.py`)
- Change training code (`training.py`, `dataset.py`, `inference.py`)
- Edit configuration (`pyproject.toml`, `scripts/*.sh`)
### When to Stop
If a dashboard feature requires changes outside `dashboard/`, stop and inform:
```
⚠️ This requires changes to [file/module]
Needed: [describe changes]
Please make these changes first, then I can update the dashboard.
```
## Dashboard Structure
@@ -60,23 +58,28 @@ The dashboard is located in `src/entropice/dashboard/` with the following struct
```
dashboard/
├── app.py # Main Streamlit app with navigation
├── views/ # Dashboard pages
│ ├── overview_page.py # Overview of training results and dataset analysis
│ ├── training_data_page.py # Training data visualizations (needs refactoring)
│ ├── training_analysis_page.py # CV results and hyperparameter analysis (needs refactoring)
│ ├── model_state_page.py # Feature importance and model state (needs refactoring)
│ └── inference_page.py # Spatial prediction visualizations (needs refactoring)
├── plots/ # Reusable plotting utilities
│ ├── colors.py # Color schemes
│ ├── hyperparameter_analysis.py
│ ├── inference.py
│ ├── model_state.py
│ ├── source_data.py
│ └── training_data.py
└── utils/ # Data loading and processing utilities
├── loaders.py # Data loaders (training results, grid data, predictions)
├── stats.py # Dataset statistics computation and caching
├── colors.py # Color palette management
├── formatters.py # Display formatting utilities
└── unsembler.py # Dataset ensemble utilities
```
**Note:** Currently only `overview_page.py` has been refactored to follow the new patterns. Other pages need updating to match this structure.
## Key Technologies
- **Streamlit**: Web app framework
@@ -120,6 +123,79 @@ When working with Entropice data:
3. **Training Results**: Pickled models, Parquet/NetCDF CV results
4. **Predictions**: GeoDataFrames with predicted classes/probabilities
### Dashboard Code Patterns
**Follow these patterns when developing or refactoring dashboard pages:**
1. **Modular Render Functions**: Break pages into focused render functions
```python
def render_sample_count_overview():
    """Render overview of sample counts per task+target+grid+level combination."""
    # Implementation


def render_feature_count_section():
    """Render the feature count section with comparison and explorer."""
    # Implementation
```
2. **Use `@st.fragment` for Interactive Components**: Isolate reactive UI elements
```python
@st.fragment
def render_feature_count_explorer():
    """Render interactive detailed configuration explorer using fragments."""
    # Interactive selectboxes and checkboxes that re-run independently
```
3. **Cached Data Loading via Utilities**: Use centralized loaders from `utils/loaders.py`
```python
from entropice.dashboard.utils.loaders import load_all_training_results
from entropice.dashboard.utils.stats import load_all_default_dataset_statistics
training_results = load_all_training_results() # Cached via @st.cache_data
all_stats = load_all_default_dataset_statistics() # Cached via @st.cache_data
```
4. **Consistent Color Palettes**: Use `get_palette()` from `utils/colors.py`
```python
from entropice.dashboard.utils.colors import get_palette
task_colors = get_palette("task_types", n_colors=n_tasks)
source_colors = get_palette("data_sources", n_colors=n_sources)
```
5. **Type Hints and Type Casting**: Use types from `entropice.utils.types`
```python
from entropice.utils.types import GridConfig, L2SourceDataset, TargetDataset, grid_configs
selected_grid_config: GridConfig = next(gc for gc in grid_configs if gc.display_name == grid_level_combined)
selected_members: list[L2SourceDataset] = []
```
6. **Tab-Based Organization**: Use tabs to organize complex visualizations
```python
tab1, tab2, tab3 = st.tabs(["📈 Heatmap", "📊 Bar Chart", "📋 Data Table"])
with tab1:
    ...  # Heatmap visualization
with tab2:
    ...  # Bar chart visualization
```
7. **Layout with Columns**: Use columns for metrics and side-by-side content
```python
col1, col2, col3 = st.columns(3)
with col1:
    st.metric("Total Features", f"{total_features:,}")
with col2:
    st.metric("Data Sources", len(selected_members))
```
8. **Comprehensive Docstrings**: Document render functions clearly
```python
def render_training_results_summary(training_results):
    """Render summary metrics for training results."""
    # Implementation
```
### Visualization Guidelines
1. **Geospatial Data**: Use PyDeck for interactive maps, Plotly for static maps
@@ -127,50 +203,79 @@ When working with Entropice data:
3. **Distributions**: Use Plotly or Seaborn
4. **Feature Importance**: Use Plotly bar charts
5. **Hyperparameter Analysis**: Use Plotly scatter/parallel coordinates
6. **Heatmaps**: Use `px.imshow()` with color palettes from `get_palette()`
7. **Interactive Tables**: Use `st.dataframe()` with `width='stretch'` and formatting
### Key Utility Modules
**`utils/loaders.py`**: Data loading with Streamlit caching
- `load_all_training_results()`: Load all training result directories
- `load_training_result(path)`: Load specific training result
- `TrainingResult` dataclass: Structured training result data
**`utils/stats.py`**: Dataset statistics computation
- `load_all_default_dataset_statistics()`: Load/compute stats for all grid configs
- `DatasetStatistics` class: Statistics per grid configuration
- `MemberStatistics` class: Statistics per L2 source dataset
- `TargetStatistics` class: Statistics per target dataset
- Helper methods: `get_sample_count_df()`, `get_feature_count_df()`, `get_feature_breakdown_df()`
**`utils/colors.py`**: Consistent color palette management
- `get_palette(variable, n_colors)`: Get color palette by semantic variable name
- `get_cmap(variable)`: Get matplotlib colormap
- Uses pypalettes material design palettes with deterministic mapping
**`utils/formatters.py`**: Display formatting utilities
- `ModelDisplayInfo`: Model name formatting
- `TaskDisplayInfo`: Task name formatting
- `TrainingResultDisplayInfo`: Training result display names
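The "deterministic mapping" idea behind `get_palette()` can be illustrated with a small standalone sketch. Everything below (the palette values and the function itself) is hypothetical and not the actual `utils/colors.py` implementation, which draws from pypalettes:

```python
from itertools import cycle, islice

# Hypothetical fixed base palette; the real module uses pypalettes
# material design palettes.
_BASE_PALETTE = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728", "#9467bd"]


def get_palette_sketch(variable: str, n_colors: int) -> list[str]:
    """Return a stable list of colors for a semantic variable name.

    The same (variable, n_colors) input always yields the same colors,
    so plots across pages stay visually consistent.
    """
    # Rotate the base palette by an offset derived deterministically
    # from the variable name, then repeat it to reach n_colors.
    offset = sum(ord(c) for c in variable) % len(_BASE_PALETTE)
    rotated = _BASE_PALETTE[offset:] + _BASE_PALETTE[:offset]
    return list(islice(cycle(rotated), n_colors))
```

The key property is that color assignment depends only on the inputs, never on call order, which is why pages can be rendered independently without drifting out of sync.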
## Workflow
1. Check `views/overview_page.py` for current patterns
2. Use #tool:search to find relevant code and data structures
3. Read data pipeline code if needed (read-only)
4. Leverage existing utilities from `utils/`
5. Use #tool:web to fetch documentation when needed
6. Implement changes following overview_page.py patterns
7. Use #tool:todo for multi-step tasks
## Refactoring Checklist
When updating pages to match new patterns:
1. Move to `views/` subdirectory
2. Use cached loaders from `utils/loaders.py` and `utils/stats.py`
3. Split into focused `render_*()` functions
4. Wrap interactive UI with `@st.fragment`
5. Replace hardcoded colors with `get_palette()`
6. Add type hints from `entropice.utils.types`
7. Organize with tabs for complex views
8. Use `width='stretch'` for charts/tables
9. Add comprehensive docstrings
10. Reference `overview_page.py` patterns
## Example Tasks
**✅ In Scope:**
- "Add feature correlation heatmap to overview page"
- "Create PyDeck map for RTS predictions"
- "Refactor training_data_page.py to match overview_page.py patterns"
- "Fix use_container_width deprecation warnings"
- "Add temporal statistics tab"
**⚠️ Out of Scope:**
- "Add new climate variable" → Requires changes to `era5.py`
- "Change training metrics" → Requires changes to `training.py`
- "Modify grid generation" → Requires changes to `grids.py`
## Key Reminders
- Only edit files in `dashboard/`
- Use `width='stretch'` not `use_container_width=True`
- Always reference `overview_page.py` for patterns
- Use #tool:web for documentation
- Use #tool:todo for complex multi-step work

@@ -1,226 +1,70 @@
# Entropice - Copilot Instructions
## Project Context
This is a geospatial machine learning system for predicting Arctic permafrost degradation (Retrogressive Thaw Slumps) using entropy-optimal Scalable Probabilistic Approximations (eSPA). The system processes multi-source geospatial data through discrete global grid systems (H3, HEALPix) and trains probabilistic classifiers.
For detailed architecture information, see [ARCHITECTURE.md](../ARCHITECTURE.md).
For contributing guidelines, see [CONTRIBUTING.md](../CONTRIBUTING.md).
For project goals and setup, see [README.md](../README.md).
## Code Style
- Follow PEP 8 conventions with 120 character line length
- Use type hints for all function signatures
- Use google-style docstrings for public functions
- Keep functions focused and modular
- Use `ruff` for linting and formatting, `ty` for type checking
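These conventions can be pinned in `pyproject.toml` so that `ruff` enforces them automatically. A minimal illustrative fragment — the table names follow ruff's documented configuration schema, but the choice of settings shown here is an assumption, not the project's actual config:

```toml
[tool.ruff]
line-length = 120

[tool.ruff.lint.pydocstyle]
convention = "google"
```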
## Technology Stack
- **Core**: Python 3.13, NumPy, Pandas, Xarray, GeoPandas
- **Spatial**: H3, xdggs, xvec for discrete global grid systems
- **ML**: scikit-learn, XGBoost, cuML, entropy (eSPA)
- **GPU**: CuPy, PyTorch, CUDA 12 - prefer GPU-accelerated operations
- **Storage**: Zarr, Icechunk, Parquet for intermediate data
- **CLI**: Cyclopts with dataclass-based configurations
## Execution Guidelines
- Always use `pixi run` to execute Python commands and scripts
- Environment variables: `SCIPY_ARRAY_API=1`, `FAST_DATA_DIR=./data`
## Geospatial Best Practices
- Use EPSG:3413 (Arctic Stereographic) for computations
- Use EPSG:4326 (WGS84) for visualization and compatibility
- Store gridded data as XDGGS Xarray datasets (Zarr format)
- Store tabular data as GeoParquet
- Handle antimeridian issues in polar regions
- Leverage Xarray/Dask for lazy evaluation and chunked processing
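One concrete antimeridian pitfall: a polar grid cell whose vertices straddle ±180° gets a bogus extent from a naive min/max of longitudes. A small library-free sketch of the wrap-and-detect logic — illustrative only, not the project's actual handling:

```python
def normalize_lon(lon: float) -> float:
    """Wrap a longitude into the [-180, 180) range."""
    return (lon + 180.0) % 360.0 - 180.0


def crosses_antimeridian(lons: list[float]) -> bool:
    """Heuristic check: consecutive polygon vertices more than 180
    degrees apart indicate the ring wraps across the +/-180 meridian."""
    wrapped = [normalize_lon(lon) for lon in lons]
    # Compare each vertex with its successor (closing the ring).
    return any(
        abs(a - b) > 180.0
        for a, b in zip(wrapped, wrapped[1:] + wrapped[:1])
    )
```

Cells flagged this way need to be split or shifted before computing extents; doing the real work in a projected CRS such as EPSG:3413 sidesteps the problem entirely.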
## Architecture Patterns
- Modular CLI design: each module exposes standalone Cyclopts CLI
- Configuration as code: use dataclasses for typed configs, TOML for hyperparameters
- GPU acceleration: use CuPy for arrays, cuML for ML, batch processing for memory management
- Data flow: Raw sources → Grid aggregation → L2 datasets → Training → Inference → Visualization
## Data Storage Hierarchy
All data follows this structure:
```
DATA_DIR/
├── grids/ # H3/HEALPix tessellations (GeoParquet)
├── darts/ # RTS labels (GeoParquet)
├── era5/ # Climate data (Zarr)
├── arcticdem/ # Terrain data (Icechunk Zarr)
├── alphaearth/ # Satellite embeddings (Zarr)
├── datasets/ # L2 XDGGS datasets (Zarr)
└── training-results/ # Models, CV results, predictions
```
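A helper in the spirit of `utils/paths.py` can resolve these tiers from the `FAST_DATA_DIR` environment variable. The function below is a hypothetical sketch, not the module's actual API:

```python
import os
from pathlib import Path

# FAST_DATA_DIR controls the data root, defaulting to ./data.
DATA_DIR = Path(os.environ.get("FAST_DATA_DIR", "./data"))

# The storage tiers from the hierarchy above.
_SUBDIRS = {
    "grids", "darts", "era5", "arcticdem",
    "alphaearth", "datasets", "training-results",
}


def data_path(kind: str) -> Path:
    """Return the directory for one storage tier, rejecting typos early."""
    if kind not in _SUBDIRS:
        raise KeyError(f"unknown data tier: {kind!r}")
    return DATA_DIR / kind
```

Centralizing path construction like this keeps pipeline stages reproducible: every module resolves the same location from the same environment variable.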
## Key Modules
- `entropice.spatial`: Grid generation and raster-to-vector aggregation
- `entropice.ingest`: Data extractors (DARTS, ERA5, ArcticDEM, AlphaEarth)
- `entropice.ml`: Dataset assembly, training, inference
- `entropice.dashboard`: Streamlit visualization app
- `entropice.utils`: Paths, codecs, types
## Testing & Notebooks
- Production code belongs in `src/entropice/`, not notebooks
- Notebooks in `notebooks/` are for exploration only (not version-controlled)
- Use `pytest` for testing geospatial correctness and data integrity

@@ -10,7 +10,7 @@ applyTo: '**/*.py,**/*.ipynb'
- Write clear and concise comments for each function.
- Ensure functions have descriptive names and include type hints.
- Provide docstrings following PEP 257 conventions.
- Use the `typing` module for advanced type annotations (e.g., `TypedDict`, `Literal["a", "b", ...]`).
- Break down complex functions into smaller, more manageable functions.
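As a brief illustration of the `TypedDict` and `Literal` annotations mentioned above — all names here are invented for the example:

```python
from typing import Literal, TypedDict


class GridSpec(TypedDict):
    """Illustrative description of a grid configuration record."""

    system: Literal["h3", "healpix"]
    resolution: int


def describe(spec: GridSpec) -> str:
    # At runtime a TypedDict is a plain dict; the benefit is static
    # checking of the keys and of the Literal-constrained values.
    return f"{spec['system']} grid at resolution {spec['resolution']}"
```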
## General Instructions
@@ -27,7 +27,7 @@ applyTo: '**/*.py,**/*.ipynb'
- Follow the **PEP 8** style guide for Python.
- Maintain proper indentation (use 4 spaces for each level of indentation).
- Ensure lines do not exceed 120 characters.
- Place function and class docstrings immediately after the `def` or `class` keyword.
- Use blank lines to separate functions, classes, and code blocks where appropriate.
@@ -41,6 +41,8 @@ applyTo: '**/*.py,**/*.ipynb'
## Example of Proper Documentation
```python
import math
def calculate_area(radius: float) -> float:
    """
    Calculate the area of a circle given the radius.
@@ -51,6 +53,5 @@ def calculate_area(radius: float) -> float:
    Returns:
        float: The area of the circle, calculated as π * radius^2.
    """
    return math.pi * radius ** 2
```