{ "cells": [ { "cell_type": "markdown", "id": "a551ad38", "metadata": {}, "source": [ "# End-to-end dapper tutorial\n", "\n", "This notebook walks through a **minimal** end-to-end dapper workflow for three 0.5° × 0.5° grid cells. It highlights the core functionality of `dapper` but does not get into details. Other notebooks do deeper dives into elements of `dapper` functionality.\n", "\n", "`dapper` supports two run modes:\n", "\n", "- **`mode=\"sites\"`**: *one set of ELM files per geometry* (per grid cell in our case here, or `gid` in `dapper` terminology).\n", "- **`mode=\"cellset\"`**: *one set of ELM files total* containing multiple grid cells.\n", "\n", "This notebook uses `mode=\"sites\"`, but the workflow remains the same if you want to use `mode=\"cellset\"`. \n", "\n", "Important: **“sites” does _not_ mean at-a-point.** Your *input geometries can be polygons* (as they are here). \n", "When a point is needed (e.g., nearest-neighbor sampling or to label a cell), `dapper` computes and uses a **representative point** inside each polygon.\n", "\n", "We also demonstrate the difference between **point vs zonal sampling** for surface and landuse file generation:\n", "\n", "- **point/nearest**: sample the underlying global file using the polygon’s representative point (nearest neighbor)\n", "- **zonal**: sample the underlying global file using the area-weighted intersection of the polygon with the source grid (spatial aggregation)\n", "\n", "## What you need locally to run this notebook\n", "\n", "Nothing that isn't already in the repo! However, there are some steps for which you may want to use your own data instead of the notebook's.\n", "\n", "For MET sampling, we include **GEE sampling code** that creates the raw CSV shards, *but* you can skip it and instead place pre-sampled CSVs in `docs/data/end-to-end/gee_shards/`. 
Example data has already been placed here, so you do not need GEE access to run this notebook *unless* you want to sample the ERA5 met data on your own.\n", "\n", "`dapper` cannot ship the global surface and landuse files (too big). However, a version of these is provided that has been cropped to a region that allows this notebook to run. If you have your own global surface and/or landuse files, you can point to them at the appropriate place in the notebook (or your own run script, eventually)." ] }, { "cell_type": "code", "execution_count": null, "id": "6b869c9c", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "DAPPER_ROOT: X:\\Research\\NGEE Arctic\\dapper\n", "OUT_ROOT: X:\\Research\\NGEE Arctic\\dapper\\docs\\tutorials\\end-to-end\\outputs\n", "GEE_MET_SHARDS: X:\\Research\\NGEE Arctic\\dapper\\docs\\data\\end-to-end\\gee_shards\n", "SURF_GLOBAL_NC: X:\\Research\\NGEE Arctic\\dapper\\docs\\data\\end-to-end\\surf_pseudoglobal.nc\n", "LANDUSE_GLOBAL_NC: X:\\Research\\NGEE Arctic\\dapper\\docs\\data\\end-to-end\\landuse_pseudoglobal.nc\n" ] } ], "source": [ "from pathlib import Path\n", "\n", "# A helper function to make sure we're pathing correctly\n", "def find_repo_root(start=None, markers=(\"pyproject.toml\", \"setup.cfg\", \".git\")) -> Path:\n", " \"\"\"Walk upward from start (default: cwd) until a repo marker is found.\"\"\"\n", " p = Path(start or Path.cwd()).resolve()\n", " for parent in (p, *p.parents):\n", " if any((parent / m).exists() for m in markers):\n", " return parent\n", " raise FileNotFoundError(\n", " \"Could not find dapper repo root. 
Set manually.\"\n", " )\n", "\n", "\n", "# Locate the dapper repo - change if necessary\n", "DAPPER_ROOT = find_repo_root() # this variable needs to be the dapper repo root directory; if you're not running this notebook from the notebook directory, you'll need to manually specify this\n", "\n", "# Where to store outputs - feel free to change these paths\n", "OUT_ROOT = DAPPER_ROOT / 'docs' / 'tutorials' / 'end-to-end' / 'outputs'\n", "OUT_ROOT.mkdir(parents=True, exist_ok=True)\n", "OUT_SITES = OUT_ROOT / \"sites_mode\" # just to differentiate between 'sites' and 'cellset' mode, although we won't be running 'cellset' here\n", "\n", "# Where the (pre-sampled) GEE CSV shards live.\n", "GEE_MET_SHARDS = Path(DAPPER_ROOT / \"docs\" / \"data\" / \"end-to-end\" / \"gee_shards\").resolve()\n", "GEE_MET_SHARDS.mkdir(parents=True, exist_ok=True)\n", "\n", "# In order to create surface and landuse files, we sample from global files. dapper provides these\n", "# \"pseudo-global\" files that have been cropped from true global files to cover the areas of interest\n", "# in this notebook. The actual global files are way too big for a GitHub repo. If you have access\n", "# to global files (you should), you can use those paths here instead of these.\n", "SURF_GLOBAL_NC = DAPPER_ROOT / 'docs' / 'data' / 'end-to-end' / 'surf_pseudoglobal.nc'\n", "LANDUSE_GLOBAL_NC = DAPPER_ROOT / 'docs' / 'data' / 'end-to-end' / 'landuse_pseudoglobal.nc'\n", "\n", "print(\"DAPPER_ROOT:\", DAPPER_ROOT)\n", "print(\"OUT_ROOT:\", OUT_ROOT)\n", "print(\"GEE_MET_SHARDS:\", GEE_MET_SHARDS)\n", "print(\"SURF_GLOBAL_NC:\", SURF_GLOBAL_NC)\n", "print(\"LANDUSE_GLOBAL_NC:\", LANDUSE_GLOBAL_NC)" ] }, { "cell_type": "markdown", "id": "ce95fe6c", "metadata": {}, "source": [ "## Make our geometries\n", "\n", "For this example, we're just going to create three 0.5 degree grid cells centered in the Colville River Basin. 
You can swap in your own geometries here (shapefile, GeoPackage, GeoJSON, etc.).\n", "\n", "All three cells share the same latitude bounds:\n", "- **lat**: 69.25 → 69.75 (center 69.5)\n", "\n", "The cells occupy three adjacent longitude bands (0.5° each):\n", "- **cell_01**: [-152.25, -151.75]\n", "- **cell_02**: [-151.75, -151.25]\n", "- **cell_03**: [-151.25, -150.75]\n", "\n", "Let's plot these cells on a basemap as a quick sanity check." ] }, { "cell_type": "code", "execution_count": 4, "id": "2dbeec99", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
| \n", " | gid | \n", "geometry | \n", "
|---|---|---|
| 0 | \n", "cell_01 | \n", "POLYGON ((-151.75 69.25, -151.75 69.75, -152.2... | \n", "
| 1 | \n", "cell_02 | \n", "POLYGON ((-151.25 69.25, -151.25 69.75, -151.7... | \n", "
| 2 | \n", "cell_03 | \n", "POLYGON ((-150.75 69.25, -150.75 69.75, -151.2... | \n", "