{ "cells": [ { "cell_type": "markdown", "id": "8be352fc-1d78-4e9a-a476-c4ce9957ef7a", "metadata": {}, "source": [ "# Downloading global CMIP data\n", "This notebook demonstrates how to download global CMIP6 files that match your criteria (variables, models, experiments, etc.). It only walks through downloading the raw CMIP6 files, not formatting them for ELM (that functionality does not yet exist).\n", "\n", "`dapper` uses a Pangeo-hosted CMIP repository, because we found ESGF tricky to work with: its nodes are transient and not always available. The Pangeo archive standardizes everything into a quickly searchable and downloadable archive, but it is not a perfect mirror of all the data available across ESGF. If you're not finding what you need here, you may have to look in ESGF. Note that Google Earth Engine also hosts a downscaled set of CMIP6 models/variables, but it includes only a limited set of variables, not everything needed for ELM runs, so we do not provide functionality for sampling it.\n", "\n", "Searching and downloading from the Pangeo archive does not require an account, so unlike ERA5-Land data, which needs a Google Earth Engine account, this should work straight out of the box." ] }, { "cell_type": "markdown", "id": "01f9cb80", "metadata": {}, "source": [ "Similar to working with ERA5-Land Hourly data, here we will specify a `params` dictionary and then send our request. Let's take a closer look at these `params`.\n", "\n", "| Key | Definition | Examples |\n", "|--------------|------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------|\n", "| `models` | Climate models (or \"sources\") that produced the simulation data, each with unique physics, resolution, and configurations. 
| `CESM2`, `IPSL-CM6A-LR`, `CanESM5`, `MPI-ESM1-2-HR` |\n", "| `variables` | Climate variables simulated by the models, including atmospheric, oceanic, and land-surface data. | `pr`, `tas`, `psl`, `ua` |\n", "| `experiment` | Predefined scenarios that specify forcing conditions used in climate simulations. | `historical`, `ssp245`, `ssp370`, `ssp585`, `piControl` |\n", "| `table` | CMIP6 output table, which sets the reporting frequency and the realm (atmosphere, ocean, land) of the data. | `Amon`, `day`, `Omon`, `Lmon` |\n", "| `ensemble` | Identifier specifying realization, initialization, physics, and forcing configurations for the model run. | `r1i1p1f1`, `r2i1p1f1`, `r1i2p1f2` |\n", "\n", "You do not need to specify all of these. For example, if you're not sure which models you want, just leave `models` out and the search will return every model that matches your other criteria. Let's try it out." ] }, { "cell_type": "code", "execution_count": null, "id": "e53bb136", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " activity_id institution_id source_id experiment_id \\\n", "0 CMIP NOAA-GFDL GFDL-CM4 historical \n", "1 CMIP NOAA-GFDL GFDL-CM4 historical \n", "2 CMIP IPSL IPSL-CM6A-LR historical \n", "3 CMIP IPSL IPSL-CM6A-LR historical \n", "4 CMIP NASA-GISS GISS-E2-1-G historical \n", ".. ... ... ... ... 
\n", "103 r1i1p1f1 Amon tas gr \n", "104 r1i1p1f1 Amon tas gr1 \n", "105 r1i1p1f1 Amon pr gr1 \n", "106 r1i1p1f1 Amon pr gr \n", "107 r1i1p1f1 Amon tas gr \n", "\n", " zstore dcpp_init_year \\\n", "0 gs://cmip6/CMIP6/CMIP/NOAA-GFDL/GFDL-CM4/histo... NaN \n", "1 gs://cmip6/CMIP6/CMIP/NOAA-GFDL/GFDL-CM4/histo... NaN \n", "2 gs://cmip6/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/histor... NaN \n", "3 gs://cmip6/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/histor... NaN \n", "4 gs://cmip6/CMIP6/CMIP/NASA-GISS/GISS-E2-1-G/hi... NaN \n", ".. ... ... \n", "103 gs://cmip6/CMIP6/CMIP/IPSL/IPSL-CM6A-LR-INCA/h... NaN \n", "104 gs://cmip6/CMIP6/CMIP/KIOST/KIOST-ESM/historic... NaN \n", "105 gs://cmip6/CMIP6/CMIP/KIOST/KIOST-ESM/historic... NaN \n", "106 gs://cmip6/CMIP6/CMIP/EC-Earth-Consortium/EC-E... NaN \n", "107 gs://cmip6/CMIP6/CMIP/EC-Earth-Consortium/EC-E... NaN \n", "\n", " version \n", "0 20180701 \n", "1 20180701 \n", "2 20180803 \n", "3 20180803 \n", "4 20180827 \n", ".. ... \n", "103 20210216 \n", "104 20210601 \n", "105 20210928 \n", "106 20211207 \n", "107 20211207 \n", "\n", "[108 rows x 11 columns]\n" ] } ], "source": [ "from pathlib import Path\n", "from dapper.met import cmip_utils as cu\n", "\n", "# We will leave model selections out for now\n", "params = {\n", "    \"variables\": [\"pr\", \"tas\"],\n", "    \"experiment\": \"historical\",\n", "    \"table\": [\"Amon\"],\n", "    \"ensemble\": \"r1i1p1f1\",\n", "}\n", "\n", "available = cu.find_available_data(params)\n", "\n", "print(available)\n" ] }, { "cell_type": "markdown", "id": "c08d80a0", "metadata": {}, "source": [ "Now we see a table where each row corresponds to one dataset. Note that each variable is on its own row, even if it comes from the same model, table, experiment, and ensemble. Let's say that you only want a small sample instead of the full catalog: 5 models that provide both `pr` and `tas`, which gives 10 rows in total. We will do this by collecting the index labels of the rows of `available` that we want to keep.\n", "\n", "
\n", "💡 Note: You only need to do this step if you want to downselect from your returned query.\n", "
\n" ] }, { "cell_type": "code", "execution_count": null, "id": "5df48d4e", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " activity_id institution_id source_id experiment_id member_id \\\n", "54 CMIP CSIRO-ARCCSS ACCESS-CM2 historical r1i1p1f1 \n", "55 CMIP CSIRO-ARCCSS ACCESS-CM2 historical r1i1p1f1 \n", "59 CMIP CSIRO ACCESS-ESM1-5 historical r1i1p1f1 \n", "60 CMIP CSIRO ACCESS-ESM1-5 historical r1i1p1f1 \n", "78 CMIP AWI AWI-CM-1-1-MR historical r1i1p1f1 \n", "87 CMIP AWI AWI-CM-1-1-MR historical r1i1p1f1 \n", "72 CMIP AWI AWI-ESM-1-1-LR historical r1i1p1f1 \n", "73 CMIP AWI AWI-ESM-1-1-LR historical r1i1p1f1 \n", "6 CMIP BCC BCC-CSM2-MR historical r1i1p1f1 \n", "7 CMIP BCC BCC-CSM2-MR historical r1i1p1f1 \n", "\n", " table_id variable_id grid_label \\\n", "54 Amon tas gn \n", "55 Amon pr gn \n", "59 Amon pr gn \n", "60 Amon tas gn \n", "78 Amon pr gn \n", "87 Amon tas gn \n", "72 Amon tas gn \n", "73 Amon pr gn \n", "6 Amon tas gn \n", "7 Amon pr gn \n", "\n", " zstore dcpp_init_year \\\n", "54 gs://cmip6/CMIP6/CMIP/CSIRO-ARCCSS/ACCESS-CM2/... NaN \n", "55 gs://cmip6/CMIP6/CMIP/CSIRO-ARCCSS/ACCESS-CM2/... NaN \n", "59 gs://cmip6/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/hist... NaN \n", "60 gs://cmip6/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/hist... NaN \n", "78 gs://cmip6/CMIP6/CMIP/AWI/AWI-CM-1-1-MR/histor... NaN \n", "87 gs://cmip6/CMIP6/CMIP/AWI/AWI-CM-1-1-MR/histor... NaN \n", "72 gs://cmip6/CMIP6/CMIP/AWI/AWI-ESM-1-1-LR/histo... NaN \n", "73 gs://cmip6/CMIP6/CMIP/AWI/AWI-ESM-1-1-LR/histo... NaN \n", "6 gs://cmip6/CMIP6/CMIP/BCC/BCC-CSM2-MR/historic... NaN \n", "7 gs://cmip6/CMIP6/CMIP/BCC/BCC-CSM2-MR/historic... 
NaN \n", "\n", " version \n", "54 20191108 \n", "55 20191108 \n", "59 20191115 \n", "60 20191115 \n", "78 20200511 \n", "87 20200720 \n", "72 20200212 \n", "73 20200212 \n", "6 20181126 \n", "7 20181126 \n" ] } ], "source": [ "df = available.copy()\n", "\n", "# Find the first 5 models that have both 'pr' and 'tas' variables\n", "grouped = df.groupby(\"source_id\")\n", "keep = []  # index labels of the rows to keep\n", "count = 0  # number of models selected so far\n", "for model, g in grouped:\n", "    if \"tas\" in g[\"variable_id\"].values and \"pr\" in g[\"variable_id\"].values:\n", "        keep.extend(g.index.tolist())\n", "        count += 1\n", "    if count > 4:\n", "        break\n", "# `keep` holds index labels, so select with .loc rather than .iloc\n", "df_export = df.loc[keep]\n", "print(df_export) # Now we have 10 rows: 5 models x 2 variables" ] }, { "cell_type": "markdown", "id": "10e522a0", "metadata": {}, "source": [ "Now that we have the set of datasets we want, let's download them! Note that these files are on the order of 100-500 MB apiece, so if you're just following this example, you may want to halt early or shrink `df_export` even further." ] }, { "cell_type": "code", "execution_count": null, "id": "1521e38c", "metadata": {}, "outputs": [], "source": [ "# Choose an output folder for the downloaded NetCDF files\n", "# (relative paths are interpreted from your current working directory)\n", "CMIP_OUT = Path(r'X:\\Research\\NGEE Arctic\\CMIP output\\dapper_tutorial') # Change this to a folder on your machine or you'll have a bad time\n", "CMIP_OUT.mkdir(parents=True, exist_ok=True)\n", "\n", "cu.download_pangeo(df_export, CMIP_OUT)" ] }, { "cell_type": "markdown", "id": "26fa7f8d", "metadata": {}, "source": [ "And if we look in `CMIP_OUT`, we should see all the files." 
] }, { "cell_type": "code", "execution_count": 6, "id": "f857cf32", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Downloaded 10 files into: X:\\Research\\NGEE Arctic\\CMIP output\\dapper_tutorial\n", " - pr_ACCESS-CM2_historical_r1i1p1f1.nc\n", " - pr_ACCESS-ESM1-5_historical_r1i1p1f1.nc\n", " - pr_AWI-CM-1-1-MR_historical_r1i1p1f1.nc\n", " - pr_AWI-ESM-1-1-LR_historical_r1i1p1f1.nc\n", " - pr_BCC-CSM2-MR_historical_r1i1p1f1.nc\n", " - tas_ACCESS-CM2_historical_r1i1p1f1.nc\n", " - tas_ACCESS-ESM1-5_historical_r1i1p1f1.nc\n", " - tas_AWI-CM-1-1-MR_historical_r1i1p1f1.nc\n", " - tas_AWI-ESM-1-1-LR_historical_r1i1p1f1.nc\n", " - tas_BCC-CSM2-MR_historical_r1i1p1f1.nc\n" ] } ], "source": [ "# Quick sanity check: list up to 20 of the downloaded files\n", "if not CMIP_OUT.exists():\n", "    raise FileNotFoundError(\n", "        f\"Output directory not found: {CMIP_OUT.resolve()}\\n\"\n", "        \"Run the download cell above first, or update `CMIP_OUT` to your chosen location.\"\n", "    )\n", "\n", "# Collect every file under CMIP_OUT, including any subfolders\n", "files = sorted([p for p in CMIP_OUT.rglob(\"*\") if p.is_file()])\n", "\n", "print(f\"Downloaded {len(files)} files into: {CMIP_OUT.resolve()}\")\n", "for p in files[:20]:\n", "    print(\" -\", p.relative_to(CMIP_OUT))\n" ] }, { "cell_type": "markdown", "id": "a09ca756", "metadata": {}, "source": [ "## What next?\n", "`dapper` does not yet have an `Adapter()` for CMIP data, but it's coming! You can use these \"raw\" downloads for analysis until then." ] }, { "cell_type": "markdown", "id": "7f05f9d0", "metadata": {}, "source": [] } ], "metadata": { "jupytext": { "formats": "ipynb,py:percent" }, "kernelspec": { "display_name": "dapper", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.9" } }, "nbformat": 4, "nbformat_minor": 5 }