01 - Using OLMT to create surface files from the global E3SM datasets

Rich Fiorella, March 11, 2025

Simulations at a new site should begin with a baseline or control run: E3SM as it is out of the box, with none of the new NGEE Arctic features turned on, using the existing land surface boundary condition datasets.

OLMT has a tool that can create the domain, surface, and landuse timeseries boundary condition files from the half-degree standard datasets used in E3SM v3.

At this stage, this notebook only outlines perhaps the simplest way to extract these datasets. There are other ways to provide simple site-specific data when creating the surface files; they are hinted at below but not currently used.

This script assumes that you are running somewhere with access to the dcstorage drive/folder: neon_e3sm/inputdata. Throughout the remainder of this notebook, ${INPUTDATA} refers to this inputdata folder, which houses our local repository of E3SM input data. It is several terabytes, but is a subset of the data available here: https://web.lcrc.anl.gov/public/e3sm/inputdata/
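
A quick sanity check from Python before going further (a minimal sketch; the path assumes the dcstorage mount described above):

import os

# Root of our local E3SM input data repository (assumes the dcstorage mount).
inputroot = '/project/neon_e3sm/inputdata'

# Verify the folder exists and is readable before proceeding.
if os.path.isdir(inputroot) and os.access(inputroot, os.R_OK):
    print('Found input data repository at ' + inputroot)
else:
    print('Cannot access ' + inputroot + ' - check your dcstorage mount and permissions')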

You will need OLMT for this - I am in the process of moving my OLMT work to GitLab so we can better control what is released outside of LANL. I recommend starting from the GitLab version, since it will receive more frequent updates:

git clone -b rfiorella/era5 git@gitlab.lanl.gov:rfiorella/OLMT

If you already have a copy of OLMT, you can check which remote URL it uses with: git remote -v

If it reads: origin https://github.com/rfiorella/OLMT.git (fetch/push)

You can update to the new URL with: git remote set-url origin git@gitlab.lanl.gov:rfiorella/OLMT

Step 1: Define a site in a site group in the input data repository

If you want to add a site that has not previously been defined, you need to update the text files in the ${INPUTDATA}/lnd/clm2/PTCLM folder.

inputroot='/project/neon_e3sm/inputdata'

with open(inputroot+"/lnd/clm2/PTCLM/NGEEArctic_sitedata.txt","r") as file:
    content = file.read()
    print(content)
site_code,name,state,lon,lat,elev,startyear,endyear,alignyear,timezone
AK-BEO,"Utgiagvik",AK,-156.604771,71.280008,7,2000,2015,1851,-8
AK-BEOG,"BEOGrid",AK,-156.75,71.25,7,2000,2015,1851,-8
AK-K64,"KM64",AK,-164.83355,65.162310,7,2000,2015,1851,-8
AK-K64G,"KM64Grid",AK,-164.75,65.25,7,2000,2023,1851,-8
AK-TL,"Teller",AK,-165.9530,64.73548,7,2000,2023,1851,-8
AK-TLG,"TLRGrid",AK,-165.75,64.75,7,2000,2022,1851,-8
AK-CL,"Council",AK,-163.7074,64.8493,7,2000,2015,1851,-8
AK-CLG,"CCILGrid",AK,-163.75,64.75,7,2000,2015,1851,-8
AK-UTQ,"Utqiagvik-IM1",AK,-156.5962,71.2994,5,2000,2015,1851,-8
AK-PRU,"Prudhoe-IM1",AK,-148.8189,69.8259,86,2000,2015,1851,-8
AK-ICP,"Icy Cape-IM1",AK,-160.4705,69.8605,74,2000,2015,1851,-8
AK-ANA,"Anaktuvuk-IM1",AK,-150.8717,69.4142,154,2000,2015,1851,-8
AK-BRF,"Brooks Foothills-IM1",AK,-153.9414,69.0882,299,2000,2015,1851,-8
SE-Abi,"Abisko",SE,18.78,68.35,422,2000,2024,1851,1
CA-TVC,"Trail Valley Creek",NT,-133.499,68.742,73,2000,2024,1851,-7
CA-CHA,"CHARS",NU,-105.0415,69.13,2,2000,2024,1851,-7
AK-Tlk,"Toolik Lake",AK,-149.59429,68.62758,730,2000,2024,1851,-8
CA-QHI,"Qikiqtaruk-Herschel Island",YT,-139.0762,69.5697,100,2000,2024,1851,-7
RU-Sam,"Samoylov Island",RU,126.3,72.22,24,2000,2024,1851,7
NO-SJB,"SJ-Blv Bayelva",NO,11.83109,78.92163,53,2000,2024,1851,1

You’ll see that many of the sites we used in Phase 3, and will be using in Phase 4, are already on this list. PTCLM also contains a few other files for some of these sites:

with open(inputroot+"/lnd/clm2/PTCLM/NGEEArctic_soildata.txt","r") as file:
    content = file.read()
    print(content)
site_code,soil_depth,n_layers,layer_depth,layer_sand%,layer_clay%
AK-BEO,-999,1,-999,50.0,25.0
AK-BEOG,-999,1,-999,50.0,25.0
AK-K64G,-999,1,-999,50.0,25.0
AK-TLG,-999,1,-999,50.0,25.0
AK-CLG,-999,1,-999,50.0,25.0
AK-UTQ,-999,1,-999,50.0,25.0
AK-PRU,-999,1,-999,50.0,25.0
AK-ICP,-999,1,-999,50.0,25.0
AK-ANA,-999,1,-999,50.0,25.0
AK-BRF,-999,1,-999,50.0,25.0
with open(inputroot+"/lnd/clm2/PTCLM/NGEEArctic_pftdata.txt","r") as file:
    content = file.read()
    print(content)
site_code, pft_f1, pft_c1, pft_f2, pft_c2, pft_f3, pft_c3, pft_f4, pft_c4, pft_f5, pft_c5
AK-BEO, 100.0,12, 0.0, 0, 0.0, 0, 0.0, 0, 0.0, 0
AK-BEOG, 100.0,12, 0.0, 0, 0.0, 0, 0.0, 0, 0.0, 0
AK-K64G, 100.0,12, 0.0, 0, 0.0, 0, 0.0, 0, 0.0, 0
AK-TLG, 100.0,12, 0.0, 0, 0.0, 0, 0.0, 0, 0.0, 0
AK-CLG, 100.0,12, 0.0, 0, 0.0, 0, 0.0, 0, 0.0, 0
AK-UTQ, 100.0,12, 0.0, 0, 0.0, 0, 0.0, 0, 0.0, 0
AK-PRU, 100.0,12, 0.0, 0, 0.0, 0, 0.0, 0, 0.0, 0
AK-ICP,100.0,12, 0.0, 0, 0.0, 0, 0.0, 0, 0.0, 0
AK-ANA, 100.0,12, 0.0, 0, 0.0, 0, 0.0, 0, 0.0, 0
AK-BRF, 100.0,12, 0.0, 0, 0.0, 0, 0.0, 0, 0.0, 0

If a site code exists in all three files, it is possible to override the site soil and PFT data with the values provided in these files by setting the appropriate flags in the relevant OLMT script.
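
Adding a new site then amounts to appending a matching row to each file. A minimal sketch, using inputroot as defined above and a made-up site code (AK-NEW and all of its values are placeholders, not a real site):

# Append a hypothetical new site to the site group file.
# NOTE: 'AK-NEW' and its coordinates/years are placeholders for illustration only.
new_site = 'AK-NEW,"NewSite",AK,-150.0,68.5,100,2000,2024,1851,-8\n'
with open(inputroot + "/lnd/clm2/PTCLM/NGEEArctic_sitedata.txt", "a") as file:
    file.write(new_site)

The soildata and pftdata files can be extended the same way, following the column headers shown above.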

Step 2: Run the makepointdata.py script

The makepointdata.py script in $OLMT_ROOT (that is, the directory where you have your copy of the OLMT repo) extracts surface files from the gridded global datasets. An example use for the SE-Abi site is below:

python makepointdata.py --res hcru_hcru --model elm --site SE-Abi --sitegroup NGEEArctic --ccsm_input /project/neon_e3sm/inputdata/ --surfdata_grid --mysimyr 1850

I normally run these from the command line, but you could also run from Python in a Jupyter notebook if you prefer:

olmt_root = '/home/rfiorella/OLMT'

import subprocess
# note: arguments and values here need to be separate entries in the list following the script name.
result = subprocess.run(["python", olmt_root+"/makepointdata.py", 
                         "--res", "hcru_hcru", 
                         "--model", "elm", 
                         "--site", "SE-Abi",
                         "--sitegroup", "NGEEArctic",
                         "--ccsm_input", "/project/neon_e3sm/inputdata/", 
                         "--surfdata_grid",
                         "--mysimyr", "1850"],
                         capture_output=True, text = True)

print("Output:", result.stdout)
print("Errors:", result.stderr)

# The SyntaxWarnings about "invalid escape sequences" printed to stderr don't appear to be an issue, and can be ignored for now.
Output: 
Creating datasets for SE-Abi using hcru_hcru resolution
Creating domain data
INFO: Extracted and Compiled './temp/domain.nc' FROM: '/project/neon_e3sm/inputdata/share/domains/domain.clm/domain.lnd.360x720_cruncep.100429.nc'! 

Creating surface data
using PFT information from surface data
INFO: Extracted and Compiled './temp/surfdata.nc' FROM: '/project/neon_e3sm/inputdata/lnd/clm2/surfdata_map/surfdata_360x720cru_simyr1850_c180216.nc'! 

Creating dynpft data
INFO: Extracted and Compiled './temp/surfdata.pftdyn.nc' FROM: '/project/neon_e3sm/inputdata/lnd/clm2/surfdata_map/landuse.timeseries_360x720cru_hist_simyr1850-2015_c180220.nc'! 


Errors: /home/rfiorella/OLMT/makepointdata.py:69: SyntaxWarning: invalid escape sequence '\;'
  os.system('find ./temp/ -name "*.nc*" -exec rm {} \; ')
/home/rfiorella/OLMT/makepointdata.py:473: SyntaxWarning: invalid escape sequence '\;'
  os.system('find ./temp/ -name '+domainfile_tmp+' -exec rm {} \;')
/home/rfiorella/OLMT/makepointdata.py:763: SyntaxWarning: invalid escape sequence '\;'
  os.system('find ./temp/ -name "'+surffile_tmp+'" -exec rm {} \;')
/home/rfiorella/OLMT/makepointdata.py:1000: SyntaxWarning: invalid escape sequence '\;'
  os.system('find ./temp/ -name "'+pftdyn_tmp+'" -exec rm {} \;')
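
If you are scripting this step, it is also worth checking the return code from the subprocess call above, since a failure is not always obvious from the printed output alone:

# subprocess.run stores the exit status; non-zero means makepointdata.py failed.
if result.returncode != 0:
    raise RuntimeError("makepointdata.py failed with exit code " + str(result.returncode))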

Step 3: Move the new site files to our E3SM data repository

If successful, these scripts will generate three files in $OLMT_ROOT/temp. That would be fine if we were performing a one-off simulation of a site, but if it’s a site we’ll be returning to, it’s best to move the files to the common E3SM data repository at $INPUTDATA.
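
A quick way to confirm all three files were actually written (a minimal sketch; olmt_root is the path defined earlier):

import os

# Check that makepointdata.py produced all three expected files in $OLMT_ROOT/temp.
for fname in ["domain.nc", "surfdata.nc", "surfdata.pftdyn.nc"]:
    path = os.path.join(olmt_root, "temp", fname)
    print(fname + ": " + ("OK" if os.path.isfile(path) else "MISSING"))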

The file names generated - domain.nc, surfdata.nc, surfdata.pftdyn.nc - are not particularly descriptive, so we should add information about: a) which site it is, b) the resolution of the dataset it was generated from, and c) the creation date (maybe not necessary, but there’s a long tradition of this in the filenames for CESM/E3SM).

Continuing the example of Abisko:

domain.nc goes into $INPUTDATA/share/domains/domain.clm:
mv temp/domain.nc /project/neon_e3sm/inputdata/share/domains/domain.clm/domain.lnd.1x1pt_Abisko-GRID.nc
1x1pt indicates it is a single point; -GRID indicates it uses data from the gridded E3SM datasets (i.e., the --surfdata_grid argument to makepointdata.py)

surfdata.nc and surfdata.pftdyn.nc go into $INPUTDATA/lnd/clm2/surfdata_map/:
mv temp/surfdata.nc /project/neon_e3sm/inputdata/lnd/clm2/surfdata_map/surfdata_1x1pt_Abisko-GRID_simyr1850_c360x720_c250306.nc
mv temp/surfdata.pftdyn.nc /project/neon_e3sm/inputdata/lnd/clm2/surfdata_map/landuse.timeseries_1x1pt_Abisko-GRID_simyr1850-2015_c250306.nc
Here, simyr1850 indicates it starts from a surface dataset meant to represent 1850, c360x720 is the resolution of the dataset used to create the surface files (i.e., 0.5°), and c250306 is the date I created the files.
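
If you prefer to do these moves from Python, here is a sketch of the same renames (it assumes inputroot and olmt_root as defined earlier; the cYYMMDD stamp is built from today's date):

import shutil
from datetime import date

# Build descriptive file names following the naming convention described above.
site = "Abisko"
stamp = date.today().strftime("c%y%m%d")  # e.g., c250306
surfmap = inputroot + "/lnd/clm2/surfdata_map/"

shutil.move(olmt_root + "/temp/domain.nc",
            inputroot + "/share/domains/domain.clm/domain.lnd.1x1pt_" + site + "-GRID.nc")
shutil.move(olmt_root + "/temp/surfdata.nc",
            surfmap + "surfdata_1x1pt_" + site + "-GRID_simyr1850_c360x720_" + stamp + ".nc")
shutil.move(olmt_root + "/temp/surfdata.pftdyn.nc",
            surfmap + "landuse.timeseries_1x1pt_" + site + "-GRID_simyr1850-2015_" + stamp + ".nc")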

Step 4: Check the permissions of the files you added

The permissions model on the servers can be a bit annoying! If you create new files, they may not be accessible to other members of NGEE Arctic. To make sure they are, a quick fix is:

cd /project/neon_e3sm/inputdata
find . -group $USER -exec chgrp ngeearctic {} +
find . -user $USER -exec chmod g=u {} +

These commands: a) move you to the root directory of the E3SM datasets, b) find any files whose UNIX group matches your username and change the group to ngeearctic, and c) set the ngeearctic group permissions on your files to match the permissions of the user that created them.

FINALLY, because this is a shared space, please be exceedingly careful with rm in this directory!