dapper.surf package¶
Submodules¶
dapper.surf.sample module¶
dapper module: surf.sample.
- class dapper.surf.sample.SurfacePointSampler(nc_in, *, decode_times=True, chunks=None, include=None, exclude=None)[source]¶
Bases: object
Efficient point sampler for an ELM surface dataset: opens the NetCDF once, samples many points.
- Parameters:
nc_in (str | Path)
decode_times (bool)
chunks (Optional[Dict[str, int]])
include (Optional[set[str]])
exclude (Optional[set[str]])
- dapper.surf.sample.sample_point_values(nc_in, lat, lon, *, decode_times=True, chunks=None, include=None, exclude=None)[source]¶
Backwards-friendly convenience wrapper: opens, samples one point, closes.
- Return type:
Dict[str, Any]
- Parameters:
nc_in (str | Path)
lat (float)
lon (float)
decode_times (bool)
chunks (Dict[str, int] | None)
include (set[str] | None)
exclude (set[str] | None)
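The core behavior behind this sampler can be sketched without dapper: find the nearest grid indices for a (lat, lon) pair and pull every variable at that cell. The grid axes and variable dict below are hypothetical stand-ins for the NetCDF contents, not the sampler's actual internals.

```python
import numpy as np

def nearest_point_values(lats, lons, data_vars, lat, lon):
    """Sketch of one-point sampling: nearest-neighbour lookup on a
    regular lat/lon grid, returning {var_name: value_at_cell}."""
    i = int(np.abs(lats - lat).argmin())   # nearest row index
    j = int(np.abs(lons - lon).argmin())   # nearest column index
    return {name: arr[..., i, j] for name, arr in data_vars.items()}

# Hypothetical half-degree axes and one (time=2, lat, lon) variable.
lats = np.arange(-89.75, 90.0, 0.5)
lons = np.arange(-179.75, 180.0, 0.5)
pct = np.zeros((2, lats.size, lons.size))
pct[:, 100, 200] = 42.0
out = nearest_point_values(lats, lons, {"PCT_NATVEG": pct},
                           lat=float(lats[100]), lon=float(lons[200]))
```

Non-spatial leading dimensions (here `time`) are preserved, which is what lets one sampled point carry full per-PFT or per-month profiles.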
dapper.surf.schema module¶
dapper module: surf.schema.
- class dapper.surf.schema.ParDef(dims, dtype='float32', units='', doc='', required_level='', attrs=None, contexts=())[source]¶
Bases: object
Schema record for one surface parameter.
- dims:
Tuple of dimension names in model order (non-spatial first, then spatial, e.g. ("natpft", "lsmlat", "lsmlon")).
- dtype:
NetCDF dtype as a string ("float32", "int16", etc.).
- units:
Unit string; empty means "not enforced".
- doc:
Human-readable description, suitable for docs.
- required_level:
Semantic requirement flag, e.g. "required", "optional", "recommended", "conditional". The validator treats "required" as hard-required.
- attrs:
Extra NetCDF attributes (long_name, standard_name, etc.).
- Parameters:
dims (Tuple[str, ...])
dtype (str)
units (str)
doc (str)
required_level (str)
attrs (Dict[str, Any] | None)
contexts (Tuple[str, ...])
- attrs: Optional[Dict[str, Any]] = None¶
- contexts: Tuple[str, ...] = ()¶
- dims: Tuple[str, ...]¶
- doc: str = ''¶
- dtype: str = 'float32'¶
- required_level: str = ''¶
- units: str = ''¶
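The fields and defaults listed above suggest a plain dataclass; a minimal stand-in (assuming an ordinary, mutable dataclass, which may differ from the real definition) looks like this:

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional, Tuple

@dataclass
class ParDefSketch:
    """Minimal stand-in mirroring the ParDef fields and defaults above."""
    dims: Tuple[str, ...]
    dtype: str = "float32"
    units: str = ""
    doc: str = ""
    required_level: str = ""
    attrs: Optional[Dict[str, Any]] = None
    contexts: Tuple[str, ...] = ()

# A hypothetical 2-D spatial parameter, required by the schema.
sand = ParDefSketch(dims=("lsmlat", "lsmlon"), units="percent",
                    required_level="required")
```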
- dapper.surf.schema.expand_registry(as_json=False)[source]¶
Return the full registry as plain dict (easy to dump/serialize).
- Return type:
Dict[str, Dict]
- Parameters:
as_json (bool)
- dapper.surf.schema.pdef(dims, dtype='float32', units='', doc='', required_level='', **attrs)[source]¶
Convenience constructor for ParDef.
dims can be a comma-separated string ("lsmlat,lsmlon") or an iterable of dim names. Any extra keyword args become NetCDF variable attributes (e.g., long_name="…").
- Return type:
- Parameters:
dtype (str)
units (str)
doc (str)
required_level (str)
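The documented convention (comma-string or iterable for dims, extra kwargs collected as attributes) can be sketched as follows; the dict return value is a simplification of the real ParDef result:

```python
def pdef_sketch(dims, dtype="float32", units="", doc="",
                required_level="", **attrs):
    """Sketch of the pdef convenience constructor: normalise dims and
    collect extra keyword args as NetCDF variable attributes."""
    if isinstance(dims, str):
        dims = tuple(d.strip() for d in dims.split(","))  # "a,b" -> ("a", "b")
    else:
        dims = tuple(dims)
    return {"dims": dims, "dtype": dtype, "units": units, "doc": doc,
            "required_level": required_level, "attrs": attrs or None}

p = pdef_sketch("lsmlat,lsmlon", units="unitless",
                long_name="sample variable")   # extra kwarg -> attrs
```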
- dapper.surf.schema.propose_export_policy(var, sizes, ParDef=None)[source]¶
Return a compact policy dict for a variable, based on its dims and overrides.
- Parameters:
var (str)
sizes (Dict[str, int])
ParDef (ParDef | None)
- dapper.surf.schema.register_many(names, v)[source]¶
Register many variables with the same ParDef in one call.
- dapper.surf.schema.validate_against_schema(present_vars)[source]¶
Validate a set of variable names against SCHEMA rules.
Per-variable requirement is taken from ParDef.required_level (currently only ‘required’ is treated as hard-required).
‘choose_one_of’ groups are enforced at the tier level.
‘conditional’ rules are enforced as warnings when violated.
- Return type:
Dict[str, List[str]]
- Parameters:
present_vars (Iterable[str])
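The two hard rules described above (hard-required variables, choose_one_of groups) can be sketched with plain dicts; the registry shape and report format here are illustrative, not the actual SCHEMA structures:

```python
def validate_sketch(present, registry, choose_one_of=()):
    """Sketch of the SCHEMA rules: 'required' variables are hard-required,
    and each choose_one_of group must have at least one member present."""
    present = set(present)
    errors, warnings = [], []
    for name, spec in registry.items():
        if spec.get("required_level") == "required" and name not in present:
            errors.append(f"missing required variable: {name}")
    for group in choose_one_of:
        if not present & set(group):
            errors.append(f"need at least one of: {sorted(group)}")
    return {"errors": errors, "warnings": warnings}

reg = {"LATIXY": {"required_level": "required"},
       "SLOPE": {"required_level": "optional"}}
report = validate_sketch({"SLOPE"}, reg,
                         choose_one_of=[{"PCT_NAT_PFT", "PCT_PFT"}])
```

Here both rules fire: `LATIXY` is required but absent, and neither member of the (hypothetical) choose_one_of group is present.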
dapper.surf.sfile module¶
Surface file construction, customization, and validation helpers.
- exception dapper.surf.sfile.CustomizeError[source]¶
Bases: ValueError
Raised when a customization fails schema/formatting validation.
- class dapper.surf.sfile.SurfaceFile(ds, registry=None)[source]¶
Bases: object
Unified interface for building and editing ELM/ELM-style surface files.
- Wraps an in-memory xarray.Dataset (self.ds).
- Knows about the surface-variable registry (dapper.surf.schema; SC.REGISTRY).
- Can be constructed from:
  - an existing NetCDF path (from_netcdf)
  - a point sampled from the global half-degree surface (from_halfdegree_point)
  - a Domain (from_domain); currently a light stub you can extend
Parameters are added via add_params_from_df. That method:
- creates the named dimension if it does not exist yet, using the distinct values of id_col from the DataFrame
- adds/overwrites 1D variables whose names come directly from DataFrame column names (except id_col and drop_cols)
- Parameters:
ds (Dataset)
registry (Dict[str, ParDef] | None)
- add_params_from_df(dim_name, df, id_col, *, drop_cols=None)[source]¶
Attach / update 1D parameters along dim_name using a DataFrame.
- Parameters:
dim_name (str) – Logical dimension name (e.g. “topounit”, “pft”).
df (pandas.DataFrame or geopandas.GeoDataFrame) – Must contain id_col and one column per parameter.
id_col (str) – Column containing the IDs. The distinct values (as strings), in order of appearance, are used as coordinates if dim_name does not already exist.
drop_cols (list[str], optional) – Columns to ignore as parameters (e.g. “geometry”).
- Return type:
None
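The documented contract (distinct id_col values, as strings and in order of appearance, become the coordinate; every remaining column becomes a 1D parameter) can be sketched without xarray; the function name is hypothetical:

```python
import pandas as pd

def params_from_df_sketch(df, id_col, drop_cols=None):
    """Sketch of add_params_from_df: derive the coordinate from id_col
    and turn every other column into a 1D array along that dimension."""
    drop = set(drop_cols or [])
    coords = [str(v) for v in df[id_col].drop_duplicates()]
    params = {c: df[c].to_numpy()
              for c in df.columns if c != id_col and c not in drop}
    return coords, params

df = pd.DataFrame({"topounit_id": [1, 2],
                   "ELEV": [120.0, 340.0],
                   "geometry": ["a", "b"]})   # ignored via drop_cols
coords, params = params_from_df_sketch(df, "topounit_id",
                                       drop_cols=["geometry"])
```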
- add_topounits_from_domain(domain, *, gid_col='gid', id_col='topounit_id', pct_col='TopounitPctOfCell', dim_name='topounit', pct_var_name='PCT_TOPUNIT')[source]¶
Attach topounits + per-cell weights to the surface dataset.
Expects domain.topounits to exist and contain:
- gid_col (links topounit -> cell gid)
- id_col (unique id per topounit across the whole run)
- pct_col (percent of the parent cell; sums to ~100 per gid)
- Return type:
None
- Parameters:
gid_col (str)
id_col (str)
pct_col (str)
dim_name (str)
pct_var_name (str)
- basic_registry_check()[source]¶
Quick registry sanity check.
- Return type:
Dict[str, set[str]]
- Returns:
{"known": set of vars present in REGISTRY, "unknown": set of vars NOT present in REGISTRY}
- drop_params(names)[source]¶
Drop one or more data variables from the surface dataset.
- Return type:
None
- Parameters:
names (str | List[str])
- classmethod export(domain, *, out_dir, src_path, filename='surfdata.nc', overwrite=False, append_attrs=None, decode_times=True, chunks=None, include=None, exclude=None, registry=None, attach_topounits=True, sampling_method='nearest', lon_wrap='auto', agg_policy=None, validate=False, validator_kwargs=None)[source]¶
Export surface file(s) for a Domain.
Returns: dict[run_id, path]
- domain.mode='cellset': one file in out_dir
- domain.mode='sites': one file per site in out_dir/<gid>/
- Return type:
Dict[str, Path]
- Parameters:
domain (Any)
out_dir (str | Path)
src_path (str | Path)
filename (str)
overwrite (bool)
decode_times (bool)
chunks (Dict[str, int] | None)
include (set[str] | None)
exclude (set[str] | None)
registry (Dict[str, ParDef] | None)
attach_topounits (bool)
sampling_method (Literal['nearest', 'zonal'])
lon_wrap (Literal['auto', '0_360', '-180_180'])
agg_policy (dict[str, str] | None)
validate (bool)
validator_kwargs (Dict[str, Any] | None)
- classmethod from_domain(domain, src_path, *, decode_times=True, chunks=None, include=None, exclude=None, registry=None, attach_topounits=True, sampling_method='nearest', lon_wrap='auto', agg_policy=None)[source]¶
Sample a global surface Dataset for a single-run Domain and return a SurfaceFile.
- Return type:
- Parameters:
domain (Any)
src_path (str | Path)
decode_times (bool)
chunks (Dict[str, int] | None)
include (set[str] | None)
exclude (set[str] | None)
registry (Dict[str, ParDef] | None)
attach_topounits (bool)
sampling_method (Literal['nearest', 'zonal'])
lon_wrap (Literal['auto', '0_360', '-180_180'])
agg_policy (dict[str, str] | None)
- classmethod from_halfdegree_point(lat, lon, *, src_path, decode_times=True, chunks=None, include=None, exclude=None, registry=None)[source]¶
Sample the global half-degree surface at (lat, lon) and return a 1x1 SurfaceFile. Uses dapper.geo.sampling.sample_gridded_dataset_points.
- Return type:
- Parameters:
lat (float)
lon (float)
src_path (str | Path)
decode_times (bool)
chunks (Dict[str, int] | None)
include (set[str] | None)
exclude (set[str] | None)
registry (Dict[str, ParDef] | None)
- classmethod from_netcdf(path, registry=None, decode_times=True)[source]¶
Workflow A: wrap an existing surface file for editing.
- Return type:
- Parameters:
path (str | Path)
registry (Dict[str, ParDef] | None)
decode_times (bool)
- resize_dim(dim_name, new_size, *, fill_value=nan)[source]¶
Generic "change dimensionality" helper (e.g. nlevsoi 10 → 15).
If new_size < old_size: truncate all vars using that dim.
If new_size > old_size: pad with fill_value.
This is intentionally generic; you can wrap ELM-specific logic (e.g. updating the 'nlevsoi' scalar) on top of it.
- Return type:
None
- Parameters:
dim_name (str)
new_size (int)
fill_value (float)
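The truncate-or-pad rule can be sketched on a single array; the real method applies this to every variable that uses the dimension, but the per-array logic (assuming NaN padding on floats) is just:

```python
import numpy as np

def resize_along(arr, axis, new_size, fill_value=np.nan):
    """Sketch of resize_dim on one array: truncate if shrinking,
    pad with fill_value if growing."""
    old = arr.shape[axis]
    if new_size <= old:
        idx = [slice(None)] * arr.ndim
        idx[axis] = slice(0, new_size)      # keep the first new_size levels
        return arr[tuple(idx)]
    pad = [(0, 0)] * arr.ndim
    pad[axis] = (0, new_size - old)         # pad at the end only
    return np.pad(arr.astype(float), pad, constant_values=fill_value)

soil = np.arange(10.0)                      # e.g. a var along nlevsoi=10
grown = resize_along(soil, axis=0, new_size=15)
cut = resize_along(soil, axis=0, new_size=5)
```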
- set_global_attrs(**attrs)[source]¶
Update global attributes on the underlying Dataset.
- Return type:
None
- Parameters:
attrs (Any)
- set_scalar(name, value)[source]¶
Convenience for setting scalar parameters like nlevsoi, numrad, etc.
- Return type:
None
- Parameters:
name (str)
value (ndarray | DataArray | float | int)
- to_netcdf(path, overwrite=False, encoding=None, append_attrs=None, dapper_attrs=None, add_created_utc=True)[source]¶
Write this SurfaceFile to disk as NetCDF.
- Return type:
str
- Parameters:
path (str | Path)
overwrite (bool)
encoding (Dict[str, Dict[str, Any]] | None)
append_attrs (dict | None)
dapper_attrs (dict | None)
add_created_utc (bool)
- validate(strict=False, use_external_validator=False, validator_kwargs=None)[source]¶
Validate the surface Dataset.
- strict=False, use_external_validator=False:
only run basic_registry_check; print a warning for unknown vars.
- use_external_validator=True:
write to a temporary file and run SurfaceValidator on it; return the pandas.DataFrame report.
- Parameters:
strict (bool)
use_external_validator (bool)
validator_kwargs (Dict[str, Any] | None)
- dapper.surf.sfile.build_surface_dataset(sampled, *, include=None, drop_non_spatial_arrays=False)[source]¶
Turn a sampled dict (from sample_point_values) into a 1x1 ELM surface xarray.Dataset. Adds spatial dims back as length-1 and preserves other dims in file order.
- Return type:
Dataset
- Parameters:
sampled (Dict[str, Any])
include (set[str] | None)
drop_non_spatial_arrays (bool)
- dapper.surf.sfile.build_surface_dataset_cellset(sampled_list, *, include=None, drop_non_spatial_arrays=False)[source]¶
Build an ELM surface xarray.Dataset for a cellset laid out as (nj=N, ni=1). This mirrors your domain writer default of N×1, and keeps spatial dims last.
Each entry of sampled_list is the dict returned by SurfacePointSampler.sample().
- Return type:
Dataset
- Parameters:
sampled_list (List[Dict[str, Any]])
include (set[str] | None)
drop_non_spatial_arrays (bool)
- dapper.surf.sfile.customize_surface(src_path, customizations, nc_out=None, *, strict_registry=True, allow_add=True, run_validation=False, validator_kwargs=None, units_policy='enforce', engine='netcdf4')[source]¶
Update or add parameters in an existing ELM surface NetCDF (path-only API).
- Parameters:
src_path (str | Path) – Path to existing surface NetCDF.
customizations (dict) –
Mapping of variable -> value OR variable -> spec dict:
- value can be: scalar, np.ndarray, xr.DataArray (broadcasted)
- spec dict keys: {"value": <required>, "dims": ["optional dim names for 1D arrays"], "dtype": "optional dtype override (e.g., 'float32')", "units": "optional units override (if not enforced by registry)"}
Notes:
- For existing variables, dims are taken from the file and 'dims' is ignored (the value must be broadcastable to that shape).
- For NEW variables (not in file):
  - If present in REGISTRY, dims/dtype/units come from REGISTRY. All of those dims must already exist in the dataset (sizes are reused).
  - If NOT in REGISTRY and strict_registry=True -> error.
  - If NOT in REGISTRY and strict_registry=False -> you must pass a spec dict with 'dims', 'dtype', and 'units'.
nc_out (str | Path, optional) – Output path; default is '<stem>_custom.nc' next to input.
strict_registry (bool) – Require variables to exist in schema.REGISTRY. True recommended.
allow_add (bool) – Permit adding new variables; otherwise only overwrite existing ones.
run_validation (bool) – If True, run dapper.surf.validate.SurfaceValidator on the written file and return the report.
validator_kwargs (dict) – Passed to SurfaceValidator(…).
units_policy (str) – Units policy; 'enforce' checks that file units match registry units (registry ''/'varies' entries are skipped).
engine (str)
- Return type:
(out_path, report_df_or_None)
- Raises:
CustomizeError – on shape/dtype/units/dim mismatches.
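The value-or-spec-dict convention described above can be sketched as a normalization step; this is an illustrative helper, not the actual implementation, and `MY_NEW_VAR` is a hypothetical variable name:

```python
def normalize_customizations(customizations):
    """Sketch of the customize_surface input convention: bare values are
    wrapped into {"value": ...}; spec dicts must carry a 'value' key."""
    out = {}
    for var, spec in customizations.items():
        if isinstance(spec, dict):
            if "value" not in spec:
                raise ValueError(f"{var}: spec dict needs a 'value' key")
            out[var] = dict(spec)
        else:
            out[var] = {"value": spec}   # scalar / array shortcut
    return out

norm = normalize_customizations({
    "nlevsoi": 15,                                  # scalar shortcut
    "MY_NEW_VAR": {"value": [1.0, 2.0],             # hypothetical new var
                   "dims": ["topounit"],
                   "dtype": "float32", "units": "m"},
})
```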
- dapper.surf.sfile.write_surface_nc(ds, out_path, *, append_attrs=None, dapper_attrs=None, add_created_utc=True)[source]¶
Write a surface Dataset to NetCDF with ELM-friendly defaults and merged attributes.
- Return type:
str
- Parameters:
ds (Dataset)
out_path (str)
append_attrs (dict | None)
dapper_attrs (dict | None)
add_created_utc (bool)
dapper.surf.surface_var_specs module¶
Canonical ELM/ELM surface-variable spec for Dapper.
This file is the single place where new surface variables and their basic metadata (dims, description, requiredness) are added. Other modules (schema, validation, docs) should import this and not duplicate this information elsewhere.
It was initially generated from a ChatGPT scrape of a public ELM surface file and E3SM Fortran code, assembled by Rich Fiorella. Jon then added more parameters based on comparing a handful of ELM surface files. Additional parameters should be added to the SURFACE_VAR_SPECS dictionary below.
dapper.surf.validate module¶
dapper module: surf.validate.
- class dapper.surf.validate.CheckResult(check, severity, passed, detail, var=None)[source]¶
Bases: object
One validation result row.
- Parameters:
check (str)
severity (str)
passed (bool)
detail (str)
var (str | None)
- check: str¶
- detail: str¶
- passed: bool¶
- severity: str¶
- var: Optional[str] = None¶
- class dapper.surf.validate.SurfaceValidator(*, expected_sizes=None, lat_candidates=('lsmlat', 'lat', 'latitude', 'y'), lon_candidates=('lsmlon', 'lon', 'longitude', 'x'), enforce_known_vars_only=False, require_point_dims=True, skip_soft_checks=False)[source]¶
Bases: object
Validator for ELM/CLM point surface NetCDF files (1×1 spatial cell).
Primary (format/layout) checks¶
V-001 dims: lat/lon-like dims exist and both have length == 1 (ERROR)
V-002 dims.sizes: expected sizes for common dims (WARN), e.g. time=12, natpft=17, lsmpft=17, nlevsoi=10, nlevslp=11, numurbl=3, numrad=2, nlevurb=5
V-003 schema.required: required variables per SCHEMA are present (ERROR)
V-004 schema.choose_one_of: at least one var present in each group (ERROR)
V-005 schema.conditional: if driver present → dependent vars present (WARN)
V-006 registry.known_vars: vars not in REGISTRY flagged (INFO, or ERROR if enforce_known_vars_only)
V-007 dims.order: for spatial vars, spatial dims are the last two (…, lat, lon) (WARN)
V-008 dims.match_registry: non-spatial dim order matches REGISTRY (WARN)
V-009 dtype.match_registry: integer vs float matches REGISTRY (WARN)
V-010 units.present: 'units' attribute exists (WARN)
V-011 units.match_registry: exact match when REGISTRY.units not ''/'varies' (WARN)
V-012 fillvalue.sane: floats have _FillValue (NaN ok), ints do not rely on NaN (WARN)
V-013 coordinates.present: LATIXY/LONGXY present (WARN); optional INFO: lat/lon coord ≈ LATIXY/LONGXY
Soft (non-blocking) checks¶
V-101 ranges.percent: PCT_* (and PCT_NATVEG) ∈ [0, 100] (ERROR)
V-102 ranges.unit: LANDFRAC_PFT, SKY_VIEW ∈ [0, 1] (ERROR)
V-103 ranges.nonneg: SLOPE, (ST)DEV_ELEV, AREA, TOPO ≥ 0 (ERROR)
V-104 time.length: any var with 'time' dim → len(time) == 12 (ERROR)
V-105 consistency.pftsum: sum(PCT_NAT_PFT) ≈ PCT_NATVEG (WARN)
V-106 conditional.urban: if max(PCT_URBAN) > 0 → URBAN_REGION_ID present (WARN)
V-107 conditional.glacier: if max(PCT_GLACIER) > 0 → GLC_MEC & PCT_GLC_MEC present (WARN)
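Two of these soft checks (V-101 and V-105) can be sketched as pure array checks; the function name and report shape are illustrative, not the validator's actual API:

```python
import numpy as np

def soft_checks(pct_nat_pft, pct_natveg, tol=0.1):
    """Sketch of V-101 (percent range) and V-105 (PFT-sum consistency:
    sum over the pft axis should match PCT_NATVEG)."""
    results = []
    in_range = bool((pct_nat_pft >= 0).all() and (pct_nat_pft <= 100).all())
    results.append(("ranges.percent", in_range))
    consistent = bool(np.allclose(pct_nat_pft.sum(axis=0),
                                  pct_natveg, atol=tol))
    results.append(("consistency.pftsum", consistent))
    return results

pft = np.array([[60.0], [40.0]])        # two PFTs at one point
checks = soft_checks(pft, np.array([100.0]))
```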
Usage¶
>>> v = SurfaceValidator()
>>> report = v.validate(r"X:\path\surfdata_1x1pt.nc")
>>> report.query("severity=='ERROR' and passed==False")
- Parameters:
expected_sizes (Dict[str, int] | None)
lat_candidates (Tuple[str, ...])
lon_candidates (Tuple[str, ...])
enforce_known_vars_only (bool)
require_point_dims (bool)
skip_soft_checks (bool)
Module contents¶
dapper module: surf.__init__.
- class dapper.surf.SurfaceFile(ds, registry=None)[source]¶
Bases: object
Unified interface for building and editing ELM/ELM-style surface files. Re-exported from dapper.surf.sfile; see dapper.surf.sfile.SurfaceFile above for the full class documentation.