dapper.surf.sfile

Surface file construction, customization, and validation helpers.

Functions

build_surface_dataset(sampled, *[, include, ...])

Turn a sampled dict (from sample_point_values) into a 1x1 ELM surface xarray.Dataset.

build_surface_dataset_cellset(sampled_list, *)

Build an ELM surface xarray.Dataset for a cellset laid out as (nj=N, ni=1).

customize_surface(src_path, customizations)

Update or add parameters in an existing ELM surface NetCDF (path-only API).

write_surface_nc(ds, out_path, *[, ...])

Write a surface Dataset to NetCDF with ELM-friendly defaults and merged attributes.

Classes

SurfaceFile(ds[, registry])

Unified interface for building and editing ELM/ELM-style surface files.

Exceptions

CustomizeError

Raised when a customization fails schema/formatting validation.

exception dapper.surf.sfile.CustomizeError[source]

Bases: ValueError

Raised when a customization fails schema/formatting validation.

class dapper.surf.sfile.SurfaceFile(ds, registry=None)[source]

Bases: object

Unified interface for building and editing ELM/ELM-style surface files.

  • Wraps an in-memory xarray.Dataset (self.ds).

  • Knows about the surface-variable registry (dapper.surf.schema; SC.REGISTRY).

  • Can be constructed from:

    • an existing NetCDF path (from_netcdf)

    • a point sampled from the global half-degree surface (from_halfdegree_point)

    • a Domain (from_domain); currently a light stub you can extend

Parameters are added via add_params_from_df. That method:

  • creates the named dimension if it does not exist yet, using the distinct values of id_col from the DataFrame

  • adds/overwrites 1D variables whose names come directly from DataFrame column names (except id_col and drop_cols)

Parameters:
  • ds (Dataset)

  • registry (Dict[str, ParDef] | None)

add_params_from_df(dim_name, df, id_col, *, drop_cols=None)[source]

Attach / update 1D parameters along dim_name using a DataFrame.

Parameters:
  • dim_name (str) – Logical dimension name (e.g. “topounit”, “pft”).

  • df (pandas.DataFrame or geopandas.GeoDataFrame) – Must contain id_col and one column per parameter.

  • id_col (str) – Column containing the IDs. The distinct values (as strings), in order of appearance, are used as coordinates if dim_name does not already exist.

  • drop_cols (list[str], optional) – Columns to ignore as parameters (e.g. “geometry”).

Return type:

None
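The rules above (string IDs in order of appearance become the coordinates, and every column other than id_col and drop_cols becomes a parameter) can be illustrated with a small standard-library sketch. This is not the library's implementation: split_params is a hypothetical helper, rows stand in for DataFrame records, and the column names slope and aspect are made up for the example.

```python
def split_params(rows, id_col, drop_cols=None):
    """Return (coords, params) from a list of row dicts.

    coords: distinct id_col values as strings, in order of appearance.
    params: {column_name: [one value per row]} for every kept column.
    """
    drop = set(drop_cols or [])
    # dict.fromkeys removes duplicates while keeping first-seen order.
    coords = list(dict.fromkeys(str(row[id_col]) for row in rows))
    # Every column except id_col and drop_cols is treated as a parameter.
    columns = [c for c in rows[0] if c != id_col and c not in drop]
    params = {c: [row[c] for row in rows] for c in columns}
    return coords, params


# Hypothetical records; "geometry" mimics a GeoDataFrame column to ignore.
rows = [
    {"topounit_id": 101, "slope": 2.5, "aspect": 180.0, "geometry": None},
    {"topounit_id": 102, "slope": 7.0, "aspect": 90.0, "geometry": None},
]
coords, params = split_params(rows, "topounit_id", drop_cols=["geometry"])
```

With the real method, the equivalent call would pass a DataFrame: add_params_from_df("topounit", df, "topounit_id", drop_cols=["geometry"]).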

add_topounits_from_domain(domain, *, gid_col='gid', id_col='topounit_id', pct_col='TopounitPctOfCell', dim_name='topounit', pct_var_name='PCT_TOPUNIT')[source]

Attach topounits + per-cell weights to the surface dataset.

Expects domain.topounits to exist and contain:

  • gid_col (links topounit -> cell gid)

  • id_col (unique id per topounit across the whole run)

  • pct_col (percent of the parent cell; sums to ~100 per gid)

Return type:

None

Parameters:
  • gid_col (str)

  • id_col (str)

  • pct_col (str)

  • dim_name (str)

  • pct_var_name (str)
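Before calling this method it can be useful to confirm that the topounit table meets the expectations stated above. The sketch below is an assumption-laden pre-check, not part of dapper: the helper names and the 0.5 tolerance are hypothetical, and rows stand in for records of domain.topounits.

```python
from collections import Counter, defaultdict


def pct_totals_by_gid(rows, gid_col="gid", pct_col="TopounitPctOfCell"):
    """Sum the per-topounit percentages for each parent cell."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[gid_col]] += row[pct_col]
    return dict(totals)


def cells_off_100(rows, tol=0.5):
    """Return gids whose percentages do not sum to ~100 (tol is an assumed tolerance)."""
    totals = pct_totals_by_gid(rows)
    return sorted(g for g, t in totals.items() if abs(t - 100.0) > tol)


def duplicate_ids(rows, id_col="topounit_id"):
    """Return topounit ids that occur more than once across the whole table."""
    counts = Counter(row[id_col] for row in rows)
    return sorted(i for i, n in counts.items() if n > 1)


# Hypothetical topounit records for two cells.
topounits = [
    {"gid": "c1", "topounit_id": "c1_t1", "TopounitPctOfCell": 60.0},
    {"gid": "c1", "topounit_id": "c1_t2", "TopounitPctOfCell": 40.0},
    {"gid": "c2", "topounit_id": "c2_t1", "TopounitPctOfCell": 70.0},
    {"gid": "c2", "topounit_id": "c2_t2", "TopounitPctOfCell": 20.0},
]
bad_cells = cells_off_100(topounits)
dupes = duplicate_ids(topounits)
```

Here cell c2 only accounts for 90 percent of its area, so it is the one flagged.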

basic_registry_check()[source]

Quick registry sanity check.

Return type:

Dict[str, set[str]]

Returns:

A dict with two keys:

  • “known”: set of vars present in REGISTRY

  • “unknown”: set of vars NOT present in REGISTRY
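The shape of the returned mapping can be shown with a minimal sketch that partitions variable names against a registry. registry_check is a hypothetical stand-in for the method, and the registry contents and variable names below are placeholders rather than entries confirmed to exist in SC.REGISTRY.

```python
def registry_check(var_names, registry):
    """Split variable names into those found in the registry and those not."""
    names = set(var_names)
    known = {name for name in names if name in registry}
    return {"known": known, "unknown": names - known}


# Placeholder registry: only the keys matter for this check.
registry = {"PCT_SAND": object(), "PCT_CLAY": object()}
result = registry_check(["PCT_SAND", "MY_CUSTOM_VAR"], registry)
```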

drop_params(names)[source]

Drop one or more data variables from the surface dataset.

Return type:

None

Parameters:

names (str | List[str])

classmethod export(domain, *, out_dir, src_path, filename='surfdata.nc', overwrite=False, append_attrs=None, decode_times=True, chunks=None, include=None, exclude=None, registry=None, attach_topounits=True, sampling_method='nearest', lon_wrap='auto', agg_policy=None, validate=False, validator_kwargs=None)[source]

Export surface file(s) for a Domain.

Returns: dict[run_id, path]

  • domain.mode='cellset': one file in out_dir

  • domain.mode='sites': one file per site in out_dir/<gid>/

Return type:

Dict[str, Path]

Parameters:
  • domain (Any)

  • out_dir (str | Path)

  • src_path (str | Path)

  • filename (str)

  • overwrite (bool)

  • decode_times (bool)

  • chunks (Dict[str, int] | None)

  • include (set[str] | None)

  • exclude (set[str] | None)

  • registry (Dict[str, ParDef] | None)

  • attach_topounits (bool)

  • sampling_method (Literal['nearest', 'zonal'])

  • lon_wrap (Literal['auto', '0_360', '-180_180'])

  • agg_policy (dict[str, str] | None)

  • validate (bool)

  • validator_kwargs (Dict[str, Any] | None)
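The output layout described for the two modes can be sketched with pathlib. This only models where files are expected to land; it does not call export. Two details are assumptions not stated in the documentation: that the run_id key equals the gid in 'sites' mode, and the "cellset" key used for the single cellset file.

```python
from pathlib import Path


def expected_outputs(mode, out_dir, gids, filename="surfdata.nc",
                     cellset_run_id="cellset"):
    """Model the {run_id: path} mapping for each domain mode.

    cellset_run_id is a hypothetical key; the real run_id may differ.
    """
    out_dir = Path(out_dir)
    if mode == "cellset":
        # One file directly in out_dir.
        return {cellset_run_id: out_dir / filename}
    if mode == "sites":
        # One file per site, each in its own out_dir/<gid>/ folder.
        return {gid: out_dir / gid / filename for gid in gids}
    raise ValueError(f"unsupported mode: {mode!r}")


site_paths = expected_outputs("sites", "out", ["g1", "g2"])
cellset_paths = expected_outputs("cellset", "out", ["g1", "g2"])
```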

classmethod from_domain(domain, src_path, *, decode_times=True, chunks=None, include=None, exclude=None, registry=None, attach_topounits=True, sampling_method='nearest', lon_wrap='auto', agg_policy=None)[source]

Sample a global surface Dataset for a single-run Domain and return a SurfaceFile.

Return type:

SurfaceFile

Parameters:
  • domain (Any)

  • src_path (str | Path)

  • decode_times (bool)

  • chunks (Dict[str, int] | None)

  • include (set[str] | None)

  • exclude (set[str] | None)

  • registry (Dict[str, ParDef] | None)

  • attach_topounits (bool)

  • sampling_method (Literal['nearest', 'zonal'])

  • lon_wrap (Literal['auto', '0_360', '-180_180'])

  • agg_policy (dict[str, str] | None)

classmethod from_halfdegree_point(lat, lon, *, src_path, decode_times=True, chunks=None, include=None, exclude=None, registry=None)[source]

Sample the global half-degree surface at (lat, lon) and return a 1x1 SurfaceFile. Uses dapper.geo.sampling.sample_gridded_dataset_points.

Return type:

SurfaceFile

Parameters:
  • lat (float)

  • lon (float)

  • src_path (str | Path)

  • decode_times (bool)

  • chunks (Dict[str, int] | None)

  • include (set[str] | None)

  • exclude (set[str] | None)

  • registry (Dict[str, ParDef] | None)

classmethod from_netcdf(path, registry=None, decode_times=True)[source]

Workflow A: wrap an existing surface file for editing.

Return type:

SurfaceFile

Parameters:
  • path (str | Path)

  • registry (Dict[str, ParDef] | None)

  • decode_times (bool)

resize_dim(dim_name, new_size, *, fill_value=nan)[source]

Generic “change dimensionality” helper (e.g. nlevsoi 10 → 15).

  • If new_size < old_size: truncate all vars using that dim.

  • If new_size > old_size: pad with fill_value.

This is intentionally generic; you can wrap ELM-specific logic (e.g. updating ‘nlevsoi’ scalar) on top of it.

Return type:

None

Parameters:
  • dim_name (str)

  • new_size (int)

  • fill_value (float)
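The truncate-or-pad rule can be illustrated on a plain list standing in for one variable along dim_name. resize_1d is a hypothetical helper written for this example; the real method applies the same idea to every variable in the Dataset that uses the dimension.

```python
import math


def resize_1d(values, new_size, fill_value=math.nan):
    """Truncate values to new_size, or pad the tail with fill_value."""
    if new_size < 0:
        raise ValueError("new_size must be non-negative")
    kept = list(values[:new_size])
    # The pad count is zero when truncating, so nothing is appended.
    return kept + [fill_value] * (new_size - len(kept))


truncated = resize_1d([1.0, 2.0, 3.0], 2)
padded = resize_1d([1.0, 2.0], 4)
zero_padded = resize_1d([1.0, 2.0], 4, fill_value=0.0)
```

Since padded entries default to NaN, a follow-up step would normally fill them with real values and, as noted above, update any scalar such as nlevsoi via set_scalar.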

set_global_attrs(**attrs)[source]

Update global attributes on the underlying Dataset.

Return type:

None

Parameters:

attrs (Any)

set_scalar(name, value)[source]

Convenience for setting scalar parameters like nlevsoi, numrad, etc.

Return type:

None

Parameters:
  • name (str)

  • value (ndarray | DataArray | float | int)

to_netcdf(path, overwrite=False, encoding=None, append_attrs=None, dapper_attrs=None, add_created_utc=True)[source]

Write this SurfaceFile to disk as NetCDF.

Return type:

str

Parameters:
  • path (str | Path)

  • overwrite (bool)

  • encoding (Dict[str, Dict[str, Any]] | None)

  • append_attrs (dict | None)

  • dapper_attrs (dict | None)

  • add_created_utc (bool)

validate(strict=False, use_external_validator=False, validator_kwargs=None)[source]

Validate the surface Dataset.

strict=False, use_external_validator=False:
  • only run basic_registry_check; print a warning for unknown vars.

use_external_validator=True:
  • write to a temporary file and run SurfaceValidator on it; return the pandas.DataFrame report.

Parameters:
  • strict (bool)

  • use_external_validator (bool)

  • validator_kwargs (Dict[str, Any] | None)

dapper.surf.sfile.build_surface_dataset(sampled, *, include=None, drop_non_spatial_arrays=False)[source]

Turn a sampled dict (from sample_point_values) into a 1x1 ELM surface xarray.Dataset. Adds spatial dims back as length-1 and preserves other dims in file order.

Return type:

Dataset

Parameters:
  • sampled (Dict[str, Any])

  • include (set[str] | None)

  • drop_non_spatial_arrays (bool)

dapper.surf.sfile.build_surface_dataset_cellset(sampled_list, *, include=None, drop_non_spatial_arrays=False)[source]

Build an ELM surface xarray.Dataset for a cellset laid out as (nj=N, ni=1). This mirrors your domain writer default of N×1, and keeps spatial dims last.

Each entry of sampled_list is the dict returned by SurfacePointSampler.sample().

Return type:

Dataset

Parameters:
  • sampled_list (List[Dict[str, Any]])

  • include (set[str] | None)

  • drop_non_spatial_arrays (bool)
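The (nj=N, ni=1) layout with spatial dims last can be pictured using nested lists in place of arrays. This is a shape illustration only: both helpers are hypothetical, and the real function works on the sampled dicts rather than bare values.

```python
def stack_cells(per_cell_values):
    """Stack one scalar per cell into a nested list shaped (nj=N, ni=1)."""
    return [[value] for value in per_cell_values]


def stack_profiles(per_cell_profiles):
    """Stack one equal-length profile per cell into (level, nj=N, ni=1).

    The non-spatial (level) axis stays first; the spatial axes come last.
    """
    n_levels = len(per_cell_profiles[0])
    return [
        [[profile[k]] for profile in per_cell_profiles]
        for k in range(n_levels)
    ]


# Three cells: one scalar each, then one two-level profile each.
scalars = stack_cells([10, 20, 30])
profiles = stack_profiles([[1, 2], [3, 4], [5, 6]])
```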

dapper.surf.sfile.customize_surface(src_path, customizations, nc_out=None, *, strict_registry=True, allow_add=True, run_validation=False, validator_kwargs=None, units_policy='enforce', engine='netcdf4')[source]

Update or add parameters in an existing ELM surface NetCDF (path-only API).

Parameters:
  • src_path (str | Path) – Path to existing surface NetCDF.

  • customizations (dict) –

    Mapping of variable -> value OR variable -> spec dict:
    • value can be: scalar, np.ndarray, xr.DataArray (broadcasted)

    • spec dict keys:
      • “value”: required

      • “dims”: optional dim names for 1D arrays (list of str)

      • “dtype”: optional dtype override (e.g., ‘float32’)

      • “units”: optional units override (if not enforced by registry)

    Notes:
    • For existing variables, dims are taken from the file and ‘dims’ is ignored (value must be broadcastable to that shape).

    • For NEW variables (not in file):
      • If present in REGISTRY, dims/dtype/units come from REGISTRY. All of those dims must already exist in the dataset (sizes are reused).

      • If NOT in REGISTRY and strict_registry=True -> error.

      • If NOT in REGISTRY and strict_registry=False -> you must pass a spec dict with ‘dims’, ‘dtype’, and ‘units’.

  • nc_out (str | Path, optional) – Output path; default is ‘<stem>_custom.nc’ next to input.

  • strict_registry (bool) – Require variables to exist in schema.REGISTRY. True recommended.

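  • validate_units (bool) – Ensure file units match registry units (registry '' and 'varies' are skipped). This name is not in the signature above; units_policy appears to be the corresponding argument.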

  • allow_add (bool) – Permit adding new variables; otherwise only overwrite existing ones.

  • run_validation (bool) – If True, run dapper.surf.validate.SurfaceValidator on the written file and return the report.

  • validator_kwargs (dict) – Passed to SurfaceValidator(…).

  • units_policy (str)

  • engine (str)

Return type:

(out_path, report_df_or_None)

Raises:

CustomizeError on shape/dtype/units/dim mismatches.
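The two accepted forms of a customization entry, and the default output name, can be sketched as follows. The helpers are hypothetical and only mirror the rules listed above; they raise a plain ValueError where the library raises CustomizeError, and the variable names PCT_SAND and MY_NEW_PARAM are placeholders.

```python
from pathlib import Path

ALLOWED_SPEC_KEYS = {"value", "dims", "dtype", "units"}


def normalize_spec(entry):
    """Return a spec dict for either form of a customization entry."""
    if isinstance(entry, dict):
        unknown = set(entry) - ALLOWED_SPEC_KEYS
        if "value" not in entry or unknown:
            raise ValueError(f"bad spec: missing 'value' or unknown keys {sorted(unknown)}")
        return dict(entry)
    # A bare value (scalar or array-like) is shorthand for {"value": value}.
    return {"value": entry}


def default_out_path(src_path):
    """Model the documented default: '<stem>_custom.nc' next to the input."""
    src = Path(src_path)
    return src.with_name(f"{src.stem}_custom.nc")


customizations = {
    "PCT_SAND": 45.0,  # bare value, broadcast to the existing variable
    "MY_NEW_PARAM": {  # full spec, as needed for a var outside the registry
        "value": [0.1, 0.2],
        "dims": ["topounit"],
        "dtype": "float32",
        "units": "1",
    },
}
specs = {name: normalize_spec(entry) for name, entry in customizations.items()}
out_path = default_out_path("data/surfdata.nc")
```

A mapping like customizations is what would be passed as the second argument to customize_surface; with strict_registry=True the second entry would be rejected unless MY_NEW_PARAM is in the registry.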

dapper.surf.sfile.write_surface_nc(ds, out_path, *, append_attrs=None, dapper_attrs=None, add_created_utc=True)[source]

Write a surface Dataset to NetCDF with ELM-friendly defaults and merged attributes.

Return type:

str

Parameters:
  • ds (Dataset)

  • out_path (str)

  • append_attrs (dict | None)

  • dapper_attrs (dict | None)

  • add_created_utc (bool)