dapper.surf.sfile
Surface file construction, customization, and validation helpers.
Functions
build_surface_dataset – Turn a sampled dict (from sample_point_values) into a 1x1 ELM surface xarray.Dataset.
build_surface_dataset_cellset – Build an ELM surface xarray.Dataset for a cellset laid out as (nj=N, ni=1).
customize_surface – Update or add parameters in an existing ELM surface NetCDF (path-only API).
write_surface_nc – Write a surface Dataset to NetCDF with ELM-friendly defaults and merged attributes.
Classes
SurfaceFile – Unified interface for building and editing ELM/ELM-style surface files.
Exceptions
CustomizeError – Raised when a customization fails schema/formatting validation.
- exception dapper.surf.sfile.CustomizeError
Bases: ValueError
Raised when a customization fails schema/formatting validation.
- class dapper.surf.sfile.SurfaceFile(ds, registry=None)
Bases: object
Unified interface for building and editing ELM/ELM-style surface files.
- Wraps an in-memory xarray.Dataset (self.ds).
- Knows about the surface-variable registry (dapper.surf.schema; SC.REGISTRY).
- Can be constructed from:
  - an existing NetCDF path (from_netcdf)
  - a point sampled from the global half-degree surface (from_halfdegree_point)
  - a Domain (from_domain); currently a light stub you can extend
Parameters are added via add_params_from_df. That method:
- creates the named dimension if it does not exist yet, using the distinct values of id_col from the DataFrame
- adds/overwrites 1D variables whose names come directly from DataFrame column names (except id_col and drop_cols)
- Parameters:
ds (Dataset)
registry (Dict[str, ParDef] | None)
- add_params_from_df(dim_name, df, id_col, *, drop_cols=None)
Attach / update 1D parameters along dim_name using a DataFrame.
- Parameters:
dim_name (str) – Logical dimension name (e.g. “topounit”, “pft”).
df (pandas.DataFrame or geopandas.GeoDataFrame) – Must contain id_col and one column per parameter.
id_col (str) – Column containing the IDs. The distinct values (as strings), in order of appearance, are used as coordinates if dim_name does not already exist.
drop_cols (list[str], optional) – Columns to ignore as parameters (e.g. “geometry”).
- Return type:
None
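The coordinate rule described for id_col (distinct values, as strings, in order of appearance) can be illustrated without pandas. This is a minimal sketch of that rule only, not dapper's implementation: the helper name ordered_unique_ids and the sample IDs are hypothetical, and converting to strings before de-duplicating is an assumption.

```python
def ordered_unique_ids(values):
    """Return distinct values as strings, keeping first-appearance order."""
    # dict preserves insertion order, so fromkeys() de-duplicates stably.
    return list(dict.fromkeys(str(v) for v in values))


# Hypothetical id_col contents with mixed types and repeats.
ids = [3, "t1", 3, 1, "t1", 2]
coords = ordered_unique_ids(ids)
print(coords)
```

Sorting is deliberately avoided here, because the documented behaviour keeps the order in which IDs first appear in the DataFrame.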
- add_topounits_from_domain(domain, *, gid_col='gid', id_col='topounit_id', pct_col='TopounitPctOfCell', dim_name='topounit', pct_var_name='PCT_TOPUNIT')
Attach topounits + per-cell weights to the surface dataset.
Expects domain.topounits to exist and contain:
- gid_col (links topounit -> cell gid)
- id_col (unique id per topounit across the whole run)
- pct_col (percent of the parent cell; sums to ~100 per gid)
- Return type:
None
- Parameters:
gid_col (str)
id_col (str)
pct_col (str)
dim_name (str)
pct_var_name (str)
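Because pct_col is expected to sum to roughly 100 within each gid, it can be worth checking a topounit table before attaching it. The following is a rough sketch of such a check on plain dicts, not part of dapper: the function names, the 0.5 tolerance, and the sample rows are all hypothetical.

```python
import math
from collections import defaultdict


def pct_sums_by_gid(rows, gid_col="gid", pct_col="TopounitPctOfCell"):
    """Sum the topounit percentages for each parent cell gid."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[gid_col]] += row[pct_col]
    return dict(totals)


def cells_off_100(rows, tol=0.5, **cols):
    """Return the gids whose percentages do not sum to ~100 (within tol)."""
    sums = pct_sums_by_gid(rows, **cols)
    return sorted(
        gid for gid, total in sums.items()
        if not math.isclose(total, 100.0, abs_tol=tol)
    )


# Hypothetical topounit table: one dict per topounit row.
topounits = [
    {"gid": "A", "topounit_id": "A_1", "TopounitPctOfCell": 60.0},
    {"gid": "A", "topounit_id": "A_2", "TopounitPctOfCell": 40.0},
    {"gid": "B", "topounit_id": "B_1", "TopounitPctOfCell": 70.0},
    {"gid": "B", "topounit_id": "B_2", "TopounitPctOfCell": 20.0},
]
bad = cells_off_100(topounits)  # cell "B" only sums to 90
```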
- basic_registry_check()
Quick registry sanity check.
- Return type:
Dict[str, set[str]]
- Returns:
{"known": set of vars present in REGISTRY, "unknown": set of vars NOT present in REGISTRY}
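The shape of the returned mapping can be sketched with ordinary sets. This is an illustration under assumptions, not the method's actual code: partition_by_registry is a hypothetical helper, and the registry below is a stand-in dict rather than the real SC.REGISTRY.

```python
def partition_by_registry(var_names, registry):
    """Split variable names into those found in the registry and the rest."""
    names = set(var_names)
    known = {name for name in names if name in registry}
    return {"known": known, "unknown": names - known}


# Stand-in registry; real entries would be ParDef objects, not None.
registry = {"PCT_SAND": None, "PCT_CLAY": None}
report = partition_by_registry(["PCT_SAND", "MY_CUSTOM_VAR"], registry)
```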
- drop_params(names)
Drop one or more data variables from the surface dataset.
- Return type:
None
- Parameters:
names (str | List[str])
- classmethod export(domain, *, out_dir, src_path, filename='surfdata.nc', overwrite=False, append_attrs=None, decode_times=True, chunks=None, include=None, exclude=None, registry=None, attach_topounits=True, sampling_method='nearest', lon_wrap='auto', agg_policy=None, validate=False, validator_kwargs=None)
Export surface file(s) for a Domain.
Returns a dict[run_id, path]:
- domain.mode='cellset': one file in out_dir
- domain.mode='sites': one file per site in out_dir/<gid>/
- Return type:
Dict[str, Path]
- Parameters:
domain (Any)
out_dir (str | Path)
src_path (str | Path)
filename (str)
overwrite (bool)
decode_times (bool)
chunks (Dict[str, int] | None)
include (set[str] | None)
exclude (set[str] | None)
registry (Dict[str, ParDef] | None)
attach_topounits (bool)
sampling_method (Literal['nearest', 'zonal'])
lon_wrap (Literal['auto', '0_360', '-180_180'])
agg_policy (dict[str, str] | None)
validate (bool)
validator_kwargs (Dict[str, Any] | None)
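The two output layouts described for export (one file for a cellset, one file per site under out_dir/<gid>/) can be sketched with pathlib. This only models the directory layout as documented; planned_output_paths is a hypothetical helper, it returns a plain list because the run_id keys of the real return dict are not specified here, and it does not call dapper.

```python
from pathlib import Path


def planned_output_paths(mode, out_dir, gids, filename="surfdata.nc"):
    """Return the file paths the documented layout implies for a domain mode."""
    out_dir = Path(out_dir)
    if mode == "cellset":
        # All cells share a single file directly in out_dir.
        return [out_dir / filename]
    if mode == "sites":
        # Each site gets its own subdirectory named after its gid.
        return [out_dir / str(gid) / filename for gid in gids]
    raise ValueError(f"unsupported mode: {mode!r}")


# Hypothetical site gids.
site_paths = planned_output_paths("sites", "out", ["s1", "s2"])
cellset_paths = planned_output_paths("cellset", "out", ["s1", "s2"])
```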
- classmethod from_domain(domain, src_path, *, decode_times=True, chunks=None, include=None, exclude=None, registry=None, attach_topounits=True, sampling_method='nearest', lon_wrap='auto', agg_policy=None)
Sample a global surface Dataset for a single-run Domain and return a SurfaceFile.
- Return type:
SurfaceFile
- Parameters:
domain (Any)
src_path (str | Path)
decode_times (bool)
chunks (Dict[str, int] | None)
include (set[str] | None)
exclude (set[str] | None)
registry (Dict[str, ParDef] | None)
attach_topounits (bool)
sampling_method (Literal['nearest', 'zonal'])
lon_wrap (Literal['auto', '0_360', '-180_180'])
agg_policy (dict[str, str] | None)
- classmethod from_halfdegree_point(lat, lon, *, src_path, decode_times=True, chunks=None, include=None, exclude=None, registry=None)
Sample the global half-degree surface at (lat, lon) and return a 1x1 SurfaceFile.
Uses dapper.geo.sampling.sample_gridded_dataset_points.
- Return type:
SurfaceFile
- Parameters:
lat (float)
lon (float)
src_path (str | Path)
decode_times (bool)
chunks (Dict[str, int] | None)
include (set[str] | None)
exclude (set[str] | None)
registry (Dict[str, ParDef] | None)
- classmethod from_netcdf(path, registry=None, decode_times=True)
Workflow A: wrap an existing surface file for editing.
- Return type:
SurfaceFile
- Parameters:
path (str | Path)
registry (Dict[str, ParDef] | None)
decode_times (bool)
- resize_dim(dim_name, new_size, *, fill_value=nan)
Generic “change dimensionality” helper (e.g. nlevsoi 10 → 15).
- If new_size < old_size: truncate all vars using that dim.
- If new_size > old_size: pad with fill_value.
This is intentionally generic; you can wrap ELM-specific logic (e.g. updating ‘nlevsoi’ scalar) on top of it.
- Return type:
None
- Parameters:
dim_name (str)
new_size (int)
fill_value (float)
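The truncate-or-pad rule is easiest to see on a single 1D sequence. The sketch below applies it to a plain Python list; it is not how resize_dim operates on an xarray.Dataset, and resize_1d and the sample values are hypothetical.

```python
import math


def resize_1d(values, new_size, fill_value=math.nan):
    """Truncate or pad a 1D sequence so that it has exactly new_size items."""
    if new_size < 0:
        raise ValueError("new_size must be non-negative")
    old = list(values)
    if new_size <= len(old):
        # Shrinking (or same size): keep the leading entries.
        return old[:new_size]
    # Growing: append fill_value for every new position.
    return old + [fill_value] * (new_size - len(old))


layers = [0.1, 0.2, 0.3]          # hypothetical per-layer values
padded = resize_1d(layers, 5)     # two NaN entries appended
truncated = resize_1d(layers, 2)  # last entry dropped
```

As the description notes, any scalar that records the dimension length (such as nlevsoi) would still need to be updated separately.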
- set_global_attrs(**attrs)
Update global attributes on the underlying Dataset.
- Return type:
None
- Parameters:
attrs (Any)
- set_scalar(name, value)
Convenience for setting scalar parameters like nlevsoi, numrad, etc.
- Return type:
None
- Parameters:
name (str)
value (ndarray | DataArray | float | int)
- to_netcdf(path, overwrite=False, encoding=None, append_attrs=None, dapper_attrs=None, add_created_utc=True)
Write this SurfaceFile to disk as NetCDF.
- Return type:
str
- Parameters:
path (str | Path)
overwrite (bool)
encoding (Dict[str, Dict[str, Any]] | None)
append_attrs (dict | None)
dapper_attrs (dict | None)
add_created_utc (bool)
- validate(strict=False, use_external_validator=False, validator_kwargs=None)
Validate the surface Dataset.
- strict=False, use_external_validator=False:
only run basic_registry_check; print a warning for unknown vars.
- use_external_validator=True:
write to a temporary file and run SurfaceValidator on it; return the pandas.DataFrame report.
- Parameters:
strict (bool)
use_external_validator (bool)
validator_kwargs (Dict[str, Any] | None)
- dapper.surf.sfile.build_surface_dataset(sampled, *, include=None, drop_non_spatial_arrays=False)
Turn a sampled dict (from sample_point_values) into a 1x1 ELM surface xarray.Dataset. Adds spatial dims back as length-1 and preserves other dims in file order.
- Return type:
Dataset
- Parameters:
sampled (Dict[str, Any])
include (set[str] | None)
drop_non_spatial_arrays (bool)
- dapper.surf.sfile.build_surface_dataset_cellset(sampled_list, *, include=None, drop_non_spatial_arrays=False)
Build an ELM surface xarray.Dataset for a cellset laid out as (nj=N, ni=1). This mirrors your domain writer default of N×1, and keeps spatial dims last.
Each entry of sampled_list is the dict returned by SurfacePointSampler.sample().
- Return type:
Dataset
- Parameters:
sampled_list (List[Dict[str, Any]])
include (set[str] | None)
drop_non_spatial_arrays (bool)
- dapper.surf.sfile.customize_surface(src_path, customizations, nc_out=None, *, strict_registry=True, allow_add=True, run_validation=False, validator_kwargs=None, units_policy='enforce', engine='netcdf4')
Update or add parameters in an existing ELM surface NetCDF (path-only API).
- Parameters:
src_path (str | Path) – Path to existing surface NetCDF.
customizations (dict) – Mapping of variable -> value OR variable -> spec dict:
- value can be: scalar, np.ndarray, xr.DataArray (broadcasted)
- spec dict keys: {"value": <required>, "dims": ["optional dim names for 1D arrays"], "dtype": "optional dtype override (e.g., 'float32')", "units": "optional units override (if not enforced by registry)"}
Notes:
- For existing variables, dims are taken from the file and 'dims' is ignored (value must be broadcastable to that shape).
- For NEW variables (not in file):
  - If present in REGISTRY, dims/dtype/units come from REGISTRY. All of those dims must already exist in the dataset (sizes are reused).
  - If NOT in REGISTRY and strict_registry=True -> error.
  - If NOT in REGISTRY and strict_registry=False -> you must pass a spec dict with 'dims', 'dtype', and 'units'.
nc_out (str | Path, optional) – Output path; default is ‘<stem>_custom.nc’ next to input.
strict_registry (bool) – Require variables to exist in schema.REGISTRY. True recommended.
validate_units (bool) – Ensure file units match registry units (registry ‘’/’varies’ are skipped).
allow_add (bool) – Permit adding new variables; otherwise only overwrite existing ones.
run_validation (bool) – If True, run dapper.surf.validate.SurfaceValidator on the written file and return the report.
validator_kwargs (dict) – Passed to SurfaceValidator(…).
units_policy (str)
engine (str)
- Return type:
(out_path, report_df_or_None)
- Raises:
CustomizeError – on shape/dtype/units/dim mismatches.
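The rules for NEW variables in the notes above amount to a small decision procedure. The sketch below restates it in plain Python as an illustration only: resolve_new_variable_spec is a hypothetical helper, registry entries are modelled as plain dicts rather than ParDef objects, the registry contents are made up, and ValueError stands in for CustomizeError (which the page lists as a ValueError subclass).

```python
REQUIRED_KEYS = ("dims", "dtype", "units")


def resolve_new_variable_spec(name, spec, registry, *, strict_registry=True):
    """Decide where dims/dtype/units for a variable not yet in the file come from."""
    if name in registry:
        # Registered variables take their metadata from the registry.
        entry = registry[name]
        return {key: entry[key] for key in REQUIRED_KEYS}
    if strict_registry:
        raise ValueError(f"{name!r} is not in the registry (strict_registry=True)")
    # Unregistered variables must carry complete metadata in a spec dict.
    if not isinstance(spec, dict):
        raise ValueError(f"{name!r} needs a spec dict with {REQUIRED_KEYS}")
    missing = [key for key in REQUIRED_KEYS if key not in spec]
    if missing:
        raise ValueError(f"{name!r} spec is missing {missing}")
    return {key: spec[key] for key in REQUIRED_KEYS}


# Stand-in registry with one made-up entry.
registry = {
    "PCT_SAND": {"dims": ("nlevsoi", "lsmlat", "lsmlon"), "dtype": "float64", "units": "%"},
}
from_registry = resolve_new_variable_spec("PCT_SAND", 50.0, registry)
custom = resolve_new_variable_spec(
    "MY_VAR",
    {"value": 1.0, "dims": ["topounit"], "dtype": "float32", "units": "1"},
    registry,
    strict_registry=False,
)
```

With the default strict_registry=True, the same "MY_VAR" request would raise instead of being accepted, which is why the documentation recommends leaving it enabled.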
- dapper.surf.sfile.write_surface_nc(ds, out_path, *, append_attrs=None, dapper_attrs=None, add_created_utc=True)
Write a surface Dataset to NetCDF with ELM-friendly defaults and merged attributes.
- Return type:
str
- Parameters:
ds (Dataset)
out_path (str)
append_attrs (dict | None)
dapper_attrs (dict | None)
add_created_utc (bool)