dapper.geo.sampling

Point sampling and lightweight NetCDF helpers.

Functions

ensure_weight(df, *[, weight_col, ...])

Ensure df has a weight column.

infer_grid_metadata(ds, *[, lat_dim, ...])

Infer basic (regular) grid metadata for provenance.

infer_lat_lon_vars(ds)

Prefer ELM-style 2D vars (LATIXY/LONGXY), else fall back to common names.

infer_latlon_spec(ds, *[, lat_dim, lon_dim, ...])

Build a LatLonSpec for fast nearest-neighbor lookup.

nearest_ij(spec, lat, lon)

Nearest-neighbor (i, j) on a regular lat/lon grid (lat_1d, lon_1d).

points_to_nearest_cells(ds, points, *[, ...])

Return a dataframe with (i_lat, i_lon) and the chosen cell center for each point.

sample_gridded_dataset_points(ds, points, *)

Sample all spatial vars (those containing BOTH lat_dim and lon_dim) at the provided point locations.

write_netcdf(ds, out_path, *[, compress, ...])

Write a Dataset to NetCDF with optional encoding and attribute handling.

Classes

LatLonSpec(lat_var, lon_var, lat_dim, ...)

Minimal spec for mapping (lat, lon) -> (i, j) in a gridded dataset.

class dapper.geo.sampling.LatLonSpec(lat_var, lon_var, lat_dim, lon_dim, lon_wrap, lat_1d, lon_1d)[source]

Bases: object

Minimal spec for mapping (lat, lon) -> (i, j) in a gridded dataset.

Parameters:
  • lat_var (str)

  • lon_var (str)

  • lat_dim (str)

  • lon_dim (str)

  • lon_wrap (Literal['0_360', '-180_180'])

  • lat_1d (ndarray)

  • lon_1d (ndarray)

lat_1d: ndarray
lat_dim: str
lat_var: str
lon_1d: ndarray
lon_dim: str
lon_var: str
lon_wrap: Literal['0_360', '-180_180']
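
As a rough illustration of the fields listed above, the sketch below defines a stand-in container with the same seven attributes. The class name `LatLonSpecSketch` is hypothetical, plain sequences replace the documented `ndarray` fields to keep the example dependency-free, and whether the real class is a frozen dataclass is an assumption.

```python
from dataclasses import dataclass
from typing import Literal, Sequence


@dataclass(frozen=True)
class LatLonSpecSketch:
    """Hypothetical stand-in mirroring the documented LatLonSpec fields."""

    lat_var: str
    lon_var: str
    lat_dim: str
    lon_dim: str
    lon_wrap: Literal["0_360", "-180_180"]
    lat_1d: Sequence[float]  # documented as ndarray; a list is used here
    lon_1d: Sequence[float]  # documented as ndarray; a list is used here


# Example values are made up for illustration only.
spec = LatLonSpecSketch(
    lat_var="LATIXY",
    lon_var="LONGXY",
    lat_dim="lsmlat",
    lon_dim="lsmlon",
    lon_wrap="0_360",
    lat_1d=[68.0, 68.5, 69.0],
    lon_1d=[209.5, 210.0, 210.5],
)
```

Note that `lon_wrap` on the spec is always a concrete convention; the `'auto'` option only appears on the functions that build the spec.
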
dapper.geo.sampling.ensure_weight(df, *, weight_col='weight', default_weight=1.0)[source]

Ensure df has a weight column. Returns a COPY if it needs to add the column.

Return type:

DataFrame

Parameters:
  • df (DataFrame)

  • weight_col (str)

  • default_weight (float)

dapper.geo.sampling.infer_grid_metadata(ds, *, lat_dim='lsmlat', lon_dim='lsmlon', lat_var=None, lon_var=None, lon_wrap='auto')[source]

Infer basic (regular) grid metadata for provenance.

Returns keys that are safe to stash in global attrs (namespaced with dapper). If resolution can’t be inferred reliably, values may be omitted.

Return type:

dict

Parameters:
  • ds (Dataset)

  • lat_dim (str)

  • lon_dim (str)

  • lat_var (str | None)

  • lon_var (str | None)

  • lon_wrap (Literal['auto', '0_360', '-180_180'])
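
The "omit values when resolution can't be inferred reliably" behavior could look roughly like the sketch below. It is not dapper's implementation: the function names, the tolerance, and the `dapper_grid_*` key names are all assumptions chosen for illustration.

```python
def infer_resolution(axis, *, rel_tol=1e-3):
    """Return the uniform spacing of a 1D axis, or None if it is irregular."""
    if len(axis) < 2:
        return None
    steps = [abs(b - a) for a, b in zip(axis, axis[1:])]
    mean = sum(steps) / len(steps)
    # Reject degenerate axes and axes whose spacing varies beyond the tolerance.
    if mean == 0 or any(abs(s - mean) > rel_tol * mean for s in steps):
        return None
    return mean


def grid_metadata_sketch(lat_1d, lon_1d):
    """Build a flat dict suitable for global attrs (key names are hypothetical)."""
    meta = {"dapper_grid_nlat": len(lat_1d), "dapper_grid_nlon": len(lon_1d)}
    for key, axis in (("dapper_grid_dlat", lat_1d), ("dapper_grid_dlon", lon_1d)):
        res = infer_resolution(axis)
        if res is not None:  # omit the key instead of storing a guess
            meta[key] = res
    return meta


# Latitude is evenly spaced; longitude is not, so its resolution is left out.
meta = grid_metadata_sketch([0.0, 0.5, 1.0], [10.0, 11.0, 13.0])
```
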

dapper.geo.sampling.infer_lat_lon_vars(ds)[source]

Prefer ELM-style 2D vars (LATIXY/LONGXY), else fall back to common names.

Return type:

tuple[str, str]

Parameters:

ds (Dataset)

dapper.geo.sampling.infer_latlon_spec(ds, *, lat_dim='lsmlat', lon_dim='lsmlon', lat_var=None, lon_var=None, lon_wrap='auto')[source]

Build a LatLonSpec for fast nearest-neighbor lookup.

Return type:

LatLonSpec

Parameters:
  • ds (Dataset)

  • lat_dim (str)

  • lon_dim (str)

  • lat_var (str | None)

  • lon_var (str | None)

  • lon_wrap (Literal['auto', '0_360', '-180_180'])

Assumptions (these hold for ELM-style landuse/surf files):
  • LATIXY/LONGXY exist as 2D (lat_dim, lon_dim), OR

  • lat/lon exist as 1D vectors.
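
Under the first assumption, the 2D center arrays can be collapsed to the 1D axes that a nearest-neighbor lookup needs. The sketch below shows one way to do that and one way to resolve `lon_wrap='auto'`. Both helpers are hypothetical and use nested lists in place of arrays; the rule used for `'auto'` is a guess, not necessarily what dapper does.

```python
def axes_from_2d(latixy, longxy):
    """Collapse 2D (lat_dim, lon_dim) cell centers to 1D axes.

    Assumes a regular grid: latitude is constant along each row and
    longitude is constant along each column.
    """
    lat_1d = [row[0] for row in latixy]  # first column gives the latitudes
    lon_1d = list(longxy[0])             # first row gives the longitudes
    return lat_1d, lon_1d


def detect_lon_wrap(lon_1d):
    """Guess the longitude convention from the axis values.

    Any value above 180 implies '0_360'. If every value lies in [0, 180]
    the data is ambiguous; this sketch then defaults to '-180_180'.
    """
    return "0_360" if max(lon_1d) > 180.0 else "-180_180"


latixy = [[10.0, 10.0], [20.0, 20.0]]
longxy = [[200.0, 210.0], [200.0, 210.0]]
lat_1d, lon_1d = axes_from_2d(latixy, longxy)
wrap = detect_lon_wrap(lon_1d)
```
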

dapper.geo.sampling.nearest_ij(spec, lat, lon)[source]

Nearest-neighbor (i, j) on a regular lat/lon grid (lat_1d, lon_1d).

Return type:

tuple[int, int]

Parameters:
  • spec (LatLonSpec)

  • lat

  • lon

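
A minimal sketch of a nearest-neighbor lookup on 1D axes follows. It takes the axes and wrap convention as separate arguments instead of a `LatLonSpec`, and the helper names are hypothetical. The linear scan is chosen for clarity; the actual function may use a faster search.

```python
def normalize_lon(lon, lon_wrap):
    """Map a longitude into the grid's convention."""
    if lon_wrap == "0_360":
        return lon % 360.0
    return (lon + 180.0) % 360.0 - 180.0


def nearest_ij_sketch(lat_1d, lon_1d, lon_wrap, lat, lon):
    """Return (i, j) of the grid cell whose center is closest to (lat, lon)."""
    lon = normalize_lon(lon, lon_wrap)

    def lon_dist(a, b):
        # Circular distance so that 359.9 and 0.1 are 0.2 degrees apart.
        d = abs(a - b) % 360.0
        return min(d, 360.0 - d)

    i = min(range(len(lat_1d)), key=lambda k: abs(lat_1d[k] - lat))
    j = min(range(len(lon_1d)), key=lambda k: lon_dist(lon_1d[k], lon))
    return i, j


# A point given in -180..180 is matched against a 0..360 grid.
ij = nearest_ij_sketch(
    [68.0, 68.5, 69.0], [209.5, 210.0, 210.5], "0_360", 68.6, -149.6
)
```

Normalizing the query longitude first means callers can pass points in either convention.
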
dapper.geo.sampling.points_to_nearest_cells(ds, points, *, lat_col='lat', lon_col='lon', weight_col='weight', lat_dim='lsmlat', lon_dim='lsmlon', lat_var=None, lon_var=None, lon_wrap='auto')[source]

Return a dataframe with (i_lat, i_lon) and the chosen cell center for each point. Keeps the original weight column.

Return type:

DataFrame

Parameters:
  • ds (Dataset)

  • points (DataFrame)

  • lat_col (str)

  • lon_col (str)

  • weight_col (str)

  • lat_dim (str)

  • lon_dim (str)

  • lat_var (str | None)

  • lon_var (str | None)

  • lon_wrap (Literal['auto', '0_360', '-180_180'])

dapper.geo.sampling.sample_gridded_dataset_points(ds, points, *, lat_col='lat', lon_col='lon', lat_dim='lsmlat', lon_dim='lsmlon', lat_var=None, lon_var=None, lon_wrap='auto', method='nearest', vars_include=None, vars_drop=None)[source]

Sample all spatial vars (those containing BOTH lat_dim and lon_dim) at the provided point locations.

Return type:

Dataset

Parameters:
  • ds (Dataset)

  • points (DataFrame)

  • lat_col (str)

  • lon_col (str)

  • lat_dim (str)

  • lon_dim (str)

  • lat_var (str | None)

  • lon_var (str | None)

  • lon_wrap (Literal['auto', '0_360', '-180_180'])

  • method (Literal['nearest'])

  • vars_include (Sequence[str] | None)

  • vars_drop (Sequence[str] | None)

Output convention:
  • lat_dim has length N (number of points)

  • lon_dim has length 1

  • no coordinate variables are created for lat_dim/lon_dim (matches your Toolik file)
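
The output convention above can be pictured with a toy 2D field: each of the N points contributes one row, and the longitude axis collapses to length 1. The sketch uses nested lists and a hypothetical helper, and ignores any extra dimensions (such as time) that real variables may carry.

```python
def sample_nearest(field, cells):
    """Pick field[i][j] for each (i, j) and return a nested list of shape (N, 1)."""
    return [[field[i][j]] for i, j in cells]


# A 2 x 3 field indexed as [i_lat][i_lon].
field = [
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
]
# Nearest-cell indices for three points, assumed to be precomputed.
cells = [(0, 2), (1, 0), (1, 1)]

sampled = sample_nearest(field, cells)
n_lat = len(sampled)     # plays the role of lat_dim, length N
n_lon = len(sampled[0])  # plays the role of lon_dim, length 1
```
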

dapper.geo.sampling.write_netcdf(ds, out_path, *, compress=True, complevel=4, append_attrs=None, dapper_attrs=None, add_created_utc=True)[source]

Write a Dataset to NetCDF with optional encoding and attribute handling.

Return type:

Path

Parameters:
  • ds (Dataset)

  • out_path (str | Path)

  • compress (bool)

  • complevel (int)

  • append_attrs (dict | None)

  • dapper_attrs (dict | None)

  • add_created_utc (bool)
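
The `compress` and `complevel` options plausibly translate into a per-variable encoding mapping of the kind accepted by xarray's `Dataset.to_netcdf(..., encoding=...)`, where `zlib` and `complevel` are the usual keys for NetCDF4 output. The sketch below builds such a mapping and a creation timestamp. The helper names and the `created_utc` attribute name are assumptions, and how `append_attrs` and `dapper_attrs` are merged is not shown because the source does not describe it.

```python
from datetime import datetime, timezone


def build_encoding(var_names, *, compress=True, complevel=4):
    """Return a per-variable encoding dict, or an empty dict if compression is off."""
    if not compress:
        return {}
    return {name: {"zlib": True, "complevel": complevel} for name in var_names}


def stamp_created_utc(attrs, *, key="created_utc"):
    """Return a copy of attrs with an ISO 8601 UTC timestamp added (key is hypothetical)."""
    out = dict(attrs)
    out[key] = datetime.now(timezone.utc).isoformat()
    return out


encoding = build_encoding(["PCT_SAND", "PCT_CLAY"])
original = {"title": "example"}
stamped = stamp_created_utc(original)
```
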