dapper.met.adapters package

Submodules

dapper.met.adapters.base module

dapper module: met.adapters.base.

class dapper.met.adapters.base.BaseAdapter[source]

Bases: ABC

Adapter contract for met sources. The Exporter depends only on this interface. Implementations may override defaults as needed.

abstractmethod discover_files(csv_directory, calendar)[source]

Return (csv_files, start_year, end_year).

Parameters:

calendar (str)

normalize_locations(df_loc, id_col=None)[source]

Standardize df_loc to include ['gid', 'lat', 'lon', 'lon_0-360', 'zone'], sorted by (lat, lon).

Zones are treated as a per-location grouping label (e.g., to support E3SM/ELM decomposition hints across a cellset). If df_loc lacks a ‘zone’ column, we default every location to zone=1.

We intentionally do not auto-expand locations across multiple zones. If you want multiple zones, supply an explicit ‘zone’ column in df_loc with one row per location.

Return type:

DataFrame

Parameters:

df_loc (DataFrame)
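
For illustration, a minimal sketch of a df_loc input to normalize_locations with an explicit 'zone' column; the gid/lat/lon/zone values are made up:

  import pandas as pd

  # Hypothetical locations table: one row per location, with an explicit
  # per-location 'zone' label (omit the column to default every location to zone=1).
  df_loc = pd.DataFrame({
      "gid": ["site_a", "site_b", "site_c"],
      "lat": [45.2, 45.2, 46.0],
      "lon": [-110.5, -109.0, -110.5],
      "zone": [1, 1, 2],
  })

  # normalize_locations(df_loc) is expected to return these rows with an added
  # 'lon_0-360' column (e.g., -110.5 -> 249.5) and a stable (lat, lon) ordering.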

pack_params(elm_var, data=None)[source]

Default: use elm_utils.elm_var_packing_params if available; otherwise a safe fallback. Adapters can override for source-specific packing strategies.

Parameters:

elm_var (str)

abstractmethod preprocess_shard(df_merged, start_year, end_year, calendar, dformat)[source]

Return a DataFrame with at least: ['gid', 'time', 'LATIXY', 'LONGXY', 'zone', <ELM vars>]

Return type:

DataFrame

Parameters:
  • df_merged (DataFrame)

  • start_year (int)

  • end_year (int)

  • calendar (str)

  • dformat (str)

required_vars(dformat)[source]

Optional: return a list of ELM variable short names that this adapter will produce for the given dformat ('BYPASS' or 'DATM_MODE'). The Exporter does not require this method.

Parameters:

dformat (str)
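
To make the BaseAdapter contract concrete, here is a minimal, hypothetical adapter sketch; the file-name pattern, year parsing, and ELM variable names are illustrative assumptions, not part of the real API:

  from pathlib import Path
  from dapper.met.adapters import BaseAdapter

  class MySourceAdapter(BaseAdapter):
      """Illustrative adapter assuming CSV shards named met_<year>.csv."""

      def discover_files(self, csv_directory, calendar):
          csv_files = sorted(Path(csv_directory).glob("met_*.csv"))
          years = [int(p.stem.split("_")[-1]) for p in csv_files]
          return csv_files, min(years), max(years)

      def preprocess_shard(self, df_merged, start_year, end_year, calendar, dformat):
          df = df_merged.rename(columns={"lat": "LATIXY", "lon": "LONGXY"})
          # ... parse/rename the source's date column to 'time', convert units,
          #     and rename raw fields to canonical ELM short names ...
          keep = ["gid", "time", "LATIXY", "LONGXY", "zone"] + self.required_vars(dformat)
          return df[keep]

      def required_vars(self, dformat):
          # Canonical ELM short names this adapter produces (illustrative).
          return ["TBOT", "PRECTmms"]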

dapper.met.adapters.era5 module

ERA5-Land adapter implementation.

class dapper.met.adapters.era5.ERA5Adapter[source]

Bases: BaseAdapter

ERA5-Land → ELM adapter.

This adapter implements the BaseAdapter interface for ERA5-Land hourly data. It handles the source-specific details (file discovery, unit conversions, humidity diagnostics, renaming to ELM short names, and nonnegativity enforcement) so the upstream Exporter can remain source-agnostic.

Responsibilities

  • discover_files: Find CSV shards in a directory and infer the overall (start_year, end_year) using their date coverage.

  • normalize_locations: Validate and normalize the locations table (adds lon_0-360, ensures/creates zone, stable sorting).

  • id_column_for_csv: Declare the identifier column name in the input CSVs. For ERA5 we require gid.

  • preprocess_shard: Convert one merged shard (CSV rows joined to locations) into canonical ELM columns. Steps include:

    1. time filtering and optional “noleap” removal of Feb 29

    2. ERA5→ELM unit conversions (e.g., J/hr/m² → W/m², m/hr → mm/s)

    3. optional humidity computation (RH/Q) if temperature, dewpoint, and surface pressure are available

    4. renaming raw ERA5 fields to ELM short names via a mapping

    5. clipping canonical nonnegative variables

    6. returning only required columns in a deterministic order

  • required_vars: Report the canonical ELM variable names required for the requested output format.

  • pack_params: Provide robust (add_offset, scale_factor) for a canonical ELM variable, given optional data to tune ranges.

Notes

  • Humidity computation is performed only when temperature_2m, dewpoint_temperature_2m, and surface_pressure are present.

  • Precipitation is converted from m/hr to mm/s by dividing by 3.6 (see the sketch below).
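
A rough sketch of these conversions, assuming pandas columns named after the raw ERA5-Land variables; the ELM short names, the Magnus-type humidity formula, and the exact column set are illustrative, not the adapter's literal implementation:

  import numpy as np

  def era5_to_elm_units(df):
      """Illustrative ERA5-Land -> ELM conversions for an hourly DataFrame."""
      # Radiation: hourly-accumulated J/m^2 -> mean W/m^2 over the hour.
      df["FSDS"] = df["surface_solar_radiation_downwards"] / 3600.0
      # Precipitation: m/hr -> mm/s (divide by 3.6).
      df["PRECTmms"] = df["total_precipitation"] / 3.6
      # Specific humidity from dewpoint (K) and surface pressure (Pa),
      # via a Magnus-type saturation vapor pressure.
      td_c = df["dewpoint_temperature_2m"] - 273.15
      e = 611.2 * np.exp(17.67 * td_c / (td_c + 243.5))            # Pa
      df["QBOT"] = 0.622 * e / (df["surface_pressure"] - 0.378 * e)
      # Clip canonical nonnegative variables.
      cols = ["FSDS", "PRECTmms", "QBOT"]
      df[cols] = df[cols].clip(lower=0)
      return df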

DRIVER_TAG = 'ERA5'
SOURCE_NAME = 'ERA5-Land hourly reanalysis'
discover_files(csv_directory, calendar)[source]

Discover ERA5 CSV shards in a directory and infer the inclusive year range.

id_column_for_csv(df_csv, id_col)[source]

Return the required identifier column name expected in ERA5 CSV shards (“gid”).

pack_params(elm_var, data=None)[source]

Return (add_offset, scale_factor) used to pack a variable for NetCDF output.
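
These follow the usual NetCDF packing convention; a hedged sketch of how the returned pair would be applied (the variable name "TBOT", the sample values, and the int16 target dtype are assumptions):

  import numpy as np
  from dapper.met.adapters import ERA5Adapter

  adapter = ERA5Adapter()
  values = np.array([270.1, 288.5, 301.2])                 # e.g. air temperature in K
  add_offset, scale_factor = adapter.pack_params("TBOT", data=values)

  # Pack floats into a smaller integer dtype for NetCDF output.
  packed = np.round((values - add_offset) / scale_factor).astype(np.int16)

  # Readers recover approximate physical values the standard way.
  unpacked = packed * scale_factor + add_offset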

preprocess_shard(df_merged, start_year, end_year, calendar, dformat)[source]
  1. Filter time & handle no-leap

  2. Apply ERA5 → ELM unit conversions

  3. Compute humidities (if columns available)

  4. Rename columns to canonical ELM names using RAW_TO_ELM

  5. Clip canonical nonnegative variables

  6. Return only the canonical vars required by elm_required_vars(dformat), plus LONGXY/LATIXY/time/gid/zone (coords/meta).

required_vars(dformat)[source]

Return the canonical ELM variables required for the requested output format.

dapper.met.adapters.fluxnet module

dapper module: met.adapters.fluxnet.

class dapper.met.adapters.fluxnet.FluxnetAdapter[source]

Bases: BaseAdapter

AmeriFlux FLUXNET → ELM adapter.

Assumptions

  • User provides a single FLUXNET CSV (FULLSET or SUBSET) per run.

  • CSV contains TIMESTAMP_START/TIMESTAMP_END (or TIMESTAMP) columns.

  • Missing values are coded as -9999.

  • Exporter supplies df_merged with ['gid', 'lat', 'lon', 'zone', …] already merged in from df_loc.

DRIVER_TAG = 'FLUXNET'
SOURCE_NAME = 'FLUXNET (AmeriFlux ONEFlux) tower data'
discover_files(csv_directory, calendar)[source]

Return (csv_files, start_year, end_year).

Parameters:

calendar (str)

preprocess_shard(df_merged, start_year, end_year, calendar, dformat)[source]

Return a DataFrame with at least: ['gid', 'time', 'LATIXY', 'LONGXY', 'zone', <ELM vars>]

Return type:

DataFrame

Parameters:
  • df_merged (DataFrame)

  • start_year (int)

  • end_year (int)

  • calendar (str)

  • dformat (str)

required_vars(dformat)[source]

Optional: return a list of ELM variable short names that this adapter will produce for the given dformat ('BYPASS' or 'DATM_MODE'). The Exporter does not require this method.

Parameters:

dformat (str)

dapper.met.adapters.fluxnet.infer_fluxnet_dt_hours(df)[source]

Infer native FLUXNET timestep from timestamp columns in hours.

Handles:
  • half-hourly/hourly/weekly: TIMESTAMP_START, TIMESTAMP_END (YYYYMMDDHHMM)

  • daily/monthly/yearly: TIMESTAMP (YYYYMMDD or YYYYMM, etc.)

Returns:

Approximate timestep in hours.

Return type:

float

Raises:

ValueError – If no suitable timestamp columns are found or dt cannot be inferred.

Parameters:

df (DataFrame)
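
For example, with half-hourly timestamps (values are illustrative):

  import pandas as pd
  from dapper.met.adapters.fluxnet import infer_fluxnet_dt_hours

  # Half-hourly FLUXNET-style timestamps in YYYYMMDDHHMM form.
  df = pd.DataFrame({
      "TIMESTAMP_START": ["202001010000", "202001010030", "202001010100"],
      "TIMESTAMP_END":   ["202001010030", "202001010100", "202001010130"],
  })

  dt_hours = infer_fluxnet_dt_hours(df)   # expected to be 0.5 for half-hourly data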

Module contents

MET adapters.

This package contains dataset-specific adapters (ERA5, Fluxnet, …).

The source-agnostic exporter lives in dapper.met.exporter, but we re-export it here for convenience/back-compat.

class dapper.met.adapters.BaseAdapter[source]

Bases: ABC

Adapter contract for met sources. The Exporter depends only on this interface. Implementations may override defaults as needed.

abstractmethod discover_files(csv_directory, calendar)[source]

Return (csv_files, start_year, end_year).

Parameters:

calendar (str)

normalize_locations(df_loc, id_col=None)[source]

Standardize df_loc to include ['gid', 'lat', 'lon', 'lon_0-360', 'zone'], sorted by (lat, lon).

Zones are treated as a per-location grouping label (e.g., to support E3SM/ELM decomposition hints across a cellset). If df_loc lacks a ‘zone’ column, we default every location to zone=1.

We intentionally do not auto-expand locations across multiple zones. If you want multiple zones, supply an explicit ‘zone’ column in df_loc with one row per location.

Return type:

DataFrame

Parameters:

df_loc (DataFrame)

pack_params(elm_var, data=None)[source]

Default: use elm_utils.elm_var_packing_params if available; otherwise a safe fallback. Adapters can override for source-specific packing strategies.

Parameters:

elm_var (str)

abstractmethod preprocess_shard(df_merged, start_year, end_year, calendar, dformat)[source]

Return a DataFrame with at least: ['gid', 'time', 'LATIXY', 'LONGXY', 'zone', <ELM vars>]

Return type:

DataFrame

Parameters:
  • df_merged (DataFrame)

  • start_year (int)

  • end_year (int)

  • calendar (str)

  • dformat (str)

required_vars(dformat)[source]

Optional: return a list of ELM variable short names that this adapter will produce for the given dformat ('BYPASS' or 'DATM_MODE'). The Exporter does not require this method.

Parameters:

dformat (str)

class dapper.met.adapters.ERA5Adapter[source]

Bases: BaseAdapter

ERA5-Land → ELM adapter.

This adapter implements the BaseAdapter interface for ERA5-Land hourly data. It handles the source-specific details (file discovery, unit conversions, humidity diagnostics, renaming to ELM short names, and nonnegativity enforcement) so the upstream Exporter can remain source-agnostic.

Responsibilities

  • discover_files: Find CSV shards in a directory and infer the overall (start_year, end_year) using their date coverage.

  • normalize_locations: Validate and normalize the locations table (adds lon_0-360, ensures/creates zone, stable sorting).

  • id_column_for_csv: Declare the identifier column name in the input CSVs. For ERA5 we require gid.

  • preprocess_shard: Convert one merged shard (CSV rows joined to locations) into canonical ELM columns. Steps include:

    1. time filtering and optional “noleap” removal of Feb 29

    2. ERA5→ELM unit conversions (e.g., J/hr/m² → W/m², m/hr → mm/s)

    3. optional humidity computation (RH/Q) if temperature, dewpoint, and surface pressure are available

    4. renaming raw ERA5 fields to ELM short names via a mapping

    5. clipping canonical nonnegative variables

    6. returning only required columns in a deterministic order

  • required_vars: Report the canonical ELM variable names required for the requested output format.

  • pack_params: Provide robust (add_offset, scale_factor) for a canonical ELM variable, given optional data to tune ranges.

Notes

  • Humidity computation is performed only when temperature_2m, dewpoint_temperature_2m, and surface_pressure are present.

  • Precipitation is converted from m/hr to mm/s by dividing by 3.6.

DRIVER_TAG = 'ERA5'
SOURCE_NAME = 'ERA5-Land hourly reanalysis'
discover_files(csv_directory, calendar)[source]

Discover ERA5 CSV shards in a directory and infer the inclusive year range.

id_column_for_csv(df_csv, id_col)[source]

Return the required identifier column name expected in ERA5 CSV shards (“gid”).

pack_params(elm_var, data=None)[source]

Return (add_offset, scale_factor) used to pack a variable for NetCDF output.

preprocess_shard(df_merged, start_year, end_year, calendar, dformat)[source]
  1. Filter time & handle no-leap

  2. Apply ERA5 → ELM unit conversions

  3. Compute humidities (if columns available)

  4. Rename columns to canonical ELM names using RAW_TO_ELM

  5. Clip canonical nonnegative variables

  6. Return only the canonical vars required by elm_required_vars(dformat), plus LONGXY/LATIXY/time/gid/zone (coords/meta).

required_vars(dformat)[source]

Return the canonical ELM variables required for the requested output format.

class dapper.met.adapters.Exporter(adapter, src_path, *, domain, out_dir=None, calendar='noleap', dtime_resolution_hrs=1, dtime_units='days', dformat='BYPASS', append_attrs=None, chunks=None, include_vars=None, exclude_vars=None)[source]

Bases: object

Source-agnostic meteorological exporter.

This class orchestrates a two-pass pipeline that ingests time-sharded CSVs for many sites/cells, preprocesses them via a pluggable adapter, and writes ELM-ready NetCDF outputs in two layouts:

  1. "cellset" – one NetCDF per variable with dims ('DTIME','lat','lon') (global packing; sparse lat/lon axes are OK).

  2. "sites" – one directory per site; each directory contains one NetCDF per variable with dims ('n','DTIME') where n=1 (per-site packing).

Exporter is source-agnostic: all dataset-specific logic (file discovery, unit conversions, renaming to ELM short names, etc.) lives in an adapter that implements the BaseAdapter interface (e.g., an ERA5Adapter). The exporter handles staging (CSV → per-site parquet), global DTIME axis creation, packing scans, chunking, and NetCDF I/O.

Parameters:
  • adapter (BaseAdapter) – Implements: discover_files, normalize_locations, preprocess_shard, required_vars, and pack_params.

  • csv_directory (str or pathlib.Path) – Directory containing time-sharded CSV files for all sites/cells.

  • out_dir (str or pathlib.Path) – Destination directory for NetCDF outputs and temporary parquet shards.

  • df_loc (pandas.DataFrame) – Locations table with at least columns ["gid","lat","lon"]; optional "zone". The adapter's normalize_locations validates columns, adds "lon_0-360", fills/validates "zone", and sorts for stable site order.

  • id_col (str, optional) – Kept for backward compatibility (unused when "gid" is assumed).

  • calendar ({"noleap","standard"}, default "noleap") – Calendar for numeric DTIME coordinate; Feb 29 filtered for “noleap”.

  • dtime_resolution_hrs (int, default 1) – Target time resolution in hours for the DTIME axis.

  • dtime_units ({"days","hours"}, default "days") – Units of the numeric DTIME coordinate (e.g., "days since YYYY-MM-DD HH:MM:SS").

  • domain (Domain)

  • dformat (str)

  • append_attrs (dict | None)

dformat : {"BYPASS", "DATM_MODE"}, default "BYPASS"

Target ELM format selector passed through to the adapter.

append_attrs : dict, optional

Extra global NetCDF attributes to include in every file. The exporter also adds export_mode ("cellset" or "sites") and pack_scope ("global" or "per-site").

chunks : tuple[int, ...], optional

Explicit NetCDF chunk sizes.

include_vars / exclude_vars : Iterable[str], optional

Allow-/block-lists of ELM short names applied after preprocess. Meta columns {"gid", "time", "LATIXY", "LONGXY", "zone"} are always kept.

Side Effects

  • Creates a temporary directory of per-site parquet shards under out_dir.

  • Writes NetCDF files to out_dir in the chosen layout.

  • Writes a zone_mappings.txt file either at the root (cellset) or inside each site directory (sites).

Notes

  • Packing: global packing for cellset; per-site packing for sites.

  • Required columns: CSV shards and df_loc both use "gid"; CSVs include the adapter’s date/time column (renamed to "time" during preprocess).

  • Combined (lat/lon) layout: does not enforce regular grids; axes are the unique sorted lat/lon from df_loc (sparse OK).

run(*, pack_scope=None, filename=None, overwrite=False)[source]

Run the MET export for this exporter’s Domain.

The output layout is derived from Domain.mode:
  • sites: writes <run_dir>/<gid>/MET/{prefix_}{var}.nc and a per-site zone_mappings.txt (always zone=01, id=1).

  • cellset: writes <run_dir>/MET/{prefix_}{var}.nc and a single zone_mappings.txt covering all locations (zones taken from df_loc, default 1).

Parameters:
  • pack_scope – Optional packing strategy override. Defaults to per-site for sites and global for cellset outputs.

  • filename (str | None) – Optional filename prefix for output NetCDF files. If provided, each variable is written to {filename}_{var}.nc.

  • overwrite (bool) – If True, clears existing MET outputs before writing.

Return type:

None
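
A hedged end-to-end sketch (the paths are placeholders, the include_vars list is illustrative, and the Domain object is assumed to have been built elsewhere per the dapper Domain docs):

  from dapper.met.adapters import ERA5Adapter, Exporter

  domain = ...  # an existing dapper Domain ("sites" or "cellset" mode), built elsewhere

  exporter = Exporter(
      ERA5Adapter(),
      "/path/to/era5_csv_shards",                  # directory of time-sharded CSVs
      domain=domain,
      out_dir="/path/to/met_output",
      calendar="noleap",
      dtime_resolution_hrs=1,
      dformat="BYPASS",
      include_vars=["TBOT", "QBOT", "FSDS", "PRECTmms"],
  )
  exporter.run(overwrite=True)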

class dapper.met.adapters.FluxnetAdapter[source]

Bases: BaseAdapter

AmeriFlux FLUXNET → ELM adapter.

Assumptions

  • User provides a single FLUXNET CSV (FULLSET or SUBSET) per run.

  • CSV contains TIMESTAMP_START/TIMESTAMP_END (or TIMESTAMP) columns.

  • Missing values are coded as -9999.

  • Exporter supplies df_merged with ['gid', 'lat', 'lon', 'zone', …] already merged in from df_loc.

DRIVER_TAG = 'FLUXNET'
SOURCE_NAME = 'FLUXNET (AmeriFlux ONEFlux) tower data'
discover_files(csv_directory, calendar)[source]

Return (csv_files, start_year, end_year).

Parameters:

calendar (str)

native_dt_hours: Optional[float]
preprocess_shard(df_merged, start_year, end_year, calendar, dformat)[source]

Return a DataFrame with at least: ['gid', 'time', 'LATIXY', 'LONGXY', 'zone', <ELM vars>]

Return type:

DataFrame

Parameters:
  • df_merged (DataFrame)

  • start_year (int)

  • end_year (int)

  • calendar (str)

  • dformat (str)

required_vars(dformat)[source]

Optional: return a list of ELM variable short names that this adapter will produce for the given dformat ('BYPASS' or 'DATM_MODE'). The Exporter does not require this method.

Parameters:

dformat (str)

resolution: Optional[str]