dapper.met.adapters package¶
Submodules¶
dapper.met.adapters.base module¶
dapper module: met.adapters.base.
- class dapper.met.adapters.base.BaseAdapter[source]¶
Bases: ABC
Adapter contract for met sources. The Exporter depends only on this interface. Implementations may override defaults as needed.
- abstractmethod discover_files(csv_directory, calendar)[source]¶
Return (csv_files, start_year, end_year).
- Parameters:
calendar (str)
- normalize_locations(df_loc, id_col=None)[source]¶
Standardize df_loc to include [‘gid’,’lat’,’lon’,’lon_0-360’,’zone’], sorted by (lat, lon).
Zones are treated as a per-location grouping label (e.g., to support E3SM/ELM decomposition hints across a cellset). If df_loc lacks a ‘zone’ column, we default every location to zone=1.
We intentionally do not auto-expand locations across multiple zones. If you want multiple zones, supply an explicit ‘zone’ column in df_loc with one row per location.
- Return type:
DataFrame
- Parameters:
df_loc (DataFrame)
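The documented defaults can be sketched for a single location; this is a minimal illustration of the described behavior (wrap longitudes into 0–360, default zone=1), not the actual method body:

```python
# Minimal sketch of the documented normalize_locations defaults
# (illustrative only; the real method also validates and sorts df_loc).
def normalize_location(lat, lon, zone=None):
    """Return (lat, lon, lon_0-360, zone) for one location."""
    lon_0_360 = lon % 360.0  # e.g., -105.0 -> 255.0
    return lat, lon, lon_0_360, (zone if zone is not None else 1)

print(normalize_location(40.0, -105.0))  # zone defaults to 1
```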
- pack_params(elm_var, data=None)[source]¶
Default: use elm_utils.elm_var_packing_params if available; otherwise a safe fallback. Adapters can override for source-specific packing strategies.
- Parameters:
elm_var (str)
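Because discover_files is abstract, a new met source needs at least that override. The sketch below shows the shape of the contract; BaseAdapter here is a stand-in for dapper's class, and ToyAdapter and its return values are invented for illustration:

```python
from abc import ABC, abstractmethod

class BaseAdapter(ABC):  # stand-in for dapper.met.adapters.base.BaseAdapter
    @abstractmethod
    def discover_files(self, csv_directory, calendar):
        """Return (csv_files, start_year, end_year)."""

class ToyAdapter(BaseAdapter):  # hypothetical source, for illustration only
    def discover_files(self, csv_directory, calendar):
        # A real adapter would scan csv_directory and parse date coverage.
        return (["shard_2000.csv", "shard_2001.csv"], 2000, 2001)

files, y0, y1 = ToyAdapter().discover_files("/data", "noleap")
print(files, y0, y1)
```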
dapper.met.adapters.era5 module¶
ERA5-Land adapter implementation.
- class dapper.met.adapters.era5.ERA5Adapter[source]¶
Bases: BaseAdapter
ERA5-Land → ELM adapter.
This adapter implements the BaseAdapter interface for ERA5-Land hourly data. It handles source-specific details (file discovery, unit conversions, humidity diagnostics, renaming to ELM short names, and nonnegativity enforcement) so the upstream Exporter can remain source-agnostic.
Responsibilities¶
discover_files: Find CSV shards in a directory and infer the overall (start_year, end_year) using their date coverage.
normalize_locations: Validate and normalize the locations table (adds lon_0-360, ensures/creates zone, applies stable sorting).
id_column_for_csv: Declare the identifier column name in the input CSVs. For ERA5 we require gid.
preprocess_shard: Convert one merged shard (CSV rows joined to locations) into canonical ELM columns. Steps include:
time filtering and optional “noleap” removal of Feb 29
ERA5→ELM unit conversions (e.g., J/hr/m² → W/m², m/hr → mm/s)
optional humidity computation (RH/Q) if temperature, dewpoint, and surface pressure are available
renaming raw ERA5 fields to ELM short names via a mapping
clipping canonical nonnegative variables
returning only required columns in a deterministic order
required_vars: Report the canonical ELM variable names required for the requested output format.
pack_params: Provide robust (add_offset, scale_factor) for a canonical ELM variable, given optional data to tune ranges.
Notes
Humidity computation is performed only when temperature_2m, dewpoint_temperature_2m, and surface_pressure are present.
Precipitation conversion uses m/hr → mm/s via division by 3.6.
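Both conversions named above are plain scalings: hourly-accumulated energy in J/m² divides by 3600 s to give W/m², and depth in m/hr divides by 3.6 to give mm/s (1 m/hr = 1000 mm / 3600 s). A sketch:

```python
def joules_per_hr_m2_to_w_m2(x):
    """Hourly-accumulated energy flux to instantaneous power."""
    return x / 3600.0  # 1 hr = 3600 s

def m_per_hr_to_mm_per_s(x):
    """Precipitation depth rate: m/hr -> mm/s."""
    return x / 3.6  # 1000 mm / 3600 s

print(joules_per_hr_m2_to_w_m2(3600.0))  # 1.0 W/m^2
print(m_per_hr_to_mm_per_s(3.6))         # 1.0 mm/s
```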
- DRIVER_TAG = 'ERA5'¶
- SOURCE_NAME = 'ERA5-Land hourly reanalysis'¶
- discover_files(csv_directory, calendar)[source]¶
Discover ERA5 CSV shards in a directory and infer the inclusive year range.
- id_column_for_csv(df_csv, id_col)[source]¶
Return the required identifier column name expected in ERA5 CSV shards (“gid”).
- pack_params(elm_var, data=None)[source]¶
Return (add_offset, scale_factor) used to pack a variable for NetCDF output.
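(add_offset, scale_factor) follow the usual NetCDF packing convention: the stored integer is round((value - add_offset) / scale_factor), recovered as stored * scale_factor + add_offset. A sketch of that convention (dapper's heuristic for choosing the parameters is not shown, and the example parameters below are invented):

```python
def pack(value, add_offset, scale_factor):
    """Pack a float into an integer per the NetCDF convention."""
    return round((value - add_offset) / scale_factor)

def unpack(stored, add_offset, scale_factor):
    """Recover an approximation of the original value."""
    return stored * scale_factor + add_offset

offset, scale = 273.15, 0.01           # hypothetical params for a temperature
stored = pack(293.15, offset, scale)   # -> 2000
print(unpack(stored, offset, scale))   # recovers 293.15 to within scale/2
```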
- preprocess_shard(df_merged, start_year, end_year, calendar, dformat)[source]¶
Filter time & handle no-leap
Apply ERA5 → ELM unit conversions
Compute humidities (if columns available)
Rename columns to canonical ELM names using RAW_TO_ELM
Clip canonical nonnegative variables
Return only the canonical vars required by elm_required_vars(dformat), plus LONGXY/LATIXY/time/gid/zone (coords/meta).
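The humidity step above derives RH and specific humidity from 2 m temperature, dewpoint, and surface pressure. One common formulation uses the Magnus saturation-vapor-pressure curve (Bolton 1980 coefficients); dapper's exact formula may differ, so treat this as a sketch of the diagnostic, not the adapter's code:

```python
import math

def saturation_vapor_pressure_pa(t_c):
    """Magnus formula (Bolton 1980): t_c in deg C, returns Pa."""
    return 611.2 * math.exp(17.67 * t_c / (t_c + 243.5))

def rh_and_q(t_c, td_c, p_pa):
    """Relative humidity (%) and specific humidity (kg/kg)."""
    e = saturation_vapor_pressure_pa(td_c)    # actual vapor pressure
    es = saturation_vapor_pressure_pa(t_c)    # saturation vapor pressure
    rh = 100.0 * e / es
    q = 0.622 * e / (p_pa - 0.378 * e)        # mixing-ratio based diagnostic
    return rh, q

rh, q = rh_and_q(20.0, 10.0, 101325.0)        # T=20 C, Td=10 C, sea level
print(round(rh, 1), round(q, 5))
```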
dapper.met.adapters.fluxnet module¶
dapper module: met.adapters.fluxnet.
- class dapper.met.adapters.fluxnet.FluxnetAdapter[source]¶
Bases: BaseAdapter
AmeriFlux FLUXNET → ELM adapter.
Assumptions¶
User provides a single FLUXNET CSV (FULLSET or SUBSET) per run.
CSV contains TIMESTAMP_START/TIMESTAMP_END (or TIMESTAMP) columns.
Missing values are coded as -9999.
Exporter supplies df_merged with [‘gid’,’lat’,’lon’,’zone’, …] already merged in from df_loc.
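Given the -9999 sentinel noted above, a shard needs masking to NaN before unit conversions or averaging; a minimal sketch of just that step, without the adapter's other processing:

```python
MISSING = -9999.0  # FLUXNET missing-value code

def mask_missing(values):
    """Replace FLUXNET -9999 sentinels with NaN."""
    return [float("nan") if v == MISSING else v for v in values]

print(mask_missing([1.5, -9999.0, 2.0]))
```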
- DRIVER_TAG = 'FLUXNET'¶
- SOURCE_NAME = 'FLUXNET (AmeriFlux ONEFlux) tower data'¶
- discover_files(csv_directory, calendar)[source]¶
Return (csv_files, start_year, end_year).
- Parameters:
calendar (str)
- dapper.met.adapters.fluxnet.infer_fluxnet_dt_hours(df)[source]¶
Infer native FLUXNET timestep from timestamp columns in hours.
- Handles:
half-hourly/hourly/weekly: TIMESTAMP_START, TIMESTAMP_END (YYYYMMDDHHMM)
daily/monthly/yearly: TIMESTAMP (YYYYMMDD or YYYYMM, etc.)
- Returns:
Approximate timestep in hours.
- Return type:
float
- Raises:
ValueError – If no suitable timestamp columns are found or dt cannot be inferred.
- Parameters:
df (DataFrame)
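For the half-hourly/hourly branch above, the timestep falls out of parsing two consecutive YYYYMMDDHHMM stamps and differencing; a minimal sketch of that branch only (the real function also handles the daily/monthly/yearly TIMESTAMP layouts):

```python
from datetime import datetime

def dt_hours_from_start_stamps(ts0, ts1):
    """Infer timestep in hours from two consecutive TIMESTAMP_START values."""
    fmt = "%Y%m%d%H%M"  # FLUXNET YYYYMMDDHHMM
    t0 = datetime.strptime(str(ts0), fmt)
    t1 = datetime.strptime(str(ts1), fmt)
    return (t1 - t0).total_seconds() / 3600.0

print(dt_hours_from_start_stamps("200001010000", "200001010030"))  # 0.5
```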
Module contents¶
MET adapters.
This package contains dataset-specific adapters (ERA5, Fluxnet, …).
The source-agnostic exporter lives in dapper.met.exporter, but we
re-export it here for convenience/back-compat.
- class dapper.met.adapters.BaseAdapter[source]¶
Bases: ABC
Adapter contract for met sources. The Exporter depends only on this interface. Implementations may override defaults as needed.
- abstractmethod discover_files(csv_directory, calendar)[source]¶
Return (csv_files, start_year, end_year).
- Parameters:
calendar (str)
- normalize_locations(df_loc, id_col=None)[source]¶
Standardize df_loc to include [‘gid’,’lat’,’lon’,’lon_0-360’,’zone’], sorted by (lat, lon).
Zones are treated as a per-location grouping label (e.g., to support E3SM/ELM decomposition hints across a cellset). If df_loc lacks a ‘zone’ column, we default every location to zone=1.
We intentionally do not auto-expand locations across multiple zones. If you want multiple zones, supply an explicit ‘zone’ column in df_loc with one row per location.
- Return type:
DataFrame
- Parameters:
df_loc (DataFrame)
- pack_params(elm_var, data=None)[source]¶
Default: use elm_utils.elm_var_packing_params if available; otherwise a safe fallback. Adapters can override for source-specific packing strategies.
- Parameters:
elm_var (str)
- class dapper.met.adapters.ERA5Adapter[source]¶
Bases: BaseAdapter
ERA5-Land → ELM adapter.
This adapter implements the BaseAdapter interface for ERA5-Land hourly data. It handles source-specific details (file discovery, unit conversions, humidity diagnostics, renaming to ELM short names, and nonnegativity enforcement) so the upstream Exporter can remain source-agnostic.
Responsibilities¶
discover_files: Find CSV shards in a directory and infer the overall (start_year, end_year) using their date coverage.
normalize_locations: Validate and normalize the locations table (adds lon_0-360, ensures/creates zone, applies stable sorting).
id_column_for_csv: Declare the identifier column name in the input CSVs. For ERA5 we require gid.
preprocess_shard: Convert one merged shard (CSV rows joined to locations) into canonical ELM columns. Steps include:
time filtering and optional “noleap” removal of Feb 29
ERA5→ELM unit conversions (e.g., J/hr/m² → W/m², m/hr → mm/s)
optional humidity computation (RH/Q) if temperature, dewpoint, and surface pressure are available
renaming raw ERA5 fields to ELM short names via a mapping
clipping canonical nonnegative variables
returning only required columns in a deterministic order
required_vars: Report the canonical ELM variable names required for the requested output format.
pack_params: Provide robust (add_offset, scale_factor) for a canonical ELM variable, given optional data to tune ranges.
Notes
Humidity computation is performed only when temperature_2m, dewpoint_temperature_2m, and surface_pressure are present.
Precipitation conversion uses m/hr → mm/s via division by 3.6.
- DRIVER_TAG = 'ERA5'¶
- SOURCE_NAME = 'ERA5-Land hourly reanalysis'¶
- discover_files(csv_directory, calendar)[source]¶
Discover ERA5 CSV shards in a directory and infer the inclusive year range.
- id_column_for_csv(df_csv, id_col)[source]¶
Return the required identifier column name expected in ERA5 CSV shards (“gid”).
- pack_params(elm_var, data=None)[source]¶
Return (add_offset, scale_factor) used to pack a variable for NetCDF output.
- preprocess_shard(df_merged, start_year, end_year, calendar, dformat)[source]¶
Filter time & handle no-leap
Apply ERA5 → ELM unit conversions
Compute humidities (if columns available)
Rename columns to canonical ELM names using RAW_TO_ELM
Clip canonical nonnegative variables
Return only the canonical vars required by elm_required_vars(dformat), plus LONGXY/LATIXY/time/gid/zone (coords/meta).
- class dapper.met.adapters.Exporter(adapter, src_path, *, domain, out_dir=None, calendar='noleap', dtime_resolution_hrs=1, dtime_units='days', dformat='BYPASS', append_attrs=None, chunks=None, include_vars=None, exclude_vars=None)[source]¶
Bases: object
Source-agnostic meteorological exporter.
This class orchestrates a two-pass pipeline that ingests time-sharded CSVs for many sites/cells, preprocesses them via a pluggable adapter, and writes ELM-ready NetCDF outputs in two layouts:
"cellset" – one NetCDF per variable with dims ('DTIME','lat','lon') (global packing; sparse lat/lon axes are OK).
"sites" – one directory per site; each directory contains one NetCDF per variable with dims ('n','DTIME') where n=1 (per-site packing).
Exporter is source-agnostic: all dataset-specific logic (file discovery, unit conversions, renaming to ELM short names, etc.) lives in an adapter that implements the BaseAdapter interface (e.g., an ERA5Adapter). The exporter handles staging (CSV → per-site parquet), global DTIME axis creation, packing scans, chunking, and NetCDF I/O.
- Parameters:
adapter (BaseAdapter) – Implements discover_files, normalize_locations, preprocess_shard, required_vars, and pack_params.
csv_directory (str or pathlib.Path) – Directory containing time-sharded CSV files for all sites/cells.
out_dir (str or pathlib.Path) – Destination directory for NetCDF outputs and temporary parquet shards.
df_loc (pandas.DataFrame) – Locations table with at least columns ["gid","lat","lon"]; optional "zone". The adapter’s normalize_locations validates columns, adds "lon_0-360", fills/validates "zone", and sorts for stable site order.
id_col (str, optional) – Kept for backward compatibility (unused when "gid" is assumed).
calendar ({"noleap","standard"}, default "noleap") – Calendar for the numeric DTIME coordinate; Feb 29 is filtered for "noleap".
dtime_resolution_hrs (int, default 1) – Target time resolution in hours for the DTIME axis.
dtime_units ({"days","hours"}, default "days") – Units of the numeric DTIME coordinate (e.g., "days since YYYY-MM-DD HH:MM:SS").
domain (Domain)
dformat (str)
append_attrs (dict | None)
- dformat : {"BYPASS","DATM_MODE"}, default "BYPASS"
Target ELM format selector passed through to the adapter.
- append_attrs : dict, optional
Extra global NetCDF attributes to include in every file. The exporter also adds export_mode ("cellset" or "sites") and pack_scope ("global" or "per-site").
- chunks : tuple[int,…], optional
Explicit NetCDF chunk sizes.
- include_vars / exclude_vars : Iterable[str], optional
Allow-/block-lists of ELM short names applied after preprocess. Meta columns {"gid","time","LATIXY","LONGXY","zone"} are always kept.
Side Effects¶
Creates a temporary directory of per-site parquet shards under out_dir.
Writes NetCDF files to out_dir in the chosen layout.
Writes a zone_mappings.txt file either at the root (cellset) or inside each site directory (sites).
Notes
Packing: global packing for cellset; per-site packing for sites.
Required columns: CSV shards and df_loc both use "gid"; CSVs include the adapter’s date/time column (renamed to "time" during preprocess).
Combined (lat/lon) layout: does not enforce regular grids; axes are the unique sorted lat/lon from df_loc (sparse OK).
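The sparse-axis note above means the cellset grid is just the unique sorted coordinates from the locations table, not a filled regular mesh; cells with no location stay fill-valued. The example coordinates below are invented:

```python
# Three locations spanning a 2x2 (lat, lon) cellset grid.
locations = [(40.0, -105.0), (35.0, -97.0), (40.0, -97.0)]

lat_axis = sorted({lat for lat, _ in locations})
lon_axis = sorted({lon for _, lon in locations})

# 3 locations, 4 grid cells; the (35.0, -105.0) cell holds no data.
print(lat_axis, lon_axis)  # [35.0, 40.0] [-105.0, -97.0]
```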
- run(*, pack_scope=None, filename=None, overwrite=False)[source]¶
Run the MET export for this exporter’s Domain.
- The output layout is derived from Domain.mode:
sites: writes <run_dir>/<gid>/MET/{prefix_}{var}.nc and a per-site zone_mappings.txt (always zone=01, id=1).
cellset: writes <run_dir>/MET/{prefix_}{var}.nc and a single zone_mappings.txt covering all locations (zones taken from df_loc, default 1).
- Parameters:
pack_scope – Optional packing strategy override. Defaults to per-site for sites and global for cellset outputs.
filename (str | None) – Optional filename prefix for output NetCDF files. If provided, each variable is written to {filename}_{var}.nc.
overwrite (bool) – If True, clears existing MET outputs before writing.
- Return type:
None
- class dapper.met.adapters.FluxnetAdapter[source]¶
Bases: BaseAdapter
AmeriFlux FLUXNET → ELM adapter.
Assumptions¶
User provides a single FLUXNET CSV (FULLSET or SUBSET) per run.
CSV contains TIMESTAMP_START/TIMESTAMP_END (or TIMESTAMP) columns.
Missing values are coded as -9999.
Exporter supplies df_merged with [‘gid’,’lat’,’lon’,’zone’, …] already merged in from df_loc.
- DRIVER_TAG = 'FLUXNET'¶
- SOURCE_NAME = 'FLUXNET (AmeriFlux ONEFlux) tower data'¶
- discover_files(csv_directory, calendar)[source]¶
Return (csv_files, start_year, end_year).
- Parameters:
calendar (str)
- native_dt_hours: Optional[float]¶
- preprocess_shard(df_merged, start_year, end_year, calendar, dformat)[source]¶
Return a DataFrame with at least: [‘gid’,’time’,’LATIXY’,’LONGXY’,’zone’, <ELM vars>]
- Return type:
DataFrame
- Parameters:
df_merged (DataFrame)
start_year (int)
end_year (int)
calendar (str)
dformat (str)
- required_vars(dformat)[source]¶
Optional: return a list of ELM var short names that this adapter will produce for the given dformat (‘BYPASS’ or ‘DATM_MODE’). Exporter doesn’t require it.
- Parameters:
dformat (str)
- resolution: Optional[str]¶