dapper.met.adapters package

Submodules

dapper.met.adapters.base module

dapper module: met.adapters.base.

class dapper.met.adapters.base.BaseAdapter[source]

Bases: ABC

Adapter contract for met sources. The Exporter depends only on this interface. Implementations may override defaults as needed.

abstractmethod discover_files(csv_directory, calendar)[source]

Return (csv_files, start_year, end_year).

Parameters:

calendar (str)

normalize_locations(df_loc, id_col=None)[source]

Standardize df_loc to include ['gid', 'lat', 'lon', 'lon_0-360', 'zone'], sorted by (lat, lon).

Zones are treated as a per-location grouping label (e.g., to support E3SM/ELM decomposition hints across a cellset). If df_loc lacks a ‘zone’ column, we default every location to zone=1.

We intentionally do not auto-expand locations across multiple zones. If you want multiple zones, supply an explicit ‘zone’ column in df_loc with one row per location.

Return type:

DataFrame

Parameters:

df_loc (DataFrame)
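
For illustration, a minimal sketch of a df_loc input to normalize_locations with an explicit 'zone' column; the gid/lat/lon/zone values are made up:

  import pandas as pd

  # Hypothetical locations table: one row per location, with an explicit
  # per-location 'zone' label (omit the column to default every location to zone=1).
  df_loc = pd.DataFrame({
      "gid": ["site_a", "site_b", "site_c"],
      "lat": [45.2, 45.2, 46.0],
      "lon": [-110.5, -109.0, -110.5],
      "zone": [1, 1, 2],
  })

  # normalize_locations(df_loc) is expected to return these rows with an added
  # 'lon_0-360' column (e.g., -110.5 -> 249.5) and a stable (lat, lon) ordering.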

pack_params(elm_var, data=None)[source]

Default: use elm_utils.elm_var_packing_params if available; otherwise a safe fallback. Adapters can override for source-specific packing strategies.

Parameters:

elm_var (str)

abstractmethod preprocess_shard(df_merged, start_year, end_year, calendar, dformat)[source]

Return a DataFrame with at least: ['gid', 'time', 'LATIXY', 'LONGXY', 'zone', <ELM vars>]

Return type:

DataFrame

Parameters:
  • df_merged (DataFrame)

  • start_year (int)

  • end_year (int)

  • calendar (str)

  • dformat (str)

required_vars(dformat)[source]

Optional: return a list of ELM variable short names that this adapter will produce for the given dformat ('BYPASS' or 'DATM_MODE'). The Exporter does not require this method.

Parameters:

dformat (str)
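
To make the BaseAdapter contract concrete, here is a minimal, hypothetical adapter sketch; the file-name pattern, year parsing, and ELM variable names are illustrative assumptions, not part of the real API:

  from pathlib import Path
  from dapper.met.adapters import BaseAdapter

  class MySourceAdapter(BaseAdapter):
      """Illustrative adapter assuming CSV shards named met_<year>.csv."""

      def discover_files(self, csv_directory, calendar):
          csv_files = sorted(Path(csv_directory).glob("met_*.csv"))
          years = [int(p.stem.split("_")[-1]) for p in csv_files]
          return csv_files, min(years), max(years)

      def preprocess_shard(self, df_merged, start_year, end_year, calendar, dformat):
          df = df_merged.rename(columns={"lat": "LATIXY", "lon": "LONGXY"})
          # ... parse/rename the source's date column to 'time', convert units,
          #     and rename raw fields to canonical ELM short names ...
          keep = ["gid", "time", "LATIXY", "LONGXY", "zone"] + self.required_vars(dformat)
          return df[keep]

      def required_vars(self, dformat):
          # Canonical ELM short names this adapter produces (illustrative).
          return ["TBOT", "PRECTmms"]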

dapper.met.adapters.era5 module

ERA5-Land adapter implementation.

class dapper.met.adapters.era5.ERA5Adapter[source]

Bases: BaseAdapter

ERA5-Land → ELM adapter.

This adapter implements the BaseAdapter interface for ERA5-Land hourly data. It handles the source-specific details (file discovery, unit conversions, humidity diagnostics, renaming to ELM short names, and nonnegativity enforcement) so the upstream Exporter can remain source-agnostic.

Responsibilities

  • discover_files: Find CSV shards in a directory and infer the overall (start_year, end_year) using their date coverage.

  • normalize_locations: Validate and normalize the locations table (adds lon_0-360, ensures/creates zone, stable sorting).

  • id_column_for_csv: Declare the identifier column name in the input CSVs. For ERA5 we require gid.

  • preprocess_shard: Convert one merged shard (CSV rows joined to locations) into canonical ELM columns. Steps include:

    1. time filtering and optional “noleap” removal of Feb 29

    2. ERA5→ELM unit conversions (e.g., J/hr/m² → W/m², m/hr → mm/s)

    3. optional humidity computation (RH/Q) if temperature, dewpoint, and surface pressure are available

    4. renaming raw ERA5 fields to ELM short names via a mapping

    5. clipping canonical nonnegative variables

    6. returning only required columns in a deterministic order

  • required_vars: Report the canonical ELM variable names required for the requested output format.

  • pack_params: Provide robust (add_offset, scale_factor) for a canonical ELM variable, given optional data to tune ranges.

Notes

  • Humidity computation is performed only when temperature_2m, dewpoint_temperature_2m, and surface_pressure are present.

  • Precipitation is converted from m/hr to mm/s by dividing by 3.6 (see the sketch below).
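
A rough sketch of these conversions, assuming pandas columns named after the raw ERA5-Land variables; the ELM short names, the Magnus-type humidity formula, and the exact column set are illustrative, not the adapter's literal implementation:

  import numpy as np

  def era5_to_elm_units(df):
      """Illustrative ERA5-Land -> ELM conversions for an hourly DataFrame."""
      # Radiation: hourly-accumulated J/m^2 -> mean W/m^2 over the hour.
      df["FSDS"] = df["surface_solar_radiation_downwards"] / 3600.0
      # Precipitation: m/hr -> mm/s (divide by 3.6).
      df["PRECTmms"] = df["total_precipitation"] / 3.6
      # Specific humidity from dewpoint (K) and surface pressure (Pa),
      # via a Magnus-type saturation vapor pressure.
      td_c = df["dewpoint_temperature_2m"] - 273.15
      e = 611.2 * np.exp(17.67 * td_c / (td_c + 243.5))            # Pa
      df["QBOT"] = 0.622 * e / (df["surface_pressure"] - 0.378 * e)
      # Clip canonical nonnegative variables.
      cols = ["FSDS", "PRECTmms", "QBOT"]
      df[cols] = df[cols].clip(lower=0)
      return df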

DRIVER_TAG = 'ERA5'
SOURCE_NAME = 'ERA5-Land hourly reanalysis'
discover_files(csv_directory, calendar)[source]

Discover ERA5 CSV shards in a directory and infer the inclusive year range.

id_column_for_csv(df_csv, id_col)[source]

Return the required identifier column name expected in ERA5 CSV shards (“gid”).

pack_params(elm_var, data=None)[source]

Return (add_offset, scale_factor) used to pack a variable for NetCDF output.
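
These follow the usual NetCDF packing convention; a hedged sketch of how the returned pair would be applied (the variable name "TBOT", the sample values, and the int16 target dtype are assumptions):

  import numpy as np
  from dapper.met.adapters import ERA5Adapter

  adapter = ERA5Adapter()
  values = np.array([270.1, 288.5, 301.2])                 # e.g. air temperature in K
  add_offset, scale_factor = adapter.pack_params("TBOT", data=values)

  # Pack floats into a smaller integer dtype for NetCDF output.
  packed = np.round((values - add_offset) / scale_factor).astype(np.int16)

  # Readers recover approximate physical values the standard way.
  unpacked = packed * scale_factor + add_offset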

preprocess_shard(df_merged, start_year, end_year, calendar, dformat)[source]
  1. Filter time & handle no-leap

  2. Apply ERA5 → ELM unit conversions

  3. Compute humidities (if columns available)

  4. Rename columns to canonical ELM names using RAW_TO_ELM

  5. Clip canonical nonnegative variables

  6. Return only the canonical vars required by elm_required_vars(dformat), plus LONGXY/LATIXY/time/gid/zone (coords/meta).

required_vars(dformat)[source]

Return the canonical ELM variables required for the requested output format.

dapper.met.adapters.fluxnet module

dapper module: met.adapters.fluxnet.

class dapper.met.adapters.fluxnet.FluxnetAdapter[source]

Bases: BaseAdapter

AmeriFlux FLUXNET → ELM adapter.

Assumptions

  • User provides a single FLUXNET CSV (FULLSET or SUBSET) per run.

  • CSV contains TIMESTAMP_START/TIMESTAMP_END (or TIMESTAMP) columns.

  • Missing values are coded as -9999.

  • Exporter supplies df_merged with ['gid', 'lat', 'lon', 'zone', …] already merged in from df_loc.

DRIVER_TAG = 'FLUXNET'
SOURCE_NAME = 'FLUXNET (AmeriFlux ONEFlux) tower data'
discover_files(csv_directory, calendar)[source]

Return (csv_files, start_year, end_year).

Parameters:

calendar (str)

preprocess_shard(df_merged, start_year, end_year, calendar, dformat)[source]

Return a DataFrame with at least: ['gid', 'time', 'LATIXY', 'LONGXY', 'zone', <ELM vars>]

Return type:

DataFrame

Parameters:
  • df_merged (DataFrame)

  • start_year (int)

  • end_year (int)

  • calendar (str)

  • dformat (str)

required_vars(dformat)[source]

Optional: return a list of ELM variable short names that this adapter will produce for the given dformat ('BYPASS' or 'DATM_MODE'). The Exporter does not require this method.

Parameters:

dformat (str)

dapper.met.adapters.fluxnet.infer_fluxnet_dt_hours(df)[source]

Infer native FLUXNET timestep from timestamp columns in hours.

Handles:
  • half-hourly/hourly/weekly: TIMESTAMP_START, TIMESTAMP_END (YYYYMMDDHHMM)

  • daily/monthly/yearly: TIMESTAMP (YYYYMMDD or YYYYMM, etc.)

Returns:

Approximate timestep in hours.

Return type:

float

Raises:

ValueError – If no suitable timestamp columns are found or dt cannot be inferred.

Parameters:

df (DataFrame)
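
For example, with half-hourly timestamps (values are illustrative):

  import pandas as pd
  from dapper.met.adapters.fluxnet import infer_fluxnet_dt_hours

  # Half-hourly FLUXNET-style timestamps in YYYYMMDDHHMM form.
  df = pd.DataFrame({
      "TIMESTAMP_START": ["202001010000", "202001010030", "202001010100"],
      "TIMESTAMP_END":   ["202001010030", "202001010100", "202001010130"],
  })

  dt_hours = infer_fluxnet_dt_hours(df)   # expected to be 0.5 for half-hourly data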

Module contents

MET adapters.

This package contains dataset-specific adapters (ERA5, Fluxnet, …).

The source-agnostic exporter lives in dapper.met.exporter, but we re-export it here for convenience/back-compat.

class dapper.met.adapters.BaseAdapter[source]

Bases: ABC

Adapter contract for met sources. The Exporter depends only on this interface. Implementations may override defaults as needed.

abstractmethod discover_files(csv_directory, calendar)[source]

Return (csv_files, start_year, end_year).

Parameters:

calendar (str)

normalize_locations(df_loc, id_col=None)[source]

Standardize df_loc to include ['gid', 'lat', 'lon', 'lon_0-360', 'zone'], sorted by (lat, lon).

Zones are treated as a per-location grouping label (e.g., to support E3SM/ELM decomposition hints across a cellset). If df_loc lacks a ‘zone’ column, we default every location to zone=1.

We intentionally do not auto-expand locations across multiple zones. If you want multiple zones, supply an explicit ‘zone’ column in df_loc with one row per location.

Return type:

DataFrame

Parameters:

df_loc (DataFrame)

pack_params(elm_var, data=None)[source]

Default: use elm_utils.elm_var_packing_params if available; otherwise a safe fallback. Adapters can override for source-specific packing strategies.

Parameters:

elm_var (str)

abstractmethod preprocess_shard(df_merged, start_year, end_year, calendar, dformat)[source]

Return a DataFrame with at least: ['gid', 'time', 'LATIXY', 'LONGXY', 'zone', <ELM vars>]

Return type:

DataFrame

Parameters:
  • df_merged (DataFrame)

  • start_year (int)

  • end_year (int)

  • calendar (str)

  • dformat (str)

required_vars(dformat)[source]

Optional: return a list of ELM variable short names that this adapter will produce for the given dformat ('BYPASS' or 'DATM_MODE'). The Exporter does not require this method.

Parameters:

dformat (str)

class dapper.met.adapters.ERA5Adapter[source]

Bases: BaseAdapter

ERA5-Land → ELM adapter.

This adapter implements the BaseAdapter interface for ERA5-Land hourly data. It handles the source-specific details (file discovery, unit conversions, humidity diagnostics, renaming to ELM short names, and nonnegativity enforcement) so the upstream Exporter can remain source-agnostic.

Responsibilities

  • discover_files: Find CSV shards in a directory and infer the overall (start_year, end_year) using their date coverage.

  • normalize_locations: Validate and normalize the locations table (adds lon_0-360, ensures/creates zone, stable sorting).

  • id_column_for_csv: Declare the identifier column name in the input CSVs. For ERA5 we require gid.

  • preprocess_shard: Convert one merged shard (CSV rows joined to locations) into canonical ELM columns. Steps include:

    1. time filtering and optional “noleap” removal of Feb 29

    2. ERA5→ELM unit conversions (e.g., J/hr/m² → W/m², m/hr → mm/s)

    3. optional humidity computation (RH/Q) if temperature, dewpoint, and surface pressure are available

    4. renaming raw ERA5 fields to ELM short names via a mapping

    5. clipping canonical nonnegative variables

    6. returning only required columns in a deterministic order

  • required_vars: Report the canonical ELM variable names required for the requested output format.

  • pack_params: Provide robust (add_offset, scale_factor) for a canonical ELM variable, given optional data to tune ranges.

Notes

  • Humidity computation is performed only when temperature_2m, dewpoint_temperature_2m, and surface_pressure are present.

  • Precipitation is converted from m/hr to mm/s by dividing by 3.6.

DRIVER_TAG = 'ERA5'
SOURCE_NAME = 'ERA5-Land hourly reanalysis'
discover_files(csv_directory, calendar)[source]

Discover ERA5 CSV shards in a directory and infer the inclusive year range.

id_column_for_csv(df_csv, id_col)[source]

Return the required identifier column name expected in ERA5 CSV shards (“gid”).

pack_params(elm_var, data=None)[source]

Return (add_offset, scale_factor) used to pack a variable for NetCDF output.

preprocess_shard(df_merged, start_year, end_year, calendar, dformat)[source]
  1. Filter time & handle no-leap

  2. Apply ERA5 → ELM unit conversions

  3. Compute humidities (if columns available)

  4. Rename columns to canonical ELM names using RAW_TO_ELM

  5. Clip canonical nonnegative variables

  6. Return only the canonical vars required by elm_required_vars(dformat), plus LONGXY/LATIXY/time/gid/zone (coords/meta).

required_vars(dformat)[source]

Return the canonical ELM variables required for the requested output format.

class dapper.met.adapters.Exporter(adapter, src_path, *, domain, out_dir=None, calendar='noleap', dtime_resolution_hrs=1, dtime_units='days', dformat='BYPASS', append_attrs=None, chunks=None, include_vars=None, exclude_vars=None)[source]

Bases: object

Source-agnostic meteorological exporter.

This class orchestrates a two-pass pipeline that ingests time-sharded CSVs for many sites/cells, preprocesses them via a pluggable adapter, and writes ELM-ready NetCDF outputs in two layouts:

  1. "cellset" – one NetCDF per variable with dims ('DTIME','lat','lon') (global packing; sparse lat/lon axes are OK).

  2. "sites" – one directory per site; each directory contains one NetCDF per variable with dims ('n','DTIME') where n=1 (per-site packing).

Exporter is source-agnostic: all dataset-specific logic (file discovery, unit conversions, renaming to ELM short names, etc.) lives in an adapter that implements the BaseAdapter interface (e.g., an ERA5Adapter). The exporter handles staging (CSV → per-site parquet), global DTIME axis creation, packing scans, chunking, and NetCDF I/O.

Parameters:
  • adapter (BaseAdapter) – Implements: discover_files, normalize_locations, preprocess_shard, required_vars, and pack_params.

  • csv_directory (str or pathlib.Path) – Directory containing time-sharded CSV files for all sites/cells.

  • out_dir (str or pathlib.Path) – Destination directory for NetCDF outputs and temporary parquet shards.

  • df_loc (pandas.DataFrame) – Locations table with at least columns ["gid","lat","lon"]; optional "zone". The adapter's normalize_locations validates columns, adds "lon_0-360", fills/validates "zone", and sorts for stable site order.

  • id_col (str, optional) – Kept for backward compatibility (unused when "gid" is assumed).

  • calendar ({"noleap","standard"}, default "noleap") – Calendar for numeric DTIME coordinate; Feb 29 filtered for “noleap”.

  • dtime_resolution_hrs (int, default 1) – Target time resolution in hours for the DTIME axis.

  • dtime_units ({"days","hours"}, default "days") – Units of the numeric DTIME coordinate (e.g., "days since YYYY-MM-DD HH:MM:SS").

  • domain (Domain)

  • dformat (str)

  • append_attrs (dict | None)

dformat : {"BYPASS", "DATM_MODE"}, default "BYPASS"

Target ELM format selector passed through to the adapter.

append_attrs : dict, optional

Extra global NetCDF attributes to include in every file. The exporter also adds export_mode ("cellset" or "sites") and pack_scope ("global" or "per-site").

chunks : tuple[int, ...], optional

Explicit NetCDF chunk sizes.

include_vars / exclude_vars : Iterable[str], optional

Allow-/block-lists of ELM short names applied after preprocess. Meta columns {"gid", "time", "LATIXY", "LONGXY", "zone"} are always kept.

Side Effects

  • Creates a temporary directory of per-site parquet shards under out_dir.

  • Writes NetCDF files to out_dir in the chosen layout.

  • Writes a zone_mappings.txt file either at the root (cellset) or inside each site directory (sites).

Notes

  • Packing: global packing for cellset; per-site packing for sites.

  • Required columns: CSV shards and df_loc both use "gid"; CSVs include the adapter’s date/time column (renamed to "time" during preprocess).

  • Combined (lat/lon) layout: does not enforce regular grids; axes are the unique sorted lat/lon from df_loc (sparse OK).

run(*, pack_scope=None, filename=None, overwrite=False)[source]

Run the MET export for this exporter’s Domain.

The output layout is derived from Domain.mode:
  • sites: writes <run_dir>/<gid>/MET/{prefix_}{var}.nc and a per-site zone_mappings.txt (always zone=01, id=1).

  • cellset: writes <run_dir>/MET/{prefix_}{var}.nc and a single zone_mappings.txt covering all locations (zones taken from df_loc, default 1).

Parameters:
  • pack_scope – Optional packing strategy override. Defaults to per-site for sites and global for cellset outputs.

  • filename (str | None) – Optional filename prefix for output NetCDF files. If provided, each variable is written to {filename}_{var}.nc.

  • overwrite (bool) – If True, clears existing MET outputs before writing.

Return type:

None
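
A hedged end-to-end sketch (the paths are placeholders, the include_vars list is illustrative, and the Domain object is assumed to have been built elsewhere per the dapper Domain docs):

  from dapper.met.adapters import ERA5Adapter, Exporter

  domain = ...  # an existing dapper Domain ("sites" or "cellset" mode), built elsewhere

  exporter = Exporter(
      ERA5Adapter(),
      "/path/to/era5_csv_shards",                  # directory of time-sharded CSVs
      domain=domain,
      out_dir="/path/to/met_output",
      calendar="noleap",
      dtime_resolution_hrs=1,
      dformat="BYPASS",
      include_vars=["TBOT", "QBOT", "FSDS", "PRECTmms"],
  )
  exporter.run(overwrite=True)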

class dapper.met.adapters.FluxnetAdapter[source]

Bases: BaseAdapter

AmeriFlux FLUXNET → ELM adapter.

Assumptions

  • User provides a single FLUXNET CSV (FULLSET or SUBSET) per run.

  • CSV contains TIMESTAMP_START/TIMESTAMP_END (or TIMESTAMP) columns.

  • Missing values are coded as -9999.

  • Exporter supplies df_merged with ['gid', 'lat', 'lon', 'zone', …] already merged in from df_loc.

DRIVER_TAG = 'FLUXNET'
SOURCE_NAME = 'FLUXNET (AmeriFlux ONEFlux) tower data'
discover_files(csv_directory, calendar)[source]

Return (csv_files, start_year, end_year).

Parameters:

calendar (str)

native_dt_hours: Optional[float]
preprocess_shard(df_merged, start_year, end_year, calendar, dformat)[source]

Return a DataFrame with at least: ['gid', 'time', 'LATIXY', 'LONGXY', 'zone', <ELM vars>]

Return type:

DataFrame

Parameters:
  • df_merged (DataFrame)

  • start_year (int)

  • end_year (int)

  • calendar (str)

  • dformat (str)

required_vars(dformat)[source]

Optional: return a list of ELM variable short names that this adapter will produce for the given dformat ('BYPASS' or 'DATM_MODE'). The Exporter does not require this method.

Parameters:

dformat (str)

resolution: Optional[str]