dapper.sample_e5lh

dapper.sample_e5lh(params, domain_name=None, skip_tasks=False)

Submit Google Earth Engine (GEE) export tasks for ERA5-Land Hourly time series.

This function prepares the ERA5-Land Hourly ImageCollection ("ECMWF/ERA5_LAND/HOURLY"), validates the requested bands, and ensures each geometry samples at least one pixel center (falling back to points when needed). It then splits the requested date range into batches of gee_years_per_task years and, unless skip_tasks=True, starts one Google Drive export task per batch.
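The date batching described above can be illustrated with a small standalone sketch. The helper name year_batches and its exact boundary handling (inclusive end dates, batches aligned to the start date) are assumptions made for illustration; the library's internal batching may differ.

```python
from datetime import date, timedelta


def year_batches(start_date, end_date, years_per_task=5):
    """Split [start_date, end_date] into consecutive chunks of N years.

    Hypothetical helper: illustrates the idea only, not dapper's own code.
    Dates are "YYYY-MM-DD" strings; both ends of each chunk are inclusive.
    """
    start = date.fromisoformat(start_date)
    end = date.fromisoformat(end_date)
    batches = []
    cur = start
    while cur <= end:
        try:
            nxt = cur.replace(year=cur.year + years_per_task)
        except ValueError:
            # cur is Feb 29 and the target year is not a leap year
            nxt = cur.replace(year=cur.year + years_per_task, day=28)
        last = min(nxt - timedelta(days=1), end)
        batches.append((cur.isoformat(), last.isoformat()))
        cur = last + timedelta(days=1)
    return batches


if __name__ == "__main__":
    for first, last in year_batches("1950-01-01", "1961-12-31", 5):
        print(first, "to", last)
```

Under these assumptions, a 12-year range with 5 years per task yields three batches, so three export tasks would be started.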

Parameters:
  • params (dict) –

    Configuration dictionary. Expected keys (case-sensitive):

    • start_date (str): Start date in "YYYY-MM-DD".

    • end_date (str): End date in "YYYY-MM-DD".

    • geometries: One of the following:

      • str: GEE asset ID for a FeatureCollection (e.g., "users/me/my_fc").

      • ee.FeatureCollection: a pre-constructed collection.

      • GeoDataFrame: must contain geometry and an ID column (see geometry_id_field).

      • AOI: dapper.domains.aoi.AOI instance; uses its internal GeoDataFrame.

      • Domain: dapper.domains.domain.Domain instance; uses Domain.to_geometries().

    • geometry_id_field (str, optional): ID column in provided geometries. Defaults to "gid". Values are copied into the "gid" property on each feature.

    • gee_bands (str or list[str]): Which ERA5-Land bands to export. One of:

      • "all": all available bands (from era5.ALL_BANDS)

      • "elm": bands required to derive ELM variables (from era5.REQUIRED_RAW_BANDS)

      • a list of band names validated against the collection

    • gdrive_folder (str): Google Drive folder name where CSV chunks are written.

    • job_name (str): Base name used to build per-batch export descriptions/filenames.

    • gee_scale (str or int or float): Sampling scale in meters. If "native" or a numeric value below 11132, the native ERA5-Land scale of 11132 m is used.

    • gee_years_per_task (int, optional): Years per export batch (default: 5).

    The function sets params["gee_ic"] = "ECMWF/ERA5_LAND/HOURLY" internally.

  • domain_name (str, optional) – Optional name for the returned Domain.

  • skip_tasks (bool, default False) – If True, do everything except starting the GEE export tasks.
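The gee_scale rule can be hard to picture from prose alone. The following is a minimal sketch of the documented behavior, assuming a hypothetical helper named resolve_scale; it is not the library's implementation.

```python
NATIVE_SCALE_M = 11132  # native ERA5-Land scale in meters, as documented above


def resolve_scale(gee_scale):
    """Return the sampling scale in meters for a gee_scale setting.

    Hypothetical helper mirroring the documented rule: "native" or any
    numeric value below 11132 resolves to 11132; anything that is neither
    "native" nor numeric raises TypeError.
    """
    if isinstance(gee_scale, str):
        if gee_scale == "native":
            return NATIVE_SCALE_M
        raise TypeError(f'gee_scale must be "native" or numeric, got {gee_scale!r}')
    if not isinstance(gee_scale, (int, float)):
        raise TypeError(f'gee_scale must be "native" or numeric, got {gee_scale!r}')
    # Values finer than the native grid are clamped up to the native scale.
    return max(gee_scale, NATIVE_SCALE_M)
```

For example, resolve_scale(5000) gives 11132, while resolve_scale(25000) is returned unchanged.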

Returns:

Domain describing the sampling locations. The underlying GeoDataFrame contains at least "gid", "lon", and "lat".

Return type:

Domain

Notes

  • Call ee.Initialize() before using this function.

  • CSV selectors include ["gid", "date"] + params["gee_bands"].

  • Dates are derived from system:time_start and formatted in UTC.
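As an illustration of the last two notes, the sketch below builds a selector list and converts a system:time_start value to a UTC string. Earth Engine stores system:time_start as milliseconds since the Unix epoch; the helper names and the output format string are assumptions, since the exact date format written to the CSV is not stated here.

```python
from datetime import datetime, timezone


def csv_selectors(bands):
    """Column order for the exported CSV: ID, date, then the requested bands."""
    return ["gid", "date"] + list(bands)


def format_time_start(millis, fmt="%Y-%m-%d %H:%M:%S"):
    """Format an epoch-milliseconds timestamp as a UTC string.

    The default fmt is an assumed format chosen for illustration.
    """
    return datetime.fromtimestamp(millis / 1000, tz=timezone.utc).strftime(fmt)


if __name__ == "__main__":
    print(csv_selectors(["temperature_2m", "total_precipitation"]))
    print(format_time_start(0))
```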

Raises:
  • KeyError – If required keys are missing from params.

  • ValueError – If dates are malformed or geometries is an unsupported type.

  • TypeError – If gee_scale is not "native" and not numeric.

  • ee.EEException – Propagated Earth Engine errors (e.g., authentication, export quota).
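Missing keys and malformed dates can be caught locally before any Earth Engine call is made. The sketch below is a hypothetical pre-flight check; the set of required keys is an assumption inferred from which keys are not marked optional above, and the function is not part of dapper.

```python
from datetime import datetime

# Assumed required keys: every documented key not marked "optional".
REQUIRED_KEYS = (
    "start_date",
    "end_date",
    "geometries",
    "gee_bands",
    "gdrive_folder",
    "job_name",
    "gee_scale",
)


def preflight(params):
    """Check a params dict for missing keys and malformed dates.

    Raises KeyError for missing keys and ValueError for dates that do not
    parse as "YYYY-MM-DD". Returns the parsed (start, end) dates.
    """
    missing = [key for key in REQUIRED_KEYS if key not in params]
    if missing:
        raise KeyError(f"missing required params keys: {missing}")
    # strptime raises ValueError when the string does not match the format.
    start = datetime.strptime(params["start_date"], "%Y-%m-%d").date()
    end = datetime.strptime(params["end_date"], "%Y-%m-%d").date()
    return start, end
```

Running a check like this first keeps simple configuration mistakes separate from errors raised by Earth Engine itself.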

Examples

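import ee
from dapper import sample_e5lh

ee.Initialize()  # must be called before sample_e5lh (see Notes)

params = {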
    "start_date": "1950-01-01",
    "end_date": "1951-12-31",
    "geometries": "users/me/my_sites_fc",
    "geometry_id_field": "gid",
    "gee_bands": "elm",
    "gee_scale": "native",
    "gee_years_per_task": 5,
    "gdrive_folder": "era5_exports",
    "job_name": "era5l_sites",
}
domain = sample_e5lh(params)
domain.gdf.head()
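A dry run with skip_tasks=True can be useful for inspecting the sampling locations before any export is started. The sketch below uses only the documented signature; build_params and dry_run are hypothetical convenience wrappers, and the default values shown simply repeat those from the example above.

```python
def build_params(geometries, job_name, start_date, end_date, gdrive_folder, **overrides):
    """Assemble a params dict, filling the remaining keys with example values."""
    params = {
        "start_date": start_date,
        "end_date": end_date,
        "geometries": geometries,
        "geometry_id_field": "gid",
        "gee_bands": "elm",
        "gee_scale": "native",
        "gee_years_per_task": 5,
        "gdrive_folder": gdrive_folder,
        "job_name": job_name,
    }
    params.update(overrides)
    return params


def dry_run(params, domain_name=None):
    """Run all preparation steps without starting any GEE export task."""
    # Imported lazily so build_params can be used without Earth Engine installed.
    import ee
    from dapper import sample_e5lh

    ee.Initialize()
    return sample_e5lh(params, domain_name=domain_name, skip_tasks=True)
```

A typical pattern would be to call dry_run(params), check that the returned Domain has the expected "gid", "lon", and "lat" values, and only then call sample_e5lh(params) to submit the exports.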