dapper.integrations.earthengine package

Submodules

dapper.integrations.earthengine.gee_utils module

Google Earth Engine helpers and sampling utilities.

dapper.integrations.earthengine.gee_utils.determine_gee_batches(start_date, end_date, max_date, years_per_task=5, verbose=True)[source]

Computes how to split a large GEE job into batches of tasks. Currently assumes ERA5-Land Hourly (i.e., hourly data with a known date range).

Returns a DataFrame where each row defines the start and end time for each Task in a batch.
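The batching idea can be illustrated with a minimal standard-library sketch. This is not the library's implementation: the helper name batch_date_range, the keys "task_start" and "task_end", and the choice to cut chunks on January 1 boundaries are all assumptions made for illustration. It returns a list of dicts rather than a DataFrame.

```python
from datetime import date


def batch_date_range(start_date, end_date, max_date, years_per_task=5):
    """Split a date range into chunks of at most `years_per_task` calendar years.

    Hypothetical helper: the end of the range is clipped to `max_date`
    (the last date for which data are assumed to exist).
    """
    start = date.fromisoformat(start_date)
    end = min(date.fromisoformat(end_date), date.fromisoformat(max_date))
    if start > end:
        raise ValueError("start_date falls after the usable end date")

    batches = []
    chunk_start = start
    while chunk_start <= end:
        # The next chunk begins on January 1, `years_per_task` years later.
        next_start = date(chunk_start.year + years_per_task, 1, 1)
        # This chunk ends the day before the next one starts, or at `end`.
        chunk_end = min(end, date.fromordinal(next_start.toordinal() - 1))
        batches.append(
            {"task_start": chunk_start.isoformat(), "task_end": chunk_end.isoformat()}
        )
        chunk_start = next_start
    return batches
```

Each dict corresponds to one row of the kind of table described above, so a list like this could be passed to a DataFrame constructor if a tabular form is needed.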

dapper.integrations.earthengine.gee_utils.ensure_pixel_centers_within_geometries(fc, sample_img, scale)[source]

Ensures each feature in fc will sample valid data from sample_img at the given scale. For polygons/multipolygons with zero pixel centers inside, replaces geometry with its centroid. Properties are preserved.

dapper.integrations.earthengine.gee_utils.export_fc(fc, filename, fileformat, folder='dapper_exports', prefix=None, verbose=False)[source]

Export a FeatureCollection to Google Drive using Earth Engine’s table export.

Parameters:
  • fc: ee.FeatureCollection

    The feature collection to export.

  • filename: str

    The export task description and also used as the file name (if prefix is not provided).

  • fileformat: str
    File format for the export. Must be one of:
    • ‘CSV’

    • ‘GeoJSON’

    • ‘KML’

    • ‘KMZ’

  • folder: str, optional

    Google Drive folder to export to. Defaults to ‘dapper_exports’.

  • prefix: str, optional

    File name prefix for the exported file. Defaults to the filename if not provided.

  • verbose: bool, optional

    If True, prints export destination information.

Returns:
  • None
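The argument rules documented above (the allowed formats and the prefix falling back to filename) can be sketched as a small pure-Python helper. The name resolve_export_args is hypothetical, and the case-sensitive format check is an assumption. The returned keys mirror keyword arguments of Earth Engine's ee.batch.Export.table.toDrive; whether export_fc assembles them this way internally is not confirmed by this page.

```python
ALLOWED_FORMATS = ("CSV", "GeoJSON", "KML", "KMZ")


def resolve_export_args(filename, fileformat, folder="dapper_exports", prefix=None):
    """Validate the format and apply the documented defaults (illustrative only)."""
    if fileformat not in ALLOWED_FORMATS:
        raise ValueError(
            f"fileformat must be one of {ALLOWED_FORMATS}, got {fileformat!r}"
        )
    return {
        "description": filename,  # task description
        "folder": folder,  # Google Drive folder
        # The prefix defaults to the filename when not provided.
        "fileNamePrefix": prefix if prefix is not None else filename,
        "fileFormat": fileformat,
    }
```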

dapper.integrations.earthengine.gee_utils.featurecollection_to_df_loc(fc, name='gee')[source]

Legacy wrapper: convert a FeatureCollection to a df_loc-style GeoDataFrame.

Prefer featurecollection_to_domain(fc).cells in new code.

dapper.integrations.earthengine.gee_utils.featurecollection_to_domain(fc, name='gee', domain_nc=None, *, mode='sites')[source]

Converts an ee.FeatureCollection object to a Domain with one row per feature and representative lon/lat.

  • Points → lon/lat from the point.

  • Polygons → lon/lat from a representative interior point.

  • MultiPolygons → same as polygons.

The Domain’s underlying GeoDataFrame has:
  • ‘gid’ : copied from feature properties

  • ‘lon’ : representative longitude (EPSG:4326)

  • ‘lat’ : representative latitude (EPSG:4326)

  • ‘method’: how sampling was interpreted

  • ‘sampled_geometry’: WKT of the original geometry

  • ‘geometry’: Point at (lon, lat)

Parameters:

mode (str)

dapper.integrations.earthengine.gee_utils.infer_id_field(columns, verbose=False)[source]

Tries to discern the id field from a list of columns. Used when id_col is not specified.

dapper.integrations.earthengine.gee_utils.kill_all_tasks(verbose=True)[source]

Cancel all Earth Engine tasks visible to the current account.

dapper.integrations.earthengine.gee_utils.masks_to_featurecollection(mask_entries, region, export_scale, extra_image_props=None)[source]

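Convert a list of mask entries into an ee.FeatureCollection.

Parameters:
  • mask_entries: list of dicts with the keys ‘band_name’, ‘mask’, and ‘meta’.

Returns:

ee.FeatureCollection with metadata as properties. One feature per band (the union of all polygons for that band).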

dapper.integrations.earthengine.gee_utils.parse_geometry_object(geom, name=None)[source]

Convert a single geometry-like input into an ee.Geometry.

Supported:
  • str: treated as a GEE asset id; returns its union geometry

  • shapely: Point / Polygon / MultiPolygon / LineString / MultiLineString / GeometryCollection

  • ee.Geometry / ee.Feature / ee.FeatureCollection

dapper.integrations.earthengine.gee_utils.parse_geometry_objects(geom, geometry_id_field=None)[source]

Translate geometry containers to an ee.FeatureCollection.

Accepted inputs:
  • Domain: uses domain.support (preferred) or legacy domain.gdf

  • str: interpreted as a GEE asset id

  • ee.FeatureCollection: returned (re-cast) as FeatureCollection

  • GeoDataFrame: requires geometry_id_field; converts rows to features

Returns an ee.FeatureCollection (even if a single feature is present).

Notes

  • This function intentionally does NOT depend on AOI.

  • This function does NOT attempt to “fix” individual shapely geometries (e.g., MultiPolygon). GeoDataFrame -> GeoJSON -> EE handles that.

dapper.integrations.earthengine.gee_utils.sample_e5lh(params, domain_name=None, skip_tasks=False)[source]

Submit Google Earth Engine (GEE) export tasks for ERA5-Land Hourly time series.

This prepares the ERA5-Land Hourly ImageCollection ("ECMWF/ERA5_LAND/HOURLY"), validates bands, ensures each geometry samples at least one pixel center (falling back to points when needed), batches the requested date range into N-year chunks, and (unless skip_tasks=True) starts one Drive export task per batch.

Parameters:
  • params (dict) –

    Configuration dictionary. Expected keys (case-sensitive):

    • start_date (str): Start date in "YYYY-MM-DD".

    • end_date (str): End date in "YYYY-MM-DD".

    • geometries: One of the following:

      • str: GEE asset ID for a FeatureCollection (e.g., "users/me/my_fc").

      • ee.FeatureCollection: a pre-constructed collection.

      • GeoDataFrame: must contain geometry and an ID column (see geometry_id_field).

      • AOI: dapper.domains.aoi.AOI instance; uses its internal GeoDataFrame.

      • Domain: dapper.domains.domain.Domain instance; uses Domain.to_geometries().

    • geometry_id_field (str, optional): ID column in provided geometries. Defaults to "gid". Values are copied into the "gid" property on each feature.

    • gee_bands (str or list[str]): Which ERA5-Land bands to export. One of:

      • "all": all available bands (from era5.ALL_BANDS)

      • "elm": bands required to derive ELM variables (from era5.REQUIRED_RAW_BANDS)

      • a list of band names validated against the collection

    • gdrive_folder (str): Google Drive folder name where CSV chunks are written.

    • job_name (str): Base name used to build per-batch export descriptions/filenames.

    • gee_scale (str or int or float): Sampling scale in meters. If "native" (or a value < 11132), the native ERA5-Land scale of 11132 m is used.

    • gee_years_per_task (int, optional): Years per export batch (default: 5).

    The function sets params["gee_ic"] = "ECMWF/ERA5_LAND/HOURLY" internally.

  • domain_name (str, optional) – Optional name for the returned Domain.

  • skip_tasks (bool, default False) – If True, do everything except starting the GEE export tasks.

Returns:

Domain describing the sampling locations. The underlying GeoDataFrame contains at least "gid", "lon", and "lat".

Return type:

Domain

Notes

  • Call ee.Initialize() before using this function.

  • CSV selectors include ["gid", "date"] + params["gee_bands"].

  • Dates are derived from system:time_start and formatted in UTC.
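Earth Engine stores system:time_start as milliseconds since the Unix epoch, so the UTC formatting mentioned in the last note amounts to a conversion like the one sketched below. The helper name and the output format string are assumptions; the format the export actually uses may differ.

```python
from datetime import datetime, timedelta, timezone

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def millis_to_utc_string(time_start_ms, fmt="%Y-%m-%d %H:%M:%S"):
    """Format an epoch-milliseconds timestamp as a UTC string (hypothetical helper).

    Adding a timedelta to the epoch also works for timestamps before 1970,
    which matters because the example below requests data from 1950.
    """
    return (UNIX_EPOCH + timedelta(milliseconds=time_start_ms)).strftime(fmt)
```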

Raises:
  • KeyError – If required keys are missing from params.

  • ValueError – If dates are malformed or geometries is an unsupported type.

  • TypeError – If gee_scale is not "native" and not numeric.

  • ee.EEException – Propagated Earth Engine errors (e.g., authentication, export quota).
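The gee_scale rule and its TypeError case can be summarized in a short sketch. The function name resolve_gee_scale is hypothetical, and rejecting booleans is an added assumption; only the "native" keyword, the 11132 m floor, and the TypeError for non-numeric values come from the description above.

```python
from numbers import Real

ERA5_LAND_NATIVE_SCALE_M = 11132  # native ERA5-Land scale in meters, as documented


def resolve_gee_scale(gee_scale):
    """Return the sampling scale in meters, never finer than the native scale."""
    if isinstance(gee_scale, str):
        if gee_scale == "native":
            return ERA5_LAND_NATIVE_SCALE_M
        raise TypeError(f'gee_scale must be "native" or numeric, got {gee_scale!r}')
    # bool is a subclass of int, so it is rejected explicitly (an assumption).
    if isinstance(gee_scale, bool) or not isinstance(gee_scale, Real):
        raise TypeError(f'gee_scale must be "native" or numeric, got {gee_scale!r}')
    # Values below the native scale fall back to the native scale.
    return max(gee_scale, ERA5_LAND_NATIVE_SCALE_M)
```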

Examples

params = {
    "start_date": "1950-01-01",
    "end_date": "1951-12-31",
    "geometries": "users/me/my_sites_fc",
    "geometry_id_field": "gid",
    "gee_bands": "elm",
    "gee_scale": "native",
    "gee_years_per_task": 5,
    "gdrive_folder": "era5_exports",
    "job_name": "era5l_sites",
}
domain = sample_e5lh(params)
domain.gdf.head()

dapper.integrations.earthengine.gee_utils.sample_image_over_polygons(gdf, image, geometry_id_field, band=None, reducer='mean', out_name=None, scale=None, ensure_pixel_centers=True, verbose=True)[source]

Sample a single-band ee.Image over polygons in a GeoDataFrame.

Parameters:
  • gdf (geopandas.GeoDataFrame) – Input geometries; must have a geometry column in EPSG:4326.

  • image (ee.Image) – Image to sample. Must be single-band unless ‘band’ is specified.

  • geometry_id_field (str) – Column in gdf containing unique IDs for each geometry.

  • band (str, optional) – Band name to sample. If None, image must have exactly one band.

  • reducer (str or ee.Reducer, default 'mean') – Spatial aggregator. Strings: ‘mean’, ‘min’, ‘max’, ‘std’.

  • out_name (str, optional) – Name of the output column. If None, uses ‘<band>_<reducer>’.

  • scale (float, optional) – Pixel scale in meters. If None, uses the image’s nominal scale.

  • ensure_pixel_centers (bool, default True) – If True, tiny polygons with no pixel centers inside will be sampled at their centroid instead (see ensure_pixel_centers_within_geometries).

  • verbose (bool, default True) – Print basic status.

Returns:

Copy of gdf with one new column added, named by out_name.

Return type:

geopandas.GeoDataFrame

dapper.integrations.earthengine.gee_utils.split_into_dfs(path_csv)[source]

Splits a GEE-exported CSV (from sample_e5lh_at_points) into a dictionary of DataFrames, one per unique value in the ‘pid’ column.
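The grouping step can be sketched with the standard library alone. This is an illustration rather than the library's code: the helper name split_rows_by_id is hypothetical, and it returns lists of row dicts instead of DataFrames so that it runs without third-party packages.

```python
import csv
from collections import defaultdict


def split_rows_by_id(path_csv, id_column="pid"):
    """Group the rows of a CSV file by the value in `id_column`."""
    groups = defaultdict(list)
    with open(path_csv, newline="") as f:
        for row in csv.DictReader(f):
            # Each row is a dict keyed by the CSV header names.
            groups[row[id_column]].append(row)
    return dict(groups)
```

All values stay as strings here; a DataFrame-based version would typically also parse dates and numeric columns.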

dapper.integrations.earthengine.gee_utils.try_to_download_featurecollection(fc, verbose=True)[source]

Attempt to load a FeatureCollection as a GeoDataFrame; return None on failure.

dapper.integrations.earthengine.gee_utils.validate_bands(bandlist, gee_ic)[source]

Ensures that the requested bands are available and errors if not.
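The check itself reduces to comparing requested names against the names a collection offers. The sketch below assumes the available band names have already been fetched as a plain list, whereas the real function receives the collection via gee_ic. The helper name check_bands and the choice of ValueError are assumptions.

```python
def check_bands(requested, available):
    """Return the requested bands, raising if any are not available."""
    available_set = set(available)
    missing = [band for band in requested if band not in available_set]
    if missing:
        raise ValueError(f"Bands not available in the collection: {missing}")
    return list(requested)
```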

Module contents

dapper module: integrations.earthengine.__init__.