dapper.integrations.earthengine package
Submodules
dapper.integrations.earthengine.gee_utils module
Google Earth Engine helpers and sampling utilities.
- dapper.integrations.earthengine.gee_utils.determine_gee_batches(start_date, end_date, max_date, years_per_task=5, verbose=True)
Calculates how to split a large GEE job into batches of export tasks. Currently assumes ERA5-Land Hourly (i.e., hourly data with a known date range).
Returns a DataFrame in which each row defines the start and end time of one task in the batch.
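A minimal, stdlib-only sketch of the batching idea follows. It assumes that batches are consecutive spans of years_per_task calendar years and that max_date (taken to be the last date available in the collection) caps end_date. The name batch_date_range is hypothetical, and the real function returns a DataFrame rather than a list of tuples.

```python
from datetime import date, timedelta


def batch_date_range(start_date, end_date, max_date, years_per_task=5):
    """Split [start_date, end_date] into consecutive chunks of at most years_per_task years.

    end_date is clipped to max_date (assumed: last date available in the collection).
    Returns a list of (batch_start, batch_end) date pairs; both ends are inclusive.
    """
    start = date.fromisoformat(start_date)
    end = min(date.fromisoformat(end_date), date.fromisoformat(max_date))
    if start > end:
        raise ValueError("start_date is after the (clipped) end date")

    batches = []
    cur = start
    while cur <= end:
        # The next batch starts on the same calendar day years_per_task years later.
        try:
            nxt = cur.replace(year=cur.year + years_per_task)
        except ValueError:  # cur is Feb 29 and the target year is not a leap year
            nxt = date(cur.year + years_per_task, 3, 1)
        batch_end = min(nxt - timedelta(days=1), end)
        batches.append((cur, batch_end))
        cur = batch_end + timedelta(days=1)
    return batches
```

For example, 1950-01-01 to 1961-06-30 with years_per_task=5 yields three batches: 1950 to 1954, 1955 to 1959, and 1960-01-01 to 1961-06-30.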
- dapper.integrations.earthengine.gee_utils.ensure_pixel_centers_within_geometries(fc, sample_img, scale)
Ensures each feature in fc will sample valid data from sample_img at the given scale. For polygons/multipolygons with zero pixel centers inside, replaces geometry with its centroid. Properties are preserved.
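The centroid fallback can be illustrated with a planar sketch. It assumes a single exterior ring in projected coordinates and a pixel grid of cell size scale anchored at the origin; Earth Engine's actual pixel grid depends on the image's projection. The helpers point_in_polygon, pixel_centers_inside, ring_centroid, and sampling_geometry are hypothetical and not part of dapper.

```python
import math


def point_in_polygon(x, y, ring):
    """Even-odd ray-casting test of (x, y) against a ring of (x, y) vertices."""
    inside = False
    n = len(ring)
    for i in range(n):
        (x1, y1), (x2, y2) = ring[i], ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def pixel_centers_inside(ring, scale):
    """Centers of grid cells (size `scale`, grid anchored at 0, 0) that fall inside the ring."""
    xs = [p[0] for p in ring]
    ys = [p[1] for p in ring]
    centers = []
    for i in range(math.floor(min(xs) / scale), math.ceil(max(xs) / scale)):
        for j in range(math.floor(min(ys) / scale), math.ceil(max(ys) / scale)):
            cx, cy = (i + 0.5) * scale, (j + 0.5) * scale
            if point_in_polygon(cx, cy, ring):
                centers.append((cx, cy))
    return centers


def ring_centroid(ring):
    """Area-weighted centroid of a simple polygon ring (shoelace formula)."""
    a = cx = cy = 0.0
    n = len(ring)
    for i in range(n):
        (x1, y1), (x2, y2) = ring[i], ring[(i + 1) % n]
        cross = x1 * y2 - x2 * y1
        a += cross
        cx += (x1 + x2) * cross
        cy += (y1 + y2) * cross
    a *= 0.5
    return (cx / (6 * a), cy / (6 * a))


def sampling_geometry(ring, scale):
    """Keep the polygon if it contains a pixel center; otherwise fall back to its centroid."""
    if pixel_centers_inside(ring, scale):
        return ("polygon", ring)
    return ("point", ring_centroid(ring))
```

A 30 x 30 square on a 10-unit grid contains nine pixel centers and is kept as a polygon, while a 1 x 1 square at (1, 1) contains none and is replaced by its centroid (1.5, 1.5).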
- dapper.integrations.earthengine.gee_utils.export_fc(fc, filename, fileformat, folder='dapper_exports', prefix=None, verbose=False)
Export a FeatureCollection to Google Drive using Earth Engine’s table export.
- Parameters:
fc (ee.FeatureCollection) – The feature collection to export.
filename (str) – The export task description; also used as the file name if prefix is not provided.
fileformat (str) – File format for the export. Must be one of ‘CSV’, ‘GeoJSON’, ‘KML’, or ‘KMZ’.
folder (str, optional) – Google Drive folder to export to. Defaults to ‘dapper_exports’.
prefix (str, optional) – File name prefix for the exported file. Defaults to filename if not provided.
verbose (bool, optional) – If True, prints export destination information.
- Returns:
None
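A sketch of the argument handling documented above, assuming fileformat is matched case-sensitively and that the values are forwarded to Earth Engine's ee.batch.Export.table.toDrive. resolve_export_args is a hypothetical helper; its dictionary keys are the corresponding toDrive keyword arguments.

```python
ALLOWED_FORMATS = ("CSV", "GeoJSON", "KML", "KMZ")


def resolve_export_args(filename, fileformat, folder="dapper_exports", prefix=None):
    """Validate the format and apply the documented defaults for an export_fc-style call."""
    if fileformat not in ALLOWED_FORMATS:  # assumed: case-sensitive match
        raise ValueError(f"fileformat must be one of {ALLOWED_FORMATS}, got {fileformat!r}")
    return {
        "description": filename,  # task description shown in the Tasks list
        "folder": folder,  # Google Drive folder
        "fileNamePrefix": prefix if prefix is not None else filename,
        "fileFormat": fileformat,
    }
```

Under those assumptions, a caller could start the task with ee.batch.Export.table.toDrive(collection=fc, **resolve_export_args("sites", "CSV")).start().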
- dapper.integrations.earthengine.gee_utils.featurecollection_to_df_loc(fc, name='gee')
Legacy wrapper: convert a FeatureCollection to a df_loc-style GeoDataFrame.
Prefer featurecollection_to_domain(fc).cells in new code.
- dapper.integrations.earthengine.gee_utils.featurecollection_to_domain(fc, name='gee', domain_nc=None, *, mode='sites')
Converts an ee.FeatureCollection to a Domain with one row per feature and a representative lon/lat for each.
Points → lon/lat from the point.
Polygons → lon/lat from a representative interior point.
MultiPolygons → same as polygons.
- The Domain’s underlying GeoDataFrame has:
‘gid’ : copied from feature properties
‘lon’ : representative longitude (EPSG:4326)
‘lat’ : representative latitude (EPSG:4326)
‘method’: how sampling was interpreted
‘sampled_geometry’: WKT of the original geometry
‘geometry’: Point at (lon, lat)
- Parameters:
mode (str) – Sampling mode. Defaults to ‘sites’.
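To make the per-feature rows concrete, here is a pure-Python sketch over GeoJSON-like geometry dicts. It handles only Point and Polygon (exterior ring only, holes ignored), treats lon/lat as planar coordinates, and finds an interior point with a horizontal scanline. The helper names and the "method" labels are hypothetical; dapper's actual representative-point method is not documented here.

```python
def interior_point(ring):
    """Return a point strictly inside a simple polygon ring using a horizontal scanline."""
    ys = sorted({y for _, y in ring})
    if len(ys) < 2:
        raise ValueError("degenerate ring")
    mid = (ys[0] + ys[-1]) / 2
    # Scan through the middle of the band between consecutive vertex heights that
    # contains mid, so the scanline never passes exactly through a vertex.
    y = next((lo + hi) / 2 for lo, hi in zip(ys, ys[1:]) if lo <= mid <= hi)
    xs = []
    n = len(ring)
    for i in range(n):
        (x1, y1), (x2, y2) = ring[i], ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge crosses the scanline
            xs.append(x1 + (y - y1) * (x2 - x1) / (y2 - y1))
    xs.sort()
    # Crossings pair up into inside segments [xs[0], xs[1]], [xs[2], xs[3]], ...; use the widest.
    a, b = max(zip(xs[0::2], xs[1::2]), key=lambda seg: seg[1] - seg[0])
    return ((a + b) / 2, y)


def to_wkt(geom):
    """Minimal WKT for Point and Polygon (exterior ring only)."""
    if geom["type"] == "Point":
        x, y = geom["coordinates"]
        return f"POINT ({x} {y})"
    ring = ", ".join(f"{x} {y}" for x, y in geom["coordinates"][0])
    return f"POLYGON (({ring}))"


def feature_to_row(properties, geom):
    """Build one row with gid, lon, lat, method, and sampled_geometry."""
    if geom["type"] == "Point":
        lon, lat = geom["coordinates"]
        method = "point"  # hypothetical label
    elif geom["type"] == "Polygon":
        lon, lat = interior_point(geom["coordinates"][0])
        method = "interior_point"  # hypothetical label
    else:
        raise ValueError(f"unsupported geometry type: {geom['type']}")
    return {"gid": properties["gid"], "lon": lon, "lat": lat,
            "method": method, "sampled_geometry": to_wkt(geom)}
```

An interior point, rather than the centroid, matters for concave shapes: for a U-shaped polygon the centroid can fall in the notch, outside the polygon, whereas the scanline point always lies inside.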
- dapper.integrations.earthengine.gee_utils.infer_id_field(columns, verbose=False)
Tries to infer the ID field from a list of column names. Used when id_col is not specified.
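The docstring does not say which names infer_id_field looks for. Purely as an illustration, the sketch below assumes a preference list of common ID column names and a fallback to any column ending in "_id"; both the list and guess_id_field are hypothetical.

```python
# Hypothetical preference order; not taken from dapper.
CANDIDATE_ID_FIELDS = ("gid", "id", "pid", "fid", "site_id")


def guess_id_field(columns, candidates=CANDIDATE_ID_FIELDS):
    """Return the best-guess ID column (original casing preserved), or None."""
    lowered = {c.lower(): c for c in columns}
    for name in candidates:
        if name in lowered:
            return lowered[name]
    # Fallback: any column whose name ends in "_id"
    for c in columns:
        if c.lower().endswith("_id"):
            return c
    return None
```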
- dapper.integrations.earthengine.gee_utils.kill_all_tasks(verbose=True)
Cancel all Earth Engine tasks visible to the current account.
- dapper.integrations.earthengine.gee_utils.masks_to_featurecollection(mask_entries, region, export_scale, extra_image_props=None)
Converts mask entries into an ee.FeatureCollection with one feature per band (the union of all polygons for that band) and the metadata stored as feature properties.
- Parameters:
mask_entries – list of dicts with keys ‘band_name’, ‘mask’, and ‘meta’.
- Returns:
ee.FeatureCollection
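An illustration of the expected mask_entries shape and of how each entry's metadata could become feature properties. The merge order (meta first, then extra_image_props) and the helper mask_entry_properties are assumptions; the real function also derives each feature's polygon geometry from the mask, which is omitted here (the mask is a None placeholder).

```python
def mask_entry_properties(entry, extra_image_props=None):
    """Flatten one mask entry into the property dict its feature would carry (hypothetical helper)."""
    missing = {"band_name", "mask", "meta"} - set(entry)
    if missing:
        raise KeyError(f"mask entry missing keys: {sorted(missing)}")
    props = {"band_name": entry["band_name"]}
    props.update(entry["meta"] or {})
    props.update(extra_image_props or {})  # assumed: extra props applied to every feature
    return props


# Example input: one entry per band; "mask" would normally be an ee.Image.
mask_entries = [
    {"band_name": "lake", "mask": None, "meta": {"source": "JRC"}},
]
```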
- dapper.integrations.earthengine.gee_utils.parse_geometry_object(geom, name=None)
Convert a single geometry-like input into an ee.Geometry.
- Supported:
str: treated as a GEE asset id; returns its union geometry
shapely: Point / Polygon / MultiPolygon / LineString / MultiLineString / GeometryCollection
ee.Geometry / ee.Feature / ee.FeatureCollection
- dapper.integrations.earthengine.gee_utils.parse_geometry_objects(geom, geometry_id_field=None)
Translate geometry containers to an ee.FeatureCollection.
- Accepted inputs:
Domain: uses domain.support (preferred) or legacy domain.gdf
str: interpreted as a GEE asset id
ee.FeatureCollection: returned (re-cast) as FeatureCollection
GeoDataFrame: requires geometry_id_field; converts rows to features
Returns an ee.FeatureCollection (even if a single feature is present).
Notes
This function intentionally does NOT depend on AOI.
This function does NOT attempt to “fix” individual shapely geometries (e.g., MultiPolygon); the GeoDataFrame → GeoJSON → EE conversion handles that.
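The GeoDataFrame path can be pictured as building a GeoJSON FeatureCollection in which every feature carries the ID field. This sketch works on plain row dicts whose "geometry" value is a GeoJSON geometry; rows_to_geojson_fc is hypothetical.

```python
def rows_to_geojson_fc(rows, geometry_id_field):
    """Convert row dicts (with a GeoJSON 'geometry') into a GeoJSON FeatureCollection dict."""
    if geometry_id_field is None:
        raise ValueError("geometry_id_field is required for tabular input")
    features = []
    for row in rows:
        if geometry_id_field not in row:
            raise KeyError(f"row is missing {geometry_id_field!r}")
        # Every non-geometry column becomes a feature property.
        props = {k: v for k, v in row.items() if k != "geometry"}
        features.append({"type": "Feature", "geometry": row["geometry"], "properties": props})
    return {"type": "FeatureCollection", "features": features}
```

With geopandas, gdf.__geo_interface__ returns a comparable GeoJSON FeatureCollection dict, and ee.FeatureCollection accepts a GeoJSON FeatureCollection as its constructor argument.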
- dapper.integrations.earthengine.gee_utils.sample_e5lh(params, domain_name=None, skip_tasks=False)
Submit Google Earth Engine (GEE) export tasks for ERA5-Land Hourly time series.
This prepares the ERA5-Land Hourly ImageCollection ("ECMWF/ERA5_LAND/HOURLY"), validates bands, ensures each geometry samples at least one pixel center (falling back to points when needed), batches the requested date range into N-year chunks, and (unless skip_tasks=True) starts one Drive export task per batch.
- Parameters:
params (dict) –
Configuration dictionary. Expected keys (case-sensitive):
start_date (str): Start date in "YYYY-MM-DD".
end_date (str): End date in "YYYY-MM-DD".
geometries: One of the following:
str: GEE asset ID for a FeatureCollection (e.g., "users/me/my_fc").
ee.FeatureCollection: a pre-constructed collection.
GeoDataFrame: must contain geometry and an ID column (see geometry_id_field).
AOI: dapper.domains.aoi.AOI instance; uses its internal GeoDataFrame.
Domain: dapper.domains.domain.Domain instance; uses Domain.to_geometries().
geometry_id_field (str, optional): ID column in the provided geometries. Defaults to "gid". Values are copied into the "gid" property on each feature.
gee_bands (str or list[str]): Which ERA5-Land bands to export. One of:
"all": all available bands (from era5.ALL_BANDS)
"elm": bands required to derive ELM variables (from era5.REQUIRED_RAW_BANDS)
a list of band names, validated against the collection
gdrive_folder (str): Google Drive folder name where CSV chunks are written.
job_name (str): Base name used to build per-batch export descriptions/filenames.
gee_scale (str or int or float): Sampling scale in meters. If "native" (or a value < 11132), the native ERA5-Land scale of 11132 m is used.
gee_years_per_task (int, optional): Years per export batch (default: 5).
The function sets params["gee_ic"] = "ECMWF/ERA5_LAND/HOURLY" internally.
domain_name (str, optional) – Optional name for the returned Domain.
skip_tasks (bool, default False) – If True, do everything except starting the GEE export tasks.
- Returns:
Domain describing the sampling locations. The underlying GeoDataFrame contains at least "gid", "lon", and "lat".
- Return type:
Domain
Notes
Call ee.Initialize() before using this function.
CSV selectors include ["gid", "date"] + params["gee_bands"].
Dates are derived from system:time_start and formatted in UTC.
- Raises:
KeyError – If required keys are missing from params.
ValueError – If dates are malformed or geometries is an unsupported type.
TypeError – If gee_scale is not "native" and not numeric.
ee.EEException – Propagated Earth Engine errors (e.g., authentication, export quota).
Examples
    params = {
        "start_date": "1950-01-01",
        "end_date": "1951-12-31",
        "geometries": "users/me/my_sites_fc",
        "geometry_id_field": "gid",
        "gee_bands": "elm",
        "gee_scale": "native",
        "gee_years_per_task": 5,
        "gdrive_folder": "era5_exports",
        "job_name": "era5l_sites",
    }
    domain = sample_e5lh(params)
    domain.gdf.head()
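A sketch of the gee_scale and CSV selector rules stated above. It assumes that any string other than "native" is rejected and that gee_bands has already been resolved to a list of band names; resolve_gee_scale and csv_selectors are hypothetical helpers, not dapper functions.

```python
import numbers

NATIVE_SCALE_M = 11132  # native ERA5-Land scale in meters, as stated above


def resolve_gee_scale(gee_scale):
    """Map a gee_scale setting to the sampling scale in meters."""
    if gee_scale == "native":
        return NATIVE_SCALE_M
    # bool is a subclass of int, so reject it explicitly
    if isinstance(gee_scale, bool) or not isinstance(gee_scale, numbers.Real):
        raise TypeError(f"gee_scale must be 'native' or numeric, got {gee_scale!r}")
    # Requests finer than the native grid fall back to the native scale
    return NATIVE_SCALE_M if gee_scale < NATIVE_SCALE_M else gee_scale


def csv_selectors(bands):
    """Columns written to each CSV chunk: ID and date first, then the band values."""
    return ["gid", "date"] + list(bands)
```

Clamping to the native scale reflects the docstring's rule that sampling ERA5-Land at a finer scale than its ~11 km grid gains nothing.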
- dapper.integrations.earthengine.gee_utils.sample_image_over_polygons(gdf, image, geometry_id_field, band=None, reducer='mean', out_name=None, scale=None, ensure_pixel_centers=True, verbose=True)
Sample a single-band ee.Image over polygons in a GeoDataFrame.
- Parameters:
gdf (geopandas.GeoDataFrame) – Input geometries; must have a geometry column in EPSG:4326.
image (ee.Image) – Image to sample. Must be single-band unless ‘band’ is specified.
geometry_id_field (str) – Column in gdf containing unique IDs for each geometry.
band (str, optional) – Band name to sample. If None, image must have exactly one band.
reducer (str or ee.Reducer, default 'mean') – Spatial aggregator. Strings: ‘mean’, ‘min’, ‘max’, ‘std’.
out_name (str, optional) – Name of the output column. If None, uses ‘<band>_<reducer>’.
scale (float, optional) – Pixel scale in meters. If None, uses the image’s nominal scale.
ensure_pixel_centers (bool, default True) – If True, tiny polygons with no pixel centers inside will be sampled at their centroid instead (see ensure_pixel_centers_within_geometries).
verbose (bool, default True) – Print basic status.
- Returns:
Copy of gdf with a new column named out_name added.
- Return type:
geopandas.GeoDataFrame
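A sketch of how a string reducer and the default out_name could be resolved. The mapping targets the ee.Reducer factories mean, min, max, and stdDev; resolve_reducer_and_name is hypothetical, and passing an ee.Reducer object directly is not covered.

```python
# String reducer -> name of the ee.Reducer factory method
REDUCER_FACTORIES = {"mean": "mean", "min": "min", "max": "max", "std": "stdDev"}


def resolve_reducer_and_name(band, reducer="mean", out_name=None):
    """Return (ee.Reducer factory name, output column name) for a string reducer."""
    if reducer not in REDUCER_FACTORIES:
        raise ValueError(f"reducer must be one of {sorted(REDUCER_FACTORIES)}, got {reducer!r}")
    # Default column name follows the documented '<band>_<reducer>' pattern.
    return REDUCER_FACTORIES[reducer], out_name or f"{band}_{reducer}"
```

The returned factory name would then be called as getattr(ee.Reducer, factory)() to obtain the Earth Engine reducer.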
- dapper.integrations.earthengine.gee_utils.split_into_dfs(path_csv)
Splits a GEE-exported CSV (from sample_e5lh_at_points) into a dictionary of DataFrames keyed by the unique values in the ‘pid’ column.
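A stdlib-only sketch of the same grouping, assuming the ID column is 'pid' as stated above. It returns lists of row dicts instead of DataFrames, and split_csv_by_id is hypothetical. With pandas, dict(tuple(pd.read_csv(path_csv).groupby("pid"))) gives the DataFrame equivalent.

```python
import csv
from collections import defaultdict


def split_csv_by_id(lines, id_col="pid"):
    """Group CSV rows into {id_value: [row_dict, ...]}.

    `lines` is an open text file (opened with newline="") or any iterable of CSV lines.
    """
    groups = defaultdict(list)
    for row in csv.DictReader(lines):
        groups[row[id_col]].append(row)
    return dict(groups)


# Usage with a real export:
#     with open(path_csv, newline="") as f:
#         groups = split_csv_by_id(f)
```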
Module contents
dapper module: integrations.earthengine.__init__.