Species Occurrence Cube SQL Functions
These custom functions have been developed by the B-Cubed project for producing species occurrence cubes.
See Generate a species occurrence cube using Microsoft Azure Databricks for an example of their usage.
Grid functions
These functions are usually used for generating species occurrence cubes. They calculate a grid cell for a point.
EEA Reference Grid cell code
Suggested function declaration:
CREATE OR REPLACE TEMPORARY FUNCTION eeaCellCode AS 'org.gbif.occurrence.hive.udf.EeaCellCodeUDF';
Function definition:
STRING eeaCellCode(INTEGER gridSize,
DOUBLE latitude,
DOUBLE longitude,
DOUBLE coordinateUncertaintyInMeters)
Parameters:
gridSize-
The grid size in metres of the EEA Reference Grid. Must be 25, 100, 250, 1000, 10000 or 100000.
latitude-
The latitude of the point to grid (usually the
decimalLatitudefield) longitude-
The longitude of the point to grid (usually the
decimalLongitudefield) coordinateUncertaintyInMeters-
The uncertainty radius to apply to the point (usually used with a default value, for example
COALESCE(coordinateUncertaintyInMeters, 1000))
The coordinateUncertaintyInMeters value is used to randomize the point within the circle.
Return value:
The code for the cell of the EEA reference grid in which the randomized point falls.
Example arguments and returned values:
Result |
gridSize |
latitude |
longitude |
coordinateUncertaintyInMeters |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
GeoPackage and ShapeFile downloads of the grids at several resolutions are available for download at the European Environment Agency Datahub. (See the 2013 and 2011 datasets for alternative resolutions.)
Reference: About the EEA reference grid (PDF).
Extended Quarter-Degree Grid cell code
Suggested function declaration:
CREATE OR REPLACE TEMPORARY FUNCTION eqdgcCode AS 'org.gbif.occurrence.hive.udf.ExtendedQuarterDegreeGridCellCodeUDF';
Function definition:
STRING eqdgcCode(INTEGER level,
DOUBLE latitude,
DOUBLE longitude,
DOUBLE coordinateUncertaintyInMeters)
Parameters:
level-
The level of the grid; the number of additional divisions applied to a one-degree cell.
latitude-
The latitude of the point to grid (usually the
decimalLatitudefield) longitude-
The longitude of the point to grid (usually the
decimalLongitudefield) coordinateUncertaintyInMeters-
The uncertainty radius to apply to the point (usually used with a default value, for example
COALESCE(coordinateUncertaintyInMeters, 1000))
The coordinateUncertaintyInMeters value is used to randomize the point within the circle.
A level of 0 will give the 1° cell, e.g. S01E010. For quarter-degrees, use level 2, e.g. S01E010AD.
Return value:
The code for the cell of the Extended Quarter-Degree Grid in which the randomized point falls.
Example arguments and returned values:
Result |
level |
latitude |
longitude |
coordinateUncertaintyInMeters |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
See also: Wikipedia: QDGC.
ISEA3H Grid cell code
This function implements the Inverse Snyder Equal-Area Projection (ISEA) Aperture 3 Hexagonal (3H) Discrete Global Grid System (DGGS), ISEA3H. Grid cell codes/identifiers are as specified in A novel identifier scheme for the ISEA Aperture 3 Hexagon Discrete Global Grid System.
Suggested function declaration:
CREATE OR REPLACE TEMPORARY FUNCTION isea3hCode AS 'org.gbif.occurrence.hive.udf.Isea3hCellCodeUDF';
Function definition:
STRING isea3hCode(INTEGER resolution,
DOUBLE latitude,
DOUBLE longitude,
DOUBLE coordinateUncertaintyInMeters)
Parameters:
resolution-
The resolution of the grid; the number of division steps applied to the initial dodecahedron. Valid values are 1 to 22.
latitude-
The latitude of the point to grid (usually the
decimalLatitudefield) longitude-
The longitude of the point to grid (usually the
decimalLongitudefield) coordinateUncertaintyInMeters-
The uncertainty radius to apply to the point (usually used with a default value, for example
COALESCE(coordinateUncertaintyInMeters, 1000))
The coordinateUncertaintyInMeters value is used to randomize the point within the circle.
See ISEA3H details for the number and size of cells at each resolution.
Return value:
The code for the cell of the ISEA3H grid cell in which the randomized point falls.
Example arguments and returned values:
Result |
resolution |
latitude |
longitude |
coordinateUncertaintyInMeters |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
References:
See also:
Military Grid Reference System cell code
Suggested function declaration:
CREATE OR REPLACE TEMPORARY FUNCTION mgrsCode AS 'org.gbif.occurrence.hive.udf.MilitaryGridReferenceSystemCellCodeUDF';
Function definition:
STRING mgrsCode(INTEGER gridSize,
DOUBLE latitude,
DOUBLE longitude,
DOUBLE coordinateUncertaintyInMeters)
Parameters:
gridSize-
The grid size in metres. Must be 1, 10, 100, 1000, 10000, 100000 or 0.
latitude-
The latitude of the point to grid (usually the
decimalLatitudefield) longitude-
The longitude of the point to grid (usually the
decimalLongitudefield) coordinateUncertaintyInMeters-
The uncertainty radius to apply to the point (usually used with a default value, for example
COALESCE(coordinateUncertaintyInMeters, 1000))
The coordinateUncertaintyInMeters value is used to randomize the point within the circle.
A gridSize of 0 will give the Grid Zone Junction (GZJ) only, e.g. 32. Other values increase the accuracy of the grid, e.g. 100 (metres) 32UNC686615.
Return value:
The code for the cell of the Military Grid Reference System in which the randomized point falls.
Example arguments and returned values:
Result |
gridSize |
latitude |
longitude |
coordinateUncertaintyInMeters |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Reference: Grids and Reference Systems.
See also: Wikipedia: Military Grid Reference System.
Temporal Uncertainty
Suggested function declaration:
CREATE OR REPLACE TEMPORARY FUNCTION temporalUncertainty AS 'org.gbif.occurrence.hive.udf.TemporalUncertaintyUDF';
Function definition:
STRING temporalUncertainty(String dateInterval)
Parameters:
dateInterval-
A string containing an ISO-8106 date or date interval.
Return value:
This function returns the uncertainty in seconds of the date.
Example arguments and returned value
dateInterval |
Result |
Remarks |
2021-03-21T15:01:32.456Z |
1 |
Milliseconds are rounded to seconds. |
2021-03-21T15:01:32Z |
1 |
|
2021-03-21T15:01Z |
60 |
|
2021-03-21T15Z |
60×60 |
|
2021-03-21 |
60×60×24 |
|
2021-03-01 |
60×60×24 |
|
2021-01-01 |
60×60×24 |
|
2021-03 |
60×60×24×31 |
|
2021 |
60×60×24×365 |
|
2021-03-21/2021-03-23 |
60×60×24×3 |
Software repository
Source code for the functions is available on GitHub.
Compiled code is available from GBIF’s repository; choose the latest version and download cube-functions-<VERSION>.jar.