Taxonomy interpretation
Multiple taxonomy support
To facilitate searching and metric generation, all occurrence records are matched to two taxonomies:
- Catalogue of Life eXtended Release (COL XR) - This is the primary taxonomy used by GBIF. Catalogue of Life is a global index of species, which aims to provide a comprehensive and authoritative list of the world’s species. It is compiled from multiple taxonomic datasets and is updated regularly. The eXtended Release (COL XR) builds on the Base Release by programmatically integrating additional data sources. It integrates information from over 58,000 overlapping taxonomic and nomenclatural data sources (global, regional, national and management checklists, as well as sources originating from digitised literature) available in ChecklistBank, the Catalogue of Life’s infrastructure.
- GBIF backbone - This is a taxonomy that, until recently, GBIF built and integrated periodically. It is primarily based on an older version of Catalogue of Life, with additional taxa added in an automated way from other taxonomic datasets. The build process has now been discontinued in favour of the eXtended Release of the Catalogue of Life. The GBIF Backbone will no longer be updated, but will remain available for backwards compatibility. All GBIF Backbone identifiers will be preserved and supported in the API.
API support
The GBIF API allows searching and retrieving occurrence records using either taxonomy.
You can specify the taxonomy with the optional checklistKey query parameter.
If this parameter is not provided, the API defaults to the GBIF backbone taxonomy.
Valid values for this parameter are the dataset keys of the taxonomic datasets registered in GBIF.
Currently supported values include:
- checklistKey=7ddf754f-d193-4cc9-b351-99906754a03b for the Catalogue of Life eXtended Release
- checklistKey=d7dddbf4-2cf0-4f39-9b2a-bb099caae36c for the GBIF backbone
The following subsections describe how the API supports multiple taxonomies.
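As a minimal sketch of how a client might attach the optional checklistKey parameter to a request, the helper below builds a query URL from the two dataset keys quoted above. The helper name `with_checklist` and the short labels `colxr` and `backbone` are illustrative assumptions, not part of the GBIF API.

```python
from urllib.parse import urlencode

# Dataset keys quoted in the text above; the short labels are illustrative only.
CHECKLIST_KEYS = {
    "colxr": "7ddf754f-d193-4cc9-b351-99906754a03b",
    "backbone": "d7dddbf4-2cf0-4f39-9b2a-bb099caae36c",
}


def with_checklist(base_url, params, taxonomy=None):
    """Return base_url with params and, if a taxonomy is named, its checklistKey."""
    query = dict(params)
    if taxonomy is not None:
        query["checklistKey"] = CHECKLIST_KEYS[taxonomy]
    return base_url + "?" + urlencode(query)


url = with_checklist(
    "http://api.gbif.org/v1/occurrence/search",
    {"scientificName": "Acacia acuminata"},
    taxonomy="colxr",
)
print(url)
```

Leaving `taxonomy` unset omits the parameter, in which case the API falls back to its default taxonomy as described above.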
Taxonomy matching
For each occurrence record GBIF attempts to match that record to a taxon in each taxonomy. Therefore, each occurrence is assigned taxonKeys that link to the corresponding taxon in both the GBIF Backbone and the Catalogue of Life.
These keys are retrieved by querying our species match API, depending on whether the following Darwin Core fields were supplied by the data publisher with the occurrence record:
There are more details about parameters and response formats in the API documentation.
If the scientificName is not present in the published occurrence record, it will be assembled from the individual name parts if present: genus, specificEpithet, genericName and infraspecificEpithet. Having a higher classification qualifying the scientificName, even if it is just the family or the kingdom, improves the accuracy of the taxonomic match in two ways:
- In the case of homonyms or similarly spelt names, the classification gives the service a way to verify the potential matches. Without it, the service cannot choose between the candidates, so such names will often be matched only to a higher taxon.
- If a given scientific name is not (yet) part of the taxonomy, GBIF can at least match the record to some higher taxon in that taxonomy, such as the genus.
Fuzzy name matching, matching to a higher taxon or matching to no taxon are issue flags we assign to records.
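The fallback from a published scientificName to a name assembled from its parts can be illustrated roughly as follows. This is only a sketch: the text does not spell out the exact assembly rules, so the preference for genericName over genus and the joining with single spaces are assumptions, and both function names are hypothetical.

```python
def assemble_scientific_name(record):
    """Join whichever name parts are present, from genus down to infraspecific epithet."""
    # Assumption: genericName is preferred over genus when both are supplied.
    genus_part = record.get("genericName") or record.get("genus")
    parts = [
        genus_part,
        record.get("specificEpithet"),
        record.get("infraspecificEpithet"),
    ]
    name = " ".join(p.strip() for p in parts if p and p.strip())
    return name or None


def name_for_matching(record):
    """Use the published scientificName when available, else the assembled one."""
    return record.get("scientificName") or assemble_scientific_name(record)


print(name_for_matching({"genus": "Acacia", "specificEpithet": "acuminata"}))
```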
Identifier-based matching
Version 2 of the species matching API supports searching with the identifier fields taxonID, scientificNameID or taxonConceptID as query parameters. If both an identifier field and a scientificName are provided, the identifier field will be used first for matching. A second match is then made using the scientificName, and the two matches are compared. If they do not agree, a flag is issued in the response to indicate the inconsistency.
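The comparison described above can be pictured with a toy model. The sketch below is not the service's implementation: the two lookup tables stand in for the real index, the identifier `example:1` is made up, and the flag name `EXAMPLE_ID_NAME_MISMATCH` is hypothetical rather than an actual GBIF issue code.

```python
MISMATCH_FLAG = "EXAMPLE_ID_NAME_MISMATCH"  # hypothetical flag name, not a GBIF issue code


def match_with_identifier(by_id, by_name, identifier=None, scientific_name=None):
    """Prefer the identifier match and flag disagreement with the name match."""
    id_match = by_id.get(identifier) if identifier else None
    name_match = by_name.get(scientific_name) if scientific_name else None
    flags = []
    if id_match and name_match and id_match != name_match:
        flags.append(MISMATCH_FLAG)
    # The identifier match wins when both are available.
    return {"usageKey": id_match or name_match, "flags": flags}


# Toy lookup tables; the taxon keys are the COL XR keys from the examples in this document.
by_id = {"example:1": "5WZLF"}
by_name = {"Carcharodon carcharias": "5WZLF", "Acacia acuminata": "BSJCX"}

agree = match_with_identifier(by_id, by_name, "example:1", "Carcharodon carcharias")
conflict = match_with_identifier(by_id, by_name, "example:1", "Acacia acuminata")
print(agree, conflict)
```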
Matching against different taxonomies
The optional checklistKey parameter specifies the taxonomy to match against.
Example matching request against the COL XR
curl -s "http://api.gbif.org/v2/species/match?scientificName=Acacia+acuminata&checklistKey=7ddf754f-d193-4cc9-b351-99906754a03b"
The version 2 response format includes additional information, such as the IUCN Red List category for the taxon, if available.
Example response for a match against COL XR (the response shown is for Carcharodon carcharias rather than the Acacia acuminata request above):
{
"usage": {
"key": "5WZLF",
"name": "Carcharodon carcharias (Linnaeus, 1758)",
"canonicalName": "Carcharodon carcharias",
"authorship": "(Linnaeus, 1758)",
"rank": "SPECIES",
"code": "ZOOLOGICAL",
"status": "ACCEPTED",
"genericName": "Carcharodon",
"specificEpithet": "carcharias",
"type": "SCIENTIFIC",
"formattedName": "<i>Carcharodon</i> <i>carcharias</i> (Linnaeus, 1758)"
},
"classification": [
{
"key": "CS5HF",
"name": "Eukaryota",
"rank": "DOMAIN"
},
{
"key": "N",
"name": "Animalia",
"rank": "KINGDOM"
},
{
"key": "CH2",
"name": "Chordata",
"rank": "PHYLUM"
},
{
"key": "8V4V3",
"name": "Vertebrata",
"rank": "SUBPHYLUM"
},
{
"key": "8V4V5",
"name": "Gnathostomata",
"rank": "INFRAPHYLUM"
},
{
"key": "8X6G5",
"name": "Chondrichthyes",
"rank": "PARVPHYLUM"
},
{
"key": "LB",
"name": "Elasmobranchii",
"rank": "CLASS"
},
{
"key": "3F5",
"name": "Lamniformes",
"rank": "ORDER"
},
{
"key": "CB2M7",
"name": "Lamnidae",
"rank": "FAMILY"
},
{
"key": "C973Q",
"name": "Carcharodon",
"rank": "GENUS"
},
{
"key": "5WZLF",
"name": "Carcharodon carcharias",
"rank": "SPECIES"
}
],
"diagnostics": {
"matchType": "EXACT",
"confidence": 99,
"timeTaken": 22,
"timings": {
"nameNRank": 0,
"sciNameMatch": 22,
"nameParse": 1,
"luceneMatch": 21
}
},
"additionalStatus": [
{
"clbDatasetKey": "53131",
"datasetAlias": "IUCN",
"datasetKey": "19491596-35ae-4a91-9a98-85cf505f1bd3",
"status": "VULNERABLE",
"statusCode": "VU",
"sourceId": "3855"
}
],
"synonym": false,
"left": 1049700,
"right": 1049701
}
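To show how a client might read this structure, here is a small sketch that pulls the matched key, the match type, the family and the IUCN status code out of a trimmed copy of the response above. The function name `summarize_match` is an assumption, and the sketch uses `.get` throughout because the shape of a response without a match is not covered here.

```python
import json

# Trimmed copy of the example response shown above.
sample = json.loads("""
{
  "usage": {"key": "5WZLF", "name": "Carcharodon carcharias (Linnaeus, 1758)",
            "rank": "SPECIES", "status": "ACCEPTED"},
  "classification": [
    {"key": "N", "name": "Animalia", "rank": "KINGDOM"},
    {"key": "CB2M7", "name": "Lamnidae", "rank": "FAMILY"},
    {"key": "5WZLF", "name": "Carcharodon carcharias", "rank": "SPECIES"}
  ],
  "diagnostics": {"matchType": "EXACT", "confidence": 99},
  "additionalStatus": [
    {"datasetAlias": "IUCN", "status": "VULNERABLE", "statusCode": "VU"}
  ],
  "synonym": false
}
""")


def summarize_match(response):
    """Collect a few commonly needed fields from a v2 match response."""
    ranks = {t["rank"]: t["name"] for t in response.get("classification", [])}
    iucn = next(
        (s.get("statusCode") for s in response.get("additionalStatus", [])
         if s.get("datasetAlias") == "IUCN"),
        None,
    )
    return {
        "key": response.get("usage", {}).get("key"),
        "matchType": response.get("diagnostics", {}).get("matchType"),
        "family": ranks.get("FAMILY"),
        "iucn": iucn,
    }


summary = summarize_match(sample)
print(summary)
```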
Occurrence search API
The Occurrence Search API supports querying with either taxonomy by including the optional checklistKey parameter in the request.
Example:
curl -s "http://api.gbif.org/v1/occurrence/search?scientificName=Acacia+acuminata&checklistKey=7ddf754f-d193-4cc9-b351-99906754a03b"
Response format
The Occurrence API response has been extended to include the taxonomic information from multiple taxonomies. The classifications object, keyed by checklistKey, contains the taxonomic information from every taxonomy that was matched to the occurrence record. Note: the example response has been shortened for brevity.
{
"key": 5104646682,
"datasetKey": "50c9509d-22c7-4a22-a47d-8c48425ef4a7",
"basisOfRecord": "HUMAN_OBSERVATION",
"occurrenceStatus": "PRESENT",
"classifications": {
"7ddf754f-d193-4cc9-b351-99906754a03b": {
"usage": {
"key": "BSJCX",
"name": "Acacia acuminata Benth.",
"rank": "SPECIES",
"code": "BOTANICAL",
"authorship": "Benth.",
"genericName": "Acacia",
"specificEpithet": "acuminata",
"formattedName": "<i>Acacia</i> <i>acuminata</i> Benth."
},
"acceptedUsage": {
"key": "BSJCX",
"name": "Acacia acuminata Benth.",
"rank": "SPECIES",
"code": "BOTANICAL",
"authorship": "Benth.",
"genericName": "Acacia",
"specificEpithet": "acuminata",
"formattedName": "<i>Acacia</i> <i>acuminata</i> Benth."
},
"taxonomicStatus": "ACCEPTED",
"classification": [
{
"key": "CS5HF",
"name": "Eukaryota",
"rank": "DOMAIN"
},
{
"key": "P",
"name": "Plantae",
"rank": "KINGDOM"
},
{
"key": "CMQ8S",
"name": "Pteridobiotina",
"rank": "SUBKINGDOM"
},
{
"key": "TP",
"name": "Tracheophyta",
"rank": "PHYLUM"
},
{
"key": "MG",
"name": "Magnoliopsida",
"rank": "CLASS"
},
{
"key": "383",
"name": "Fabales",
"rank": "ORDER"
},
{
"key": "623QT",
"name": "Fabaceae",
"rank": "FAMILY"
},
{
"key": "C8VYK",
"name": "Acacia",
"rank": "GENUS"
},
{
"key": "BYZSL",
"name": "Juliflorae",
"rank": "SECTION_BOTANY"
},
{
"key": "BSJCX",
"name": "Acacia acuminata",
"rank": "SPECIES"
}
],
"iucnRedListCategoryCode": "LC",
"issues": [
"TAXON_ID_NOT_FOUND"
]
},
"d7dddbf4-2cf0-4f39-9b2a-bb099caae36c": {
"usage": {
"key": "2979180",
"name": "Acacia acuminata Benth.",
"rank": "SPECIES",
"authorship": "Benth.",
"genericName": "Acacia",
"specificEpithet": "acuminata",
"formattedName": "<i>Acacia</i> <i>acuminata</i> Benth."
},
"acceptedUsage": {
"key": "2979180",
"name": "Acacia acuminata Benth.",
"rank": "SPECIES",
"authorship": "Benth.",
"genericName": "Acacia",
"specificEpithet": "acuminata",
"formattedName": "<i>Acacia</i> <i>acuminata</i> Benth."
},
"taxonomicStatus": "ACCEPTED",
"classification": [
{
"key": "6",
"name": "Plantae",
"rank": "KINGDOM"
},
{
"key": "7707728",
"name": "Tracheophyta",
"rank": "PHYLUM"
},
{
"key": "220",
"name": "Magnoliopsida",
"rank": "CLASS"
},
{
"key": "1370",
"name": "Fabales",
"rank": "ORDER"
},
{
"key": "5386",
"name": "Fabaceae",
"rank": "FAMILY"
},
{
"key": "2978223",
"name": "Acacia",
"rank": "GENUS"
},
{
"key": "2979180",
"name": "Acacia acuminata",
"rank": "SPECIES"
}
],
"iucnRedListCategoryCode": "LC",
"issues": [
"TAXON_ID_NOT_FOUND"
]
}
}
}
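Because the per-taxonomy results are keyed by checklistKey, a client can look up the matched taxon for each taxonomy directly. The following sketch works on a trimmed copy of the response above. The function name `usage_keys_by_checklist` is an assumption.

```python
# Trimmed copy of the example occurrence shown above.
occurrence = {
    "key": 5104646682,
    "classifications": {
        "7ddf754f-d193-4cc9-b351-99906754a03b": {
            "usage": {"key": "BSJCX", "name": "Acacia acuminata Benth."},
            "taxonomicStatus": "ACCEPTED",
            "issues": ["TAXON_ID_NOT_FOUND"],
        },
        "d7dddbf4-2cf0-4f39-9b2a-bb099caae36c": {
            "usage": {"key": "2979180", "name": "Acacia acuminata Benth."},
            "taxonomicStatus": "ACCEPTED",
            "issues": ["TAXON_ID_NOT_FOUND"],
        },
    },
}


def usage_keys_by_checklist(record):
    """Map each checklistKey to the key of the taxon the record was matched to."""
    return {
        checklist_key: entry["usage"]["key"]
        for checklist_key, entry in record.get("classifications", {}).items()
        if "usage" in entry
    }


keys = usage_keys_by_checklist(occurrence)
print(keys)
```

Note that the taxon keys differ in form between the two taxonomies: the COL XR key is an alphanumeric string, while the GBIF Backbone key is numeric, although both are serialized as strings.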
Occurrence download API
The occurrence download API supports downloading occurrence records using either taxonomy.
Occurrence download predicates
The predicate search API includes the checklistKey parameter to specify the taxonomy to be used for filtering occurrence records.
Example:
{
"creator": "userName",
"sendNotification": false,
"format": "SIMPLE_CSV",
"predicate": {
"type": "equals",
"key": "TAXON_KEY",
"value": "5WZLF",
"checklistKey": "7ddf754f-d193-4cc9-b351-99906754a03b"
}
}
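A request body like the one above can also be generated programmatically. The sketch below builds the same predicate and serializes it. The helper name `taxon_key_predicate` is an assumption, and submitting the request (authentication, endpoint) is left out.

```python
import json


def taxon_key_predicate(taxon_key, checklist_key=None):
    """Build an equals predicate on TAXON_KEY, optionally tied to one taxonomy."""
    predicate = {"type": "equals", "key": "TAXON_KEY", "value": taxon_key}
    if checklist_key is not None:
        predicate["checklistKey"] = checklist_key
    return predicate


request_body = {
    "creator": "userName",
    "sendNotification": False,
    "format": "SIMPLE_CSV",
    "predicate": taxon_key_predicate("5WZLF", "7ddf754f-d193-4cc9-b351-99906754a03b"),
}
print(json.dumps(request_body, indent=2))
```

Keeping the taxon key and the checklistKey together in one helper makes it harder to send a COL XR key such as 5WZLF without naming the taxonomy it belongs to.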
Occurrence download content
Users can specify the taxonomy to be included in occurrence downloads by adding the checklistKey parameter to the download request. By default, the GBIF Backbone will be used if no checklistKey is supplied.
{
"creator": "userName",
"notificationAddresses": [
"userEmail@example.org"
],
"sendNotification": true,
"format": "SIMPLE_CSV",
"predicate": {
"type": "and",
"predicates": [
{
"type": "equals",
"key": "BASIS_OF_RECORD",
"value": "PRESERVED_SPECIMEN"
},
{
"type": "in",
"key": "COUNTRY",
"values": [ "VC", "GD" ]
}
]
},
"checklistKey": "7ddf754f-d193-4cc9-b351-99906754a03b"
}
For more information on the download API, see the API documentation.
Taxonomic indexes
With every update of a taxonomy, versioned docker containers are created to support the species matching API. These containers are made available through the GBIF docker registry and can be used locally without depending on online services.
The docker container indexes are built from ChecklistBank, which provides the organized taxonomic data, names, and hierarchy.
In addition, stable unique identifiers for taxa from other taxonomic datasets such as Dyntaxa, IPNI, ITIS, UK Species Inventory and WoRMS are retrieved from ChecklistBank and linked to taxa in the checklist (e.g. COL XR or GBIF Backbone).
This enables matching using scientificNameID, taxonID or taxonConceptID fields if these identifiers are used by data publishers.
The IUCN Red List category is also linked to taxa in the index. The IUCN Red List information is pulled from ChecklistBank when the docker containers are built and linked to the checklist (e.g. COL XR or GBIF Backbone).
Docker containers
All available matching containers from the GBIF docker registry are listed here
The image tags are made up of several pieces of information:
{taxonomy}-{architecture}-{checklistbank-datasetKey}-{date}-{time}
Hence, the image xcol-arm64-308651-20250516-145444 exposes the COL eXtended Release with datasetKey=308651 and was built on the 16th of May 2025.
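A tag in this format can be split into its parts with a few lines of code. This is a sketch under two assumptions: that the time component is HHMMSS, and that none of the components contains a hyphen. The function name `parse_image_tag` is illustrative.

```python
from datetime import datetime


def parse_image_tag(tag):
    """Split a versioned tag of the form taxonomy-architecture-datasetKey-date-time."""
    # Unpacking raises ValueError for tags with a different number of parts,
    # for example the floating tag "xcol-amd64-latest".
    taxonomy, architecture, dataset_key, date, time = tag.split("-")
    return {
        "taxonomy": taxonomy,
        "architecture": architecture,
        "datasetKey": int(dataset_key),
        # Assumption: date is YYYYMMDD and time is HHMMSS.
        "built": datetime.strptime(date + time, "%Y%m%d%H%M%S"),
    }


info = parse_image_tag("xcol-arm64-308651-20250516-145444")
print(info)
```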
You can also pull and run the latest COL XR like this:
docker run -d -p 8080:8080 --name colxr docker.gbif.org/matching-ws:xcol-amd64-latest
# use arm64 image instead of amd64 on a mac with apple silicon
docker run -d -p 8080:8080 --name colxr docker.gbif.org/matching-ws:xcol-arm64-latest
Once running, metadata about the indexed data can be retrieved:
curl -s "http://localhost:8080/v2/species/match/metadata"
Example query URL using local docker container:
curl -s "http://localhost:8080/v2/species/match?scientificName=Oenanthe&scientificNameAuthorship=L.&taxonRank=genus&kingdom=Plantae"