I have been working on integrating the WorldFlora R package into SEOSAW so that we can use the World Flora Online (WFO) taxonomic backbone to match taxonomic names in our tree inventory data. In doing this work I have learned two things: a) the WorldFlora R package has some drawbacks, namely that it requires a large download of a static version of the WFO database, which must then be read into memory in R; and b) WFO has a GraphQL API for querying their database remotely without downloading the entire thing.
I have since developed a prototype R package which uses the API to return information from the WFO database, and contains functions for matching taxonomic names.
The package can be found here: https://github.com/johngodlee/wfoAPI
The top-level function is matchNames(), which takes a vector of taxonomic names and queries the WFO database for each name. Notable features of the function are:
- Uses the fuzzy matching algorithm implemented server-side by WFO, with parameters that can be tweaked to control the behaviour of the algorithm.
- When multiple possible names are matched, the user can optionally enter an interactive selection mode to pick a specific name.
- When a matched name has a synonym, the function returns the accepted name as well.
- Previous API calls are cached locally, to reduce the number of API calls, and to prevent entering interactive mode multiple times for the same name.
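The caching idea can be sketched with an environment used as a key-value store, so that each distinct name is only looked up once. This is a minimal illustration and not the wfoAPI implementation: `make_cached_matcher()` and the stand-in `lookup` function are hypothetical, and a real cache would probably also need to persist between sessions.

```r
# Hypothetical sketch: wrap a lookup function so each distinct name is only
# looked up once per session. Not the wfoAPI implementation.
make_cached_matcher <- function(lookup) {
  cache <- new.env(parent = emptyenv())  # key-value store: name -> result
  calls <- 0L                            # counts real lookups, for illustration

  matcher <- function(name) {
    if (exists(name, envir = cache, inherits = FALSE)) {
      return(get(name, envir = cache, inherits = FALSE))
    }
    calls <<- calls + 1L
    result <- lookup(name)               # e.g. an API call or an interactive pick
    assign(name, result, envir = cache)
    result
  }

  list(match = matcher, n_calls = function() calls)
}

# Stand-in for an API call: it just upper-cases the name
m <- make_cached_matcher(function(name) toupper(name))
m$match("Burkea africana")
m$match("Burkea africana")  # served from the cache
m$n_calls()                 # 1
```

Because the cached value is whatever the lookup returned, caching the result of an interactive pick is also what stops the user being prompted twice for the same name.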
callAPI() is the base-level function: it constructs the API query, then uses the httr2 package to send the query and unpack the response.
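To give a feel for what a function like callAPI() does, here is a rough sketch that assembles a GraphQL request body and sends it with httr2. The query text is hypothetical: `taxonNameMatch`, `match`, `candidates` and the argument names are placeholders, not checked against the WFO schema. The only parts taken from the package are the four matching parameters and the `wfo.api_uri` option. Passing the name as a GraphQL variable, rather than pasting it into the query string, avoids escaping quotes by hand.

```r
# Hypothetical sketch of assembling a GraphQL request body. The query and
# field names (taxonNameMatch, match, candidates, ...) are placeholders,
# not checked against the WFO schema.
build_match_query <- function(name, fallbackToGenus = FALSE, checkRank = FALSE,
                              checkHomonyms = FALSE, fuzzyNameParts = 0L) {
  query <- paste(
    "query ($name: String!, $genus: Boolean, $rank: Boolean,",
    "       $homonyms: Boolean, $fuzzy: Int) {",
    "  taxonNameMatch(inputString: $name, fallbackToGenus: $genus,",
    "                 checkRank: $rank, checkHomonyms: $homonyms,",
    "                 fuzzyNameParts: $fuzzy) {",
    "    match { id }",
    "    candidates { id }",
    "  }",
    "}",
    sep = "\n"
  )
  # The search string travels as a variable, so no manual escaping is needed
  list(query = query,
       variables = list(name = name, genus = fallbackToGenus, rank = checkRank,
                        homonyms = checkHomonyms,
                        fuzzy = as.integer(fuzzyNameParts)))
}

# POST the body as JSON and parse the JSON response into a nested list.
# Requires httr2 and network access; not called in this example.
send_query <- function(body, uri = getOption("wfo.api_uri")) {
  httr2::request(uri) |>
    httr2::req_body_json(body) |>
    httr2::req_perform() |>
    httr2::resp_body_json()
}

body <- build_match_query("Burkea africana", fuzzyNameParts = 1)
str(body$variables)
```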
matchName() calls callAPI() and is responsible for handling a single name query. matchName() optionally uses cached data from previous API calls if it exists. If multiple candidate names are matched, the user can optionally enter an interactive selection mode, which is handled by pickName().
matchNames() calls matchName() for each name in the vector of taxonomic names, performs various quality-of-life checks like exiting gracefully if the WFO API is not reachable, and formats the results as a pretty dataframe with one row for each name in the original vector.
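The outer loop can be sketched as below. `match_all()` and `match_one` are hypothetical stand-ins for matchNames() and matchName(), and only a handful of the real output columns are included. One design choice here that may differ from wfoAPI: each distinct name is queried once, and the results are then expanded back to one row per element of the input vector, in the original order.

```r
# Hypothetical sketch of the loop inside a matchNames()-like function.
# `match_one` stands in for matchName(): it returns a named list of column
# values, or NULL when there are no candidates.
match_all <- function(x, match_one) {
  x_uniq <- unique(x)  # query each distinct name only once
  rows <- vector("list", length(x_uniq))
  for (i in seq_along(x_uniq)) {
    message(i, " of ", length(x_uniq), ": ", x_uniq[i])
    res <- match_one(x_uniq[i])
    if (is.null(res)) {
      message("No candidates, skipping: ", x_uniq[i])
      res <- list(method = "EMPTY", taxon_wfo_acc = NA_character_,
                  taxon_name_acc = NA_character_)
    }
    rows[[i]] <- data.frame(taxon_name_subm = x_uniq[i], res)
  }
  out <- do.call(rbind, rows)
  # Expand back to one row per element of x, in the original order
  out <- out[match(x, out$taxon_name_subm), , drop = FALSE]
  rownames(out) <- NULL
  out
}

# Stand-in matcher using two WFO IDs from the example output later in this post
known <- c("Burkea africana" = "wfo-0000214110", "Fabaceae" = "wfo-7000000323")
stub <- function(name) {
  if (!name %in% names(known)) return(NULL)
  list(method = "AUTO", taxon_wfo_acc = unname(known[name]),
       taxon_name_acc = name)
}

res <- match_all(c("Fabaceae", "Indet indet", "Burkea africana", "Fabaceae"), stub)
res
```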
These are the arguments available for matchNames():
- x: vector of taxonomic names
- fallbackToGenus: logical, if TRUE genus-level matches will be returned if no species-level match is available
- checkRank: logical, if TRUE consider matches to be ambiguous if it is possible to estimate taxonomic rank from the search string and the rank does not match that in the name record
- checkHomonyms: logical, if TRUE consider matches to be ambiguous if there are other names with the same words but different author strings
- fuzzyNameParts: integer value of 0 (default) or greater. The maximum Levenshtein distance used for fuzzy matching words in x
- interactive: logical, if TRUE (default) the user will be prompted to pick names from a list where multiple ambiguous matches are found, otherwise names with multiple ambiguous matches will be skipped
- useCache: logical, if TRUE use cached values in options("wfo.api_uri") preferentially, to reduce the number of API calls
- useAPI: logical, if TRUE (default) allow API calls
- raw: logical, if TRUE a raw nested list is returned, otherwise a dataframe
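To get a feel for what fuzzyNameParts allows, base R's `utils::adist()` computes the Levenshtein distance between strings. The fuzzy matching itself happens server-side at WFO, so this only illustrates the distance metric. `within_distance()` is a hypothetical helper, and the misspelled candidate words are made up for illustration.

```r
# Illustration of word-level Levenshtein distance using base R's adist().
# Hypothetical helper: which candidate words fall within max_dist of a word?
within_distance <- function(word, candidates, max_dist) {
  d <- drop(adist(word, candidates))  # 1 x n distance matrix -> vector
  candidates[d <= max_dist]
}

adist("africana", "afrikana")  # distance of 1 (one substitution)

# With fuzzyNameParts = 1, "Burkia" would be close enough to "Burkea",
# but "Berkeya" (one substitution plus one insertion) would not
within_distance("Burkea", c("Burkea", "Burkia", "Berkeya"), max_dist = 1)
```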
fallbackToGenus, checkRank, checkHomonyms and fuzzyNameParts are all variables passed directly to the WFO GraphQL API.
Here is a basic example:
```r
x <- c(
  "Burkea africana",
  "Julbernardia paniculata",
  "Fabaceae",
  "Indet indet",
  "Brachystegia",
  "Philenoptera sp.")

matchNames(x)
```
The console output:
```
1 of 6: Brachystegia
2 of 6: Burkea africana
3 of 6: Fabaceae
4 of 6: Indet indet
No candidates, skipping: Indet indet
5 of 6: Julbernardia paniculata
6 of 6: Philenoptera sp.
--- Pick a name ---
Matching string: Philenoptera sp.
1 wfo-4000029211 Philenoptera Hochst. ex A.Rich. accepted Code/Plantae/Pteridobiotina/Angiosperms/Fabales/Fabaceae/Papilionoideae/Philenoptera
Enter a number to pick a row from the list, a valid WFO ID, 'N' for the next page, 'P' for the previous page, 'S' to skip this name:
```
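The prompt above accepts four kinds of response. A hypothetical sketch of how a pickName()-style function might interpret them is below. `parse_pick()` is not part of wfoAPI. It assumes WFO IDs are always "wfo-" followed by ten digits, as in the IDs shown in this post, and it accepts lowercase letters, which the real prompt may not.

```r
# Hypothetical sketch of interpreting a response to the pick-a-name prompt.
# Returns a list describing the action; it does not read from the console.
parse_pick <- function(input, n_rows) {
  input <- trimws(input)
  if (toupper(input) %in% c("N", "P", "S")) {
    action <- c(N = "next", P = "previous", S = "skip")[[toupper(input)]]
    return(list(action = action))
  }
  # Assumed ID format: "wfo-" plus ten digits, e.g. wfo-4000029211
  if (grepl("^wfo-[0-9]{10}$", input)) {
    return(list(action = "id", value = input))
  }
  # A row number must point at a row on the current page
  if (grepl("^[0-9]+$", input)) {
    row <- as.integer(input)
    if (row >= 1 && row <= n_rows) {
      return(list(action = "row", value = row))
    }
  }
  list(action = "invalid")
}

parse_pick("1", n_rows = 1)
parse_pick("wfo-4000029211", n_rows = 1)
parse_pick("S", n_rows = 1)
```

In an interactive session the input would come from something like `readline()`, with the prompt repeated until `parse_pick()` returns anything other than "invalid".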
The dataframe returned:
| taxon_name_subm | method | fallbackToGenus | checkRank | checkHomonyms | fuzzyNameParts | taxon_wfo_syn | taxon_name_syn | taxon_auth_syn | taxon_stat_syn | taxon_role_syn | taxon_rank_syn | taxon_path_syn | taxon_wfo_acc | taxon_name_acc | taxon_auth_acc | taxon_stat_acc | taxon_role_acc | taxon_rank_acc | taxon_path_acc |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Burkea africana | AUTO | FALSE | FALSE | FALSE | 0 | wfo-0000214110 | Burkea africana | Hook. | valid | accepted | species | Code/Plantae/Pteridobiotina/Angiosperms/Fabales/Fabaceae/Caesalpinioideae/Burkea/africana | wfo-0000214110 | Burkea africana | Hook. | valid | accepted | species | Code/Plantae/Pteridobiotina/Angiosperms/Fabales/Fabaceae/Caesalpinioideae/Burkea/africana |
| Julbernardia paniculata | AUTO | FALSE | FALSE | FALSE | 0 | wfo-0000169220 | Julbernardia paniculata | (Benth.) Troupin | valid | accepted | species | Code/Plantae/Pteridobiotina/Angiosperms/Fabales/Fabaceae/Detarioideae/Julbernardia/paniculata | wfo-0000169220 | Julbernardia paniculata | (Benth.) Troupin | valid | accepted | species | Code/Plantae/Pteridobiotina/Angiosperms/Fabales/Fabaceae/Detarioideae/Julbernardia/paniculata |
| Fabaceae | AUTO | FALSE | FALSE | FALSE | 0 | wfo-7000000323 | Fabaceae | Lindl. | conserved | accepted | family | Code/Plantae/Pteridobiotina/Angiosperms/Fabales/Fabaceae | wfo-7000000323 | Fabaceae | Lindl. | conserved | accepted | family | Code/Plantae/Pteridobiotina/Angiosperms/Fabales/Fabaceae |
| Indet indet | EMPTY | FALSE | FALSE | FALSE | 0 |