John L. Godlee

A colleague had a list of plant species names from a regional checklist they had compiled. They wanted to add a description of the growth form to each species, but with over 2000 species it was becoming laborious to look up each one individually online.

This was my email response:

There are many different data sources you could use to get information on growth form, but many of them will be incomplete, and they vary in how easy the data are to process. Two of the best in terms of coverage and systematic recording of growth form are probably World Flora Online (WFO, https://www.worldfloraonline.org/) and the TRY traits database (https://www.try-db.org/TryWeb/Home.php).

WFO has growth form information on its website for some species, but I have been unable to find a way to retrieve it programmatically. They have an API, but it only returns taxonomic information. I tried scraping the Data table from each species page, but I kept getting HTTP 403 (Forbidden) errors. That leaves searching for each species individually and copying the data from the table by hand. Realistically it might only take a couple of days; maybe you could enlist the help of some eager Masters students?!

TRY has growth form information for many species. You can download the “Plant Growth Form” data (trait ID 42) from their website. You have to submit a data request, but it’s fairly quick to do, and requests are approved automatically after a waiting period of a few hours so long as you only ask for their public dataset. Alternatively, or in addition, you could look at their categorical traits lookup table, which is an older snapshot (the file is dated 2012-03-17).
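Once a request comes through, the trait 42 records need boiling down to one growth form per species, because the values are free text from many contributors. Here is a minimal sketch of that step on a small made-up data frame rather than a real download. The column names (AccSpeciesName, TraitID, OrigValueStr) are what I understand the TRY release format to use, so check them against your own file; the species names and values are placeholders.

```r
# Toy stand-in for a TRY data release. In practice you would read the
# tab-delimited file from your data request instead.
# Column names are assumed to follow the TRY release format.
try_req <- data.frame(
  AccSpeciesName = c("Genus alpha", "Genus alpha", "Genus alpha",
    "Genus beta", "Genus beta"),
  TraitID = c(42, 42, 42, 42, 1),  # the last row is some other trait
  OrigValueStr = c("Tree", "tree", "shrub", "Herb", "0.5"),
  stringsAsFactors = FALSE)

# Keep only plant growth form records (trait ID 42)
gf <- try_req[which(try_req$TraitID == 42), ]

# Standardise case and whitespace so "Tree" and "tree" are counted together
gf$growth_form <- tolower(trimws(gf$OrigValueStr))

# Take the most frequently reported growth form for each species
# Ties go to whichever value sorts first alphabetically
gf_mode <- aggregate(growth_form ~ AccSpeciesName, data = gf,
  FUN = function(y) names(which.max(table(y))))
```

Taking the most common value is only one way to resolve disagreement between contributors. For a checklist you might prefer to keep all reported values and resolve the conflicts by hand.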

One key consideration is aligning the taxonomic names in your species list with those in whatever growth form data source you end up using. I would recommend using the WorldFlora R package to do this. I have attached an R script (below) which shows how to do this.

# Packages
library(dplyr)
library(readxl)
library(WorldFlora)

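# Import data
x <- read_excel("./species.xlsx")

# Clean whitespace in species names
# Assumes the species names are in a column called `species`
x_clean <- x %>% 
  mutate(species_ws = trimws(gsub("\\s+", " ", species)))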

# Get first two words
x_clean$species_sanit <- unlist(lapply(strsplit(x_clean$species_ws, " "), function(y) { 
  paste(na.omit(y[1:2]), collapse = " ")
}))

# Find duplicated species names
# These species have different authorities but the same name. 
# For WorldFlora I will only use the species name, without the authority 
stopifnot(all(!duplicated(x_clean$species)))
x_clean$species_sanit[duplicated(x_clean$species_sanit)]

# Extract unique species names
x_un <- unique(x_clean$species_sanit)

# Download WFO (World Flora Online) backbone data
WFO.download(save.dir = "./dat", WFO.remember = FALSE)

# Load WFO backbone data from downloaded file
WFO.remember(WFO.file = "./dat/wfo/classification.csv")

# Run species names through WFO matching function
x_wfo <- WFO.match(x_un, WFO.data = WFO.data, Fuzzy = 0) 

# Keep only unique species names
x_wfo_clean <- x_wfo %>% 
  dplyr::select(
    species_orig = spec.name.ORIG,
    species_wfo = scientificName) %>% 
  distinct() %>% 
  filter(!is.na(species_wfo))

# Import TRY categorical database
# You can get this file from try-db.org in their categorical datasets page.
# You could supplement this with the data you request from the current database.
try_db <- read_excel("./dat/Try2025426112154TRY_Categorical_Traits_Lookup_Table_2012_03_17_TestRelease/TRY_Categorical_Traits_Lookup_Table_2012_03_17_TestRelease.xlsx")

# Run TRY species names through WFO matching function
# Warning, this can take a while
try_wfo <- WFO.match(try_db$AccSpeciesName,
  WFO.data = WFO.data, Fuzzy = 0)

# Keep only unique species names
try_wfo_clean <- try_wfo %>% 
  dplyr::select(
    species_orig = spec.name.ORIG,
    species_wfo = scientificName) %>% 
  distinct()

# Add WFO matched species names to TRY data
try_db_wfo <- left_join(try_db, try_wfo_clean, 
  by = c("AccSpeciesName" = "species_orig")) %>% 
  rename(species_orig = AccSpeciesName)

# Filter TRY data to species in the checklist
# Keep the WFO name so it can be used as the join key
try_db_wfo_fil <- try_db_wfo %>% 
  filter(species_wfo %in% x_wfo_clean$species_wfo) %>% 
  dplyr::select(species_wfo, PlantGrowthForm) %>% 
  distinct()

# Join WFO names and growth form back to original data table
# Joining on the WFO name means synonyms in TRY are matched too
out <- x_clean %>% 
  left_join(x_wfo_clean, by = c("species_sanit" = "species_orig")) %>% 
  left_join(try_db_wfo_fil, by = "species_wfo") %>% 
  dplyr::select(-species_ws, -species_sanit)

# Write table to CSV
write.csv(out, "./out.csv", row.names = FALSE)
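
Before using the output it is worth checking how many species actually received a growth form, and whether any picked up more than one, since a left join repeats a row for every match. This is a minimal sketch on a made-up table with the same two columns as `out` (species and PlantGrowthForm). The rows are placeholders, not real results.

```r
# Toy stand-in for `out`: one row per species and growth form combination
out_toy <- data.frame(
  species = c("Genus alpha", "Genus beta", "Genus beta", "Genus gamma"),
  PlantGrowthForm = c("tree", "shrub", "tree", NA),
  stringsAsFactors = FALSE)

# Species with no growth form, to look up by hand or in another source
missing_gf <- unique(out_toy$species[is.na(out_toy$PlantGrowthForm)])

# Number of distinct growth forms recorded for each species
n_gf <- tapply(out_toy$PlantGrowthForm, out_toy$species,
  function(y) length(unique(na.omit(y))))

# Species with conflicting growth forms, which appear as duplicated rows
conflict_gf <- names(n_gf)[n_gf > 1]

# Proportion of species with at least one growth form
coverage <- mean(n_gf > 0)
```

The species in `missing_gf` are the ones left to fill in manually, which should be a much shorter list than the full checklist.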

Gathering data on plant growth form for a regional species checklist

2025-04-26