There are lots of R packages that generate species-by-site abundance matrices from a long-format dataframe of records. For example, labdsv::matrify() takes a three-column dataframe like this:
Site | Species | Abundance |
---|---|---|
A | Quercus robur | 10 |
B | Quercus robur | 2 |
B | Betula pendula | 30 |
… | … | … |
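
As a point of comparison, the same long-to-wide reshaping can be sketched in base R with xtabs(). This is not how labdsv::matrify() is implemented. It only illustrates the reshaping step, and the `dat` object is an assumed example built from the three rows shown above.

```r
# Assumed example data: the three rows shown in the table above
dat <- data.frame(
  Site = c("A", "B", "B"),
  Species = c("Quercus robur", "Quercus robur", "Betula pendula"),
  Abundance = c(10, 2, 30)
)

# Sum abundance for each site (rows) and species (columns);
# site-species combinations with no rows are filled with 0
comm <- xtabs(Abundance ~ Site + Species, data = dat)
comm
```

The result is a contingency table with one row per site and one column per species, which can be indexed like a matrix, for example `comm["B", "Betula pendula"]`.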
This method relies on the data already being summarised. But what if each row were a single record, as would be the case with raw tree diameter measurements rather than a count of abundance:
Site | Species | DBH |
---|---|---|
A | Quercus robur | 15.6 |
A | Quercus robur | 5.4 |
A | Betula pendula | 11.0 |
… | … | … |
It wouldn’t be hard to turn this into a summary table with some dplyr:
count(dat, Site, Species)
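
For readers who prefer to avoid the dplyr dependency, a rough base R equivalent of that count could look like the sketch below. The `dat` object is assumed example data copied from the table above, and the column name `n` is chosen to mirror the default output of dplyr's count().

```r
# Assumed example data: the three records shown in the table above
dat <- data.frame(
  Site = c("A", "A", "A"),
  Species = c("Quercus robur", "Quercus robur", "Betula pendula"),
  DBH = c(15.6, 5.4, 11.0)
)

# Count records per site and species, returned in long format
counts <- as.data.frame(
  table(Site = dat$Site, Species = dat$Species),
  responseName = "n",
  stringsAsFactors = FALSE
)

# Drop site-species combinations that never occur
counts <- counts[counts$n > 0, ]
counts
```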
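Additionally, what if records differ in sampling effort, for example if stems smaller than 10 cm DBH were only measured in a 20x10 m subplot within a larger 20x50 m plot? In the table below, the FPC column holds the fraction of the plot area in which each stem was sampled (200 m² / 1000 m² = 0.2 for the small stems):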
Site | Species | DBH | FPC |
---|---|---|---|
A | Quercus robur | 15.6 | 1 |
A | Quercus robur | 5.4 | 0.2 |
A | Betula pendula | 11.0 | 1 |
… | … | … | … |
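
One way to use the FPC column is as an inverse weight, which is what the function further down does when it divides by fpc: a stem sampled in a fifth of the plot stands for 1 / 0.2 = 5 stems at the scale of the whole plot. The sketch below works through that arithmetic on assumed example data copied from the table above. The `Weight` column is a name introduced here for illustration.

```r
# Assumed example data: the three records shown in the table above
dat <- data.frame(
  Site = c("A", "A", "A"),
  Species = c("Quercus robur", "Quercus robur", "Betula pendula"),
  DBH = c(15.6, 5.4, 11.0),
  FPC = c(1, 0.2, 1)
)

# Each record stands for 1 / FPC individuals at the scale of the whole plot
dat$Weight <- 1 / dat$FPC

# Sum the weights per site and species for an effort-corrected abundance
weighted <- aggregate(Weight ~ Site + Species, data = dat, FUN = sum)
weighted
```

Here Quercus robur at site A comes out as 1 + 5 = 6 individuals, rather than the 2 that a plain count of rows would give.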
Or what if the measure of abundance isn’t a count of individuals, but the canopy cover of each individual:
Site | Species | DBH | Cover |
---|---|---|---|
A | Quercus robur | 15.6 | 2.53 |
A | Quercus robur | 5.4 | 1.01 |
A | Betula pendula | 11.0 | 2.40 |
… | … | … | … |
Then it becomes much harder to create one of these matrices.
Wouldn’t it be nice to have a base R function to create species-by-site abundance matrices that can deal with sampling effort, alternative measures of abundance, and unsummarised data?
#' Generate a species by site abundance matrix
#'
#' @param x dataframe of individual records
#' @param site_id column name string of site IDs
#' @param species_id column name string of species names
#' @param fpc optional column name string of sampling weights of each record,
#' greater than 0 and up to 1 (abundance is divided by this value)
#' @param abundance optional column name string with an alternative abundance
#' measure such as biomass, canopy cover, body length
#'
#' @return dataframe of species abundances (columns) per site (rows)
#'
#' @examples
#' x <- data.frame(site_id = rep(c("A", "B", "C"), each = 3),
#'   species_id = sample(c("a", "b", "c", "d"), 9, replace = TRUE),
#'   fpc = rep(c(0.5, 0.6, 1), each = 3),
#'   abundance = 1:9)
#' abMat(x, "site_id", "species_id")
#' abMat(x, "site_id", "species_id", "fpc")
#' abMat(x, "site_id", "species_id", "fpc", "abundance")
#'
#' @export
#'
abMat <- function(x, site_id, species_id, fpc = NULL, abundance = NULL) {
  # If no fpc or abundance is supplied, use 1 for every record
  if (is.null(fpc)) {
    x$fpc <- 1
  } else {
    x$fpc <- x[[fpc]]
  }
  if (is.null(abundance)) {
    x$abundance <- 1
  } else {
    x$abundance <- x[[abundance]]
  }

  # Get all species and sites
  species <- unique(x[[species_id]])
  sites <- unique(x[[site_id]])

  # Create empty site (rows) by species (columns) matrix
  comm <- matrix(0, nrow = length(sites), ncol = length(species))

  # Fill matrix: sum abundance weighted by the inverse of fpc.
  # seq_along() is safe when there are no sites or species to loop over.
  for (i in seq_along(sites)) {
    for (j in seq_along(species)) {
      abu <- x[x[[site_id]] == sites[i] & x[[species_id]] == species[j],
        c(site_id, species_id, "fpc", "abundance")]
      comm[i, j] <- sum(abu$abundance / abu$fpc, na.rm = TRUE)
    }
  }

  # Make tidy with names
  comm <- data.frame(comm)
  names(comm) <- species
  row.names(comm) <- sites

  return(comm)
}
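
The nested loop is easy to follow, but it scans the whole dataframe once per site-species pair. As a hedged alternative, the same weighted sum can be sketched with tapply(), which groups the records in a single pass. The function name `abMatVec` is hypothetical and not part of the original function above, and the `dat` object is assumed example data copied from the FPC table earlier in the post.

```r
# Hypothetical vectorised variant of abMat(); the name abMatVec is an
# illustration only
abMatVec <- function(x, site_id, species_id, fpc = NULL, abundance = NULL) {
  # Default to a weight of 1 and an abundance of 1 per record
  w <- if (is.null(fpc)) rep(1, nrow(x)) else x[[fpc]]
  a <- if (is.null(abundance)) rep(1, nrow(x)) else x[[abundance]]

  # Sum weighted abundance for every site-species combination
  comm <- tapply(a / w, list(x[[site_id]], x[[species_id]]), sum, na.rm = TRUE)

  # Combinations with no records come back as NA; treat them as zero
  comm[is.na(comm)] <- 0

  # Keep species names as they are, including spaces
  data.frame(comm, check.names = FALSE)
}

# Assumed example data: the three records from the FPC table
dat <- data.frame(
  Site = c("A", "A", "A"),
  Species = c("Quercus robur", "Quercus robur", "Betula pendula"),
  DBH = c(15.6, 5.4, 11.0),
  FPC = c(1, 0.2, 1)
)

plain <- abMatVec(dat, "Site", "Species")
corrected <- abMatVec(dat, "Site", "Species", fpc = "FPC")
```

One difference from the loop version: tapply() orders sites and species alphabetically (or by factor level), whereas the loop keeps them in order of first appearance.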