Making abundance matrices

2020-10-31

There are lots of R packages that generate species by site abundance matrices from a long-format data frame of records. For example, labdsv::matrify() takes a three-column data frame of site, species and abundance like this:

| Site | Species        | Abundance |
|------|----------------|-----------|
| A    | Quercus robur  | 10        |
| B    | Quercus robur  | 2         |
| B    | Betula pendula | 30        |
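For comparison, a summarised table like this can also be cross-tabulated in base R. A minimal sketch using stats::xtabs(), where `summ` is a hypothetical data frame typed in from the table above:

```r
# Summarised records: one row per site and species (typed in from the table)
summ <- data.frame(
  Site = c("A", "B", "B"),
  Species = c("Quercus robur", "Quercus robur", "Betula pendula"),
  Abundance = c(10, 2, 30)
)

# Sum Abundance within each Site x Species cell
summ_mat <- xtabs(Abundance ~ Site + Species, data = summ)
summ_mat
```

xtabs() fills combinations with no records, such as Betula pendula at site A, with 0.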

This method relies on the data already being summarised. But what if each row were a single record, as it would be if you had raw tree diameter measurements rather than a count of abundance:

| Site | Species        | DBH (cm) |
|------|----------------|----------|
| A    | Quercus robur  | 15.6     |
| A    | Quercus robur  | 5.4      |
| A    | Betula pendula | 11.0     |

It wouldn’t be hard to turn this into a summary table with some dplyr:

dplyr::count(dat, Site, Species)
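count() still returns a long table (one row per site and species, plus a count column `n`), so it would need reshaping into a matrix. As a sketch, base R's table() goes straight from one row per record to site by species counts; `dat` here is a hypothetical data frame holding the three records above:

```r
# Raw records: one row per measured tree
dat <- data.frame(
  Site = c("A", "A", "A"),
  Species = c("Quercus robur", "Quercus robur", "Betula pendula"),
  DBH = c(15.6, 5.4, 11.0)
)

# Count records in each Site x Species combination
counts <- table(Site = dat$Site, Species = dat$Species)
counts
```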

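Additionally, what if sampling effort varies among individuals? For example, stems smaller than 10 cm DBH might only be measured in a 20x10 m subplot within the larger 20x50 m plot, so each record carries an FPC value giving the fraction of the plot in which it was sampled: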

| Site | Species        | DBH (cm) | FPC |
|------|----------------|----------|-----|
| A    | Quercus robur  | 15.6     | 1   |
| A    | Quercus robur  | 5.4      | 0.2 |
| A    | Betula pendula | 11.0     | 1   |
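One way to account for this is to weight each record by 1 / FPC, which is the scaling abMat() applies further down. The 5.4 cm oak was found in a subplot covering 200 of the plot's 1000 m² (FPC = 0.2), so it stands in for 1 / 0.2 = 5 trees across the whole plot, and site A ends up with 1 + 5 = 6 Quercus robur. A base R sketch, with the table above typed in as a hypothetical `dat_fpc`:

```r
dat_fpc <- data.frame(
  Site = c("A", "A", "A"),
  Species = c("Quercus robur", "Quercus robur", "Betula pendula"),
  DBH = c(15.6, 5.4, 11.0),
  FPC = c(1, 0.2, 1)
)

# Each record represents 1 / FPC individuals on the full plot
dat_fpc$weight <- 1 / dat_fpc$FPC

# Sum the weights per Site x Species cell
fpc_mat <- xtabs(weight ~ Site + Species, data = dat_fpc)
fpc_mat
```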

Or what if the measure of abundance isn’t the presence of an individual, but its canopy cover:

| Site | Species        | DBH (cm) | Cover |
|------|----------------|----------|-------|
| A    | Quercus robur  | 15.6     | 2.53  |
| A    | Quercus robur  | 5.4      | 1.01  |
| A    | Betula pendula | 11.0     | 2.40  |
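When a record has both an alternative abundance measure and a sampling fraction, its contribution becomes Cover / FPC. The sketch below assumes, purely for illustration, that these cover values belong to the same three trees as the FPC table above, so the 5.4 cm oak contributes 1.01 / 0.2 = 5.05 and site A totals 2.53 + 5.05 = 7.58 for Quercus robur:

```r
dat_cover <- data.frame(
  Site = c("A", "A", "A"),
  Species = c("Quercus robur", "Quercus robur", "Betula pendula"),
  DBH = c(15.6, 5.4, 11.0),
  FPC = c(1, 0.2, 1),  # assumed: FPC values borrowed from the previous table
  Cover = c(2.53, 1.01, 2.40)
)

# Scale each tree's cover up to the full plot, then sum per cell
dat_cover$cover_plot <- dat_cover$Cover / dat_cover$FPC
cover_mat <- xtabs(cover_plot ~ Site + Species, data = dat_cover)
cover_mat
```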

Then it becomes much harder to create one of these matrices.

Wouldn’t it be nice to have a base R function that creates species by site abundance matrices and can deal with sampling effort, alternative measures of abundance, and unsummarised data?

#' Generate a species by site abundance matrix
#'
#' @param x data frame of individual records
#' @param site_id string, name of the column holding site IDs
#' @param species_id string, name of the column holding species names
#' @param fpc optional string, name of the column giving the fraction of the
#'     plot in which each record was sampled (greater than 0, at most 1); each
#'     record's abundance is divided by this value
#' @param abundance optional string, name of the column holding an alternative
#'     abundance measure such as biomass, canopy cover or body length
#'
#' @return data frame of species abundances (columns) per site (rows)
#' 
#' @examples
#' set.seed(1)
#' x <- data.frame(site_id = rep(c("A", "B", "C"), each = 3),
#'   species_id = sample(c("a", "b", "c", "d"), 9, replace = TRUE),
#'   fpc = rep(c(0.5, 0.6, 1), each = 3),
#'   abundance = 1:9)
#' abMat(x, "site_id", "species_id")
#' abMat(x, "site_id", "species_id", "fpc")
#' abMat(x, "site_id", "species_id", "fpc", "abundance")
#' 
#' @export
#' 
abMat <- function(x, site_id, species_id, fpc = NULL, abundance = NULL) {
  # Default fpc and abundance to 1: each record counts as one individual,
  # sampled across the whole plot
  if (is.null(fpc)) {
    x$fpc <- 1
  } else {
    x$fpc <- x[[fpc]]
  }
  if (is.null(abundance)) {
    x$abundance <- 1
  } else {
    x$abundance <- x[[abundance]]
  }

  # Get all species and sites
  species <- unique(x[[species_id]])
  sites <- unique(x[[site_id]])

  # Create empty species by site matrix
  comm <- matrix(0, nrow = length(sites), ncol = length(species))

  # Fill matrix
  for (i in seq_along(sites)) {
    for (j in seq_along(species)) {
      # Records belonging to this site and species
      idx <- x[[site_id]] == sites[i] & x[[species_id]] == species[j]
      # Scale each record's abundance up to the full plot (1 / fpc) and sum
      comm[i, j] <- sum(x$abundance[idx] / x$fpc[idx], na.rm = TRUE)
    }
  }

  # Make tidy with names
  comm <- data.frame(comm)
  names(comm) <- species
  row.names(comm) <- sites

  return(comm)
}
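The nested loop in abMat() subsets x once for every site and species pair, which may get slow for large surveys. A possible vectorised alternative is sketched here as a hypothetical `abMat_xtabs()` with the same arguments, letting stats::xtabs() do the summing in one pass. It keeps sites and species in order of first appearance, so on data without missing site or species names it should return the same table as abMat():

```r
# Hypothetical vectorised variant of abMat(): same arguments, but sums
# weighted abundances with xtabs() instead of a nested loop
abMat_xtabs <- function(x, site_id, species_id, fpc = NULL, abundance = NULL) {
  # Default to weight 1: one individual, sampled across the whole plot
  w_fpc <- if (is.null(fpc)) rep(1, nrow(x)) else x[[fpc]]
  w_abu <- if (is.null(abundance)) rep(1, nrow(x)) else x[[abundance]]

  # Per-record contribution, scaled up to the full plot
  value <- w_abu / w_fpc
  # Treat missing values as 0, like sum(..., na.rm = TRUE) in abMat()
  value[is.na(value)] <- 0

  # Factor levels in order of first appearance, matching unique() in abMat()
  d <- data.frame(
    site = factor(x[[site_id]], levels = unique(x[[site_id]])),
    species = factor(x[[species_id]], levels = unique(x[[species_id]])),
    value = value
  )

  # Sum value per site x species cell; combinations with no records become 0
  tab <- xtabs(value ~ site + species, data = d)

  # Species as columns, sites as rows, named as abMat() names them
  comm <- as.data.frame.matrix(tab)
  names(comm) <- colnames(tab)
  comm
}

# Usage on the three example trees from this post
trees <- data.frame(
  Site = c("A", "A", "A"),
  Species = c("Quercus robur", "Quercus robur", "Betula pendula"),
  FPC = c(1, 0.2, 1),
  Cover = c(2.53, 1.01, 2.40)
)
abMat_xtabs(trees, "Site", "Species")
abMat_xtabs(trees, "Site", "Species", "FPC", "Cover")
```

The trade-off is readability: the loop in abMat() makes the per-cell sum explicit, while the xtabs() version hands that bookkeeping to a single tabulation call.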