1 Introduction

This worksheet provides examples of how to prepare SESOAW repeat stem measurement data to measure carbon dynamics. The code chunks are all written in R. The example dataset used in this worksheet (“stems_ABG.csv”) can be found in the workshop Sharepoint folder.

First, load the required packages. If you haven’t installed the packages you should do so with install.packages("<pkg>"), where <pkg> is the name of the package.

library(dplyr)
library(ggplot2)
library(seosawr)
# source("rscripts/seosawr_fn.R")
library(ProdVital)
# source("rscripts/ProdVital_fn.R")

Next, load the data. Either use your own data, or one of the example datasets provided for the workshop.

s <- read.csv("seosaw_data/stems_ABG.csv")
p <- read.csv("seosaw_data/plots_ABG.csv")

Each row in the stems table is one measurement of one stem in one census. Each row in the plots table is one census of one plot. You can find more information about the columns in each of these tables by reading the “plot_columns.csv” and “stem_columns.csv” files in the Sharepoint folder.

An example stems table:

An example plots table:

2 Stem mortality timelines

Sometimes, a stem will appear to die only to resurrect at a later date. Stems might also be missing stem status values. It is important when measuring rates of recruitment and mortality to have consistent stem mortality timelines.

An example stem mortality timeline. Subscript 'd' indicates a diameter measurement. Here, two datapoints are missing. The first is interpolated as alive, as there are alive measurements before and after. The second is imputed as dead, because it is followed by a dead measurement and has no diameter measurement.

Figure 2.1: An example stem mortality timeline. Subscript ‘d’ indicates a diameter measurement. Here, two datapoints are missing. The first is interpolated as alive, as there are alive measurements before and after. The second is imputed as dead, because it is followed by a dead measurement and has no diameter measurement.

We can try to fix these issues using the seosawr::statusImputGen() function.

# Impute mortality status
mor <- statusImputGen(s)

# Add new columns to stem data
s_mor <- bind_cols(s, mor)

mor has three columns, the corrected vital status (stem_status_est), and a column which records if and why the vital status has been changed (chg).

3 Removing dead stems

Now that the stem status values have been fixed, we should remove measurements of dead stems. The functions in {ProdVital} assume that the stem data only contain measurements from living stems. Measurements from dead stems should be excluded from the stem data.

It’s up to you whether you count resprouting stems (status = “r”, stem dead above point of measurement but with living material elsewhere on the tree) as alive or dead.

s_a <- s_mor %>%
  filter(stem_status_est == "a")

4 Outlier measurements

Stem diameter data may include errors introduced during data collection or data entry. These errors can occur for a number of reasons. Common errors include decimal points in the wrong place or typos which add extra digits.

A simple way to identify outlier diameter measurements is to compare them with height measurements. Stem diameter and stem height tend to have a strong positive correlation.

ggplot(s_a, aes(x = diam, y = height)) + 
  geom_point()
## Warning: Removed 7926 rows containing missing values (`geom_point()`).

To formally identify outliers, you could fit a regression model of diameter vs. height and flag measurements which fall outside a given number of standard deviations from the regression fit. The relationship between diameter and height is not always linear, but close enough most of the time.

The seosawr::diamHeightCheck() function can be used to identify outliers. The function fits a linear model to diameter-height data, and returns a dataframe with plot IDs, stem IDs and census dates for measurements identified as outliers.

# Identify outlier diameter measurements
dflag <- diamHeightCheck(s_a)[[1]]

# Join outliers to data
dflag$dflag <- TRUE
s_d <- s_a %>% 
  left_join(., dflag, by = c("plot_id", "stem_id", "census_date")) %>% 
  mutate(dflag = ifelse(is.na(dflag), FALSE, dflag))

# Plot data, highlight outliers
ggplot(s_d, aes(x = diam, y = height)) +
  geom_point(aes(colour = dflag)) + 
  geom_smooth(method = "lm", formula = y ~ x)
## Warning: Removed 7926 rows containing non-finite values (`stat_smooth()`).
## Warning: Removed 7926 rows containing missing values (`geom_point()`).

Remember though, short fat stems can also be caused by stems breaking. Similarly, tall skinny stems might not always constitute an error.

If you don’t have height measurements, you could start by looking at the distribution of diameter measurements. Very high values might be outliers and so are worth double checking.

ggplot(s_a, aes(x = diam)) + 
  geom_histogram(colour = "black", binwidth = 5, boundary = 0)
## Warning: Removed 334 rows containing non-finite values (`stat_bin()`).

s_a %>% 
  slice_max(diam, n = 20) %>% 
  dplyr::select(plot_id, stem_id, census_date, diam)
##    plot_id stem_id census_date diam
## 1    ABG_5      31        2019 88.0
## 2   ABG_10     202        2021 87.0
## 3   ABG_10     202        2019 86.3
## 4    ABG_5      31        2021 85.2
## 5    ABG_4       2        2021 85.1
## 6    ABG_7     869        2019 83.1
## 7    ABG_4       2        2018 83.0
## 8    ABG_7     869        2021 83.0
## 9   ABG_10     175        2021 82.3
## 10  ABG_10     175        2019 82.0
## 11  ABG_12      64        2021 81.9
## 12  ABG_12     185        2019 81.2
## 13  ABG_12      64        2019 81.1
## 14   ABG_5      41        2021 78.2
## 15   ABG_5      41        2019 77.4
## 16  ABG_10     174        2019 75.5
## 17  ABG_15      26        2021 72.0
## 18   ABG_4     319        2018 71.8
## 19   ABG_9     262        2019 71.6
## 20   ABG_9     262        2021 71.5
## 21  ABG_15      26        2019 71.5
s_a$min_diam_thresh <- p$min_diam_thresh[match(s_a$plot_id, p$plot_id)]

which(s_a$diam < s_a$min_diam_thresh)
##  [1]   363   406   421   423   424   456   463   847  1011  1518  1666  3697  3760  4708  4984  8492
## [17]  9371 12351 13345 13606 13668 13697 13700 14026 14098 14908 15150

Values which are below the minimum diameter threshold of the plot (column min_diam_thresh in plots table) might also be errors.

Comparing stem size between censuses can also identify outliers. If a stem is growing at a particular rate, then suddenly grows much faster, this might indicate an error.

Similarly, if a stem appears to be growing at a much faster rate than other stems in the plot or within the same species, this might indicate an error.

Finally, if a stem suddenly appears to get much smaller, this could indicate an error. Small reductions in diameter do occur naturally however, either caused by changes in water availability, random measurement error, or where stems are near death and begin to dry out and lose their bark. Also remember that where there are only two censuses, it’s difficult to tell whether the error comes from the first measurement or the second measurement.

The seosawr::diamChangeCheck() function can identify stems where the annual diameter increment is greater than a given annual threshold (thresh).

# Identify outlier diameter measurements
dcflag <- diamChangeCheck(s_d, thresh = 4)[[1]]

# Join outliers to data
dcflag$dcflag <- TRUE
s_dc <- s_d %>% 
  left_join(., dcflag, by = c("plot_id", "stem_id")) %>% 
  mutate(dcflag = ifelse(is.na(dcflag), FALSE, dcflag))

# Return plot and stem IDs where growth has been flagged
s_dc %>% 
  filter(dcflag == TRUE) %>% 
  dplyr::select(plot_id, stem_id) %>% 
  head()
##   plot_id stem_id
## 1   ABG_4     242
## 2   ABG_4     242
## 3   ABG_5     176
## 4   ABG_5     176
## 5   ABG_5     204
## 6   ABG_5     204
# Plot data, highlight outliers
ggplot(s_dc,
  aes(x = census_date, y = diam)) +
  geom_line(aes(alpha = dcflag, colour = dcflag, 
      group = interaction(plot_id, stem_id))) + 
  scale_alpha_manual(values = c(0.1, 1))
## Warning: Removed 334 rows containing missing values (`geom_line()`).

5 POM

The Point Of Measurement (POM, column pom in stem table) describes the height up the stem at which the diameter was measured.

Illustration of the Point of Measurement (POM).

Figure 5.1: Illustration of the Point of Measurement (POM).

An example stem POM timeline. Here, two datapoints are missing. The first is imputed as the plot default POM, because the values before and after are not the same. The second is extrapolated from the previous measurement, as it comes at the end of the timeline. Finally, the 1.5 m POM measurement should not be compared with the other 1.3 m POM measurements, because this will lead to biased estimates of growth due to stem taper.

Figure 5.2: An example stem POM timeline. Here, two datapoints are missing. The first is imputed as the plot default POM, because the values before and after are not the same. The second is extrapolated from the previous measurement, as it comes at the end of the timeline. Finally, the 1.5 m POM measurement should not be compared with the other 1.3 m POM measurements, because this will lead to biased estimates of growth due to stem taper.

Where POM is missing, we can use the seosawr::pomImputGen() function to fill the gaps.

# Fill missing POM values
pom_imp <- pomImputGen(s_dc, p)
s_dc$pom_est <- pom_imp$pom_est

Due to stem tapering, when a POM is moved upwards to avoid an abnormality on the stem, it may appear that the stem diameter has become much smaller. When calculating growth rates, it is therefore important to account for changes in POM, so they do not bias measurements. The simplest way around this issue is to simply exclude census intervals where the POM changes, which can be done later after we have calculated growth rates.

Alternatively, there are many models of stem taper that have been developed, and you could use one of these to estimate the stem diameter at a standard height (normally 1.3 m). The seosawr::diamAdj() function implements one of these models of stem taper.

# Adjust diameter where POM has moved
s_dc$diam_adj <- diamAdj(s_dc, pom = "pom_est")
## Warning: POMs >1.7 m outside calibration data, no correction applied at rows 
## 1007, 1008, 2341, 2342, 2343, 2344, 2345, 3296, 3297, 3351, 3352, 3555, 3556, 4052, 4053 ...

If you have plenty of data, you might choose to simply remove the stem entirely if the POM changes.

# Exclude stems where POM has moved
s_pom <- s_dc %>% 
  group_by(plot_id, stem_id) %>% 
  filter(n_distinct(pom_est) == 1) %>% 
  ungroup()

6 Duplicate measurements

It’s important to check that measurements in your dataset are unique. The functions in {ProdVital} and {seosawr} will fail if a stem has been measured more than once per census. The seosawr::measurementUniqueCheck() function can check this.

# Flag duplicate measurements
measurementUniqueCheck(s_pom)
## $`Duplicated stem measurements`
## # A tibble: 2 × 3
##   plot_id stem_id census_date
##   <chr>   <chr>         <int>
## 1 ABG_16  700            2021
## 2 ABG_16  730            2021

The function returns a dataframe of plot IDs, stem IDs, and census dates where measurements have been duplicated. It’s then up to you to decide how to deal with these duplicates.

7 Species names

While it is not essential to estimate growth rates, standardising species names across the dataset and within stems across censuses is recommended if you intend to estimate biomass (e.g. to estimate productivity) or if you intend to aggregate growth rates by taxonomic group.

The species_name_clean column in the stems table provides a species name that has been standardised against the SEOSAW taxonomic backbone (African Plant Database + extras).

It is normally a fair assumption that species names become more refined with each successive census. To ensure that species names are consistent for a given stem among censuses, the seosawr::speciesFill() function will backfill species names for each stem based on the most recent recorded species name.

# Backfill species names
s_a$species_fill <- speciesFill(s_a,
  species = "species_name_clean")

8 Finishing off

Remember to save the cleaned stem data to a file so it can be used in other parts of the workshop.

write.csv(s_a, "~/Desktop/stems_ABG_clean.csv", row.names = FALSE)