This worksheet provides examples of how to prepare SESOAW repeat stem measurement data to measure carbon dynamics. The code chunks are all written in R. The example dataset used in this worksheet (“stems_ABG.csv”) can be found in the workshop Sharepoint folder.
First, load the required packages. If you haven’t installed the packages you
should do so with install.packages("<pkg>")
, where <pkg>
is the name of the
package.
library(dplyr)
library(ggplot2)
library(seosawr)
# source("rscripts/seosawr_fn.R")
library(ProdVital)
# source("rscripts/ProdVital_fn.R")
Next, load the data. Either use your own data, or one of the example datasets provided for the workshop.
s <- read.csv("seosaw_data/stems_ABG.csv")
p <- read.csv("seosaw_data/plots_ABG.csv")
Each row in the stems table is one measurement of one stem in one census. Each row in the plots table is one census of one plot. You can find more information about the columns in each of these tables by reading the “plot_columns.csv” and “stem_columns.csv” files in the Sharepoint folder.
An example stems table:
An example plots table:
Sometimes, a stem will appear to die only to resurrect at a later date. Stems might also be missing stem status values. It is important when measuring rates of recruitment and mortality to have consistent stem mortality timelines.
Figure 2.1: An example stem mortality timeline. Subscript ‘d’ indicates a diameter measurement. Here, two datapoints are missing. The first is interpolated as alive, as there are alive measurements before and after. The second is imputed as dead, because it is followed by a dead measurement and has no diameter measurement.
We can try to fix these issues using the seosawr::statusImputGen()
function.
# Impute mortality status
mor <- statusImputGen(s)
# Add new columns to stem data
s_mor <- bind_cols(s, mor)
mor
has three columns, the corrected vital status (stem_status_est
),
and a column which records if and why the vital status has been changed
(chg
).
Now that the stem status values have been fixed, we should remove measurements
of dead stems. The functions in {ProdVital}
assume that the stem data only
contain measurements from living stems. Measurements from dead stems should be
excluded from the stem data.
It’s up to you whether you count resprouting stems (status = “r”, stem dead above point of measurement but with living material elsewhere on the tree) as alive or dead.
s_a <- s_mor %>%
filter(stem_status_est == "a")
Stem diameter data may include errors introduced during data collection or data entry. These errors can occur for a number of reasons. Common errors include decimal points in the wrong place or typos which add extra digits.
A simple way to identify outlier diameter measurements is to compare them with height measurements. Stem diameter and stem height tend to have a strong positive correlation.
ggplot(s_a, aes(x = diam, y = height)) +
geom_point()
## Warning: Removed 7926 rows containing missing values (`geom_point()`).
To formally identify outliers, you could fit a regression model of diameter vs. height and flag measurements which fall outside a given number of standard deviations from the regression fit. The relationship between diameter and height is not always linear, but close enough most of the time.
The seosawr::diamHeightCheck()
function can be used to identify outliers. The
function fits a linear model to diameter-height data, and returns a dataframe
with plot IDs, stem IDs and census dates for measurements identified as
outliers.
# Identify outlier diameter measurements
dflag <- diamHeightCheck(s_a)[[1]]
# Join outliers to data
dflag$dflag <- TRUE
s_d <- s_a %>%
left_join(., dflag, by = c("plot_id", "stem_id", "census_date")) %>%
mutate(dflag = ifelse(is.na(dflag), FALSE, dflag))
# Plot data, highlight outliers
ggplot(s_d, aes(x = diam, y = height)) +
geom_point(aes(colour = dflag)) +
geom_smooth(method = "lm", formula = y ~ x)
## Warning: Removed 7926 rows containing non-finite values (`stat_smooth()`).
## Warning: Removed 7926 rows containing missing values (`geom_point()`).
Remember though, short fat stems can also be caused by stems breaking. Similarly, tall skinny stems might not always constitute an error.
If you don’t have height measurements, you could start by looking at the distribution of diameter measurements. Very high values might be outliers and so are worth double checking.
ggplot(s_a, aes(x = diam)) +
geom_histogram(colour = "black", binwidth = 5, boundary = 0)
## Warning: Removed 334 rows containing non-finite values (`stat_bin()`).
s_a %>%
slice_max(diam, n = 20) %>%
dplyr::select(plot_id, stem_id, census_date, diam)
## plot_id stem_id census_date diam
## 1 ABG_5 31 2019 88.0
## 2 ABG_10 202 2021 87.0
## 3 ABG_10 202 2019 86.3
## 4 ABG_5 31 2021 85.2
## 5 ABG_4 2 2021 85.1
## 6 ABG_7 869 2019 83.1
## 7 ABG_4 2 2018 83.0
## 8 ABG_7 869 2021 83.0
## 9 ABG_10 175 2021 82.3
## 10 ABG_10 175 2019 82.0
## 11 ABG_12 64 2021 81.9
## 12 ABG_12 185 2019 81.2
## 13 ABG_12 64 2019 81.1
## 14 ABG_5 41 2021 78.2
## 15 ABG_5 41 2019 77.4
## 16 ABG_10 174 2019 75.5
## 17 ABG_15 26 2021 72.0
## 18 ABG_4 319 2018 71.8
## 19 ABG_9 262 2019 71.6
## 20 ABG_9 262 2021 71.5
## 21 ABG_15 26 2019 71.5
s_a$min_diam_thresh <- p$min_diam_thresh[match(s_a$plot_id, p$plot_id)]
which(s_a$diam < s_a$min_diam_thresh)
## [1] 363 406 421 423 424 456 463 847 1011 1518 1666 3697 3760 4708 4984 8492
## [17] 9371 12351 13345 13606 13668 13697 13700 14026 14098 14908 15150
Values which are below the minimum diameter threshold of the plot (column
min_diam_thresh
in plots table) might also be errors.
Comparing stem size between censuses can also identify outliers. If a stem is growing at a particular rate, then suddenly grows much faster, this might indicate an error.
Similarly, if a stem appears to be growing at a much faster rate than other stems in the plot or within the same species, this might indicate an error.
Finally, if a stem suddenly appears to get much smaller, this could indicate an error. Small reductions in diameter do occur naturally however, either caused by changes in water availability, random measurement error, or where stems are near death and begin to dry out and lose their bark. Also remember that where there are only two censuses, it’s difficult to tell whether the error comes from the first measurement or the second measurement.
The seosawr::diamChangeCheck()
function can identify stems where the annual
diameter increment is greater than a given annual threshold (thresh
).
# Identify outlier diameter measurements
dcflag <- diamChangeCheck(s_d, thresh = 4)[[1]]
# Join outliers to data
dcflag$dcflag <- TRUE
s_dc <- s_d %>%
left_join(., dcflag, by = c("plot_id", "stem_id")) %>%
mutate(dcflag = ifelse(is.na(dcflag), FALSE, dcflag))
# Return plot and stem IDs where growth has been flagged
s_dc %>%
filter(dcflag == TRUE) %>%
dplyr::select(plot_id, stem_id) %>%
head()
## plot_id stem_id
## 1 ABG_4 242
## 2 ABG_4 242
## 3 ABG_5 176
## 4 ABG_5 176
## 5 ABG_5 204
## 6 ABG_5 204
# Plot data, highlight outliers
ggplot(s_dc,
aes(x = census_date, y = diam)) +
geom_line(aes(alpha = dcflag, colour = dcflag,
group = interaction(plot_id, stem_id))) +
scale_alpha_manual(values = c(0.1, 1))
## Warning: Removed 334 rows containing missing values (`geom_line()`).
The Point Of Measurement (POM, column pom
in stem table) describes the height
up the stem at which the diameter was measured.
Figure 5.1: Illustration of the Point of Measurement (POM).
Figure 5.2: An example stem POM timeline. Here, two datapoints are missing. The first is imputed as the plot default POM, because the values before and after are not the same. The second is extrapolated from the previous measurement, as it comes at the end of the timeline. Finally, the 1.5 m POM measurement should not be compared with the other 1.3 m POM measurements, because this will lead to biased estimates of growth due to stem taper.
Where POM is missing, we can use the seosawr::pomImputGen()
function to fill
the gaps.
# Fill missing POM values
pom_imp <- pomImputGen(s_dc, p)
s_dc$pom_est <- pom_imp$pom_est
Due to stem tapering, when a POM is moved upwards to avoid an abnormality on the stem, it may appear that the stem diameter has become much smaller. When calculating growth rates, it is therefore important to account for changes in POM, so they do not bias measurements. The simplest way around this issue is to simply exclude census intervals where the POM changes, which can be done later after we have calculated growth rates.
Alternatively, there are many models of stem taper that have been developed,
and you could use one of these to estimate the stem diameter at a standard
height (normally 1.3 m). The seosawr::diamAdj()
function implements one of
these models of stem taper.
# Adjust diameter where POM has moved
s_dc$diam_adj <- diamAdj(s_dc, pom = "pom_est")
## Warning: POMs >1.7 m outside calibration data, no correction applied at rows
## 1007, 1008, 2341, 2342, 2343, 2344, 2345, 3296, 3297, 3351, 3352, 3555, 3556, 4052, 4053 ...
If you have plenty of data, you might choose to simply remove the stem entirely if the POM changes.
# Exclude stems where POM has moved
s_pom <- s_dc %>%
group_by(plot_id, stem_id) %>%
filter(n_distinct(pom_est) == 1) %>%
ungroup()
It’s important to check that measurements in your dataset are unique. The
functions in {ProdVital}
and {seosawr}
will fail if a stem has been
measured more than once per census. The
seosawr::measurementUniqueCheck()
function can check this.
# Flag duplicate measurements
measurementUniqueCheck(s_pom)
## $`Duplicated stem measurements`
## # A tibble: 2 × 3
## plot_id stem_id census_date
## <chr> <chr> <int>
## 1 ABG_16 700 2021
## 2 ABG_16 730 2021
The function returns a dataframe of plot IDs, stem IDs, and census dates where measurements have been duplicated. It’s then up to you to decide how to deal with these duplicates.
While it is not essential to estimate growth rates, standardising species names across the dataset and within stems across censuses is recommended if you intend to estimate biomass (e.g. to estimate productivity) or if you intend to aggregate growth rates by taxonomic group.
The species_name_clean
column in the stems table provides a species name that
has been standardised against the SEOSAW taxonomic backbone (African Plant
Database + extras).
It is normally a fair assumption that species names become more refined with
each successive census. To ensure that species names are consistent for a given
stem among censuses, the seosawr::speciesFill()
function will backfill
species names for each stem based on the most recent recorded species name.
# Backfill species names
s_a$species_fill <- speciesFill(s_a,
species = "species_name_clean")
Remember to save the cleaned stem data to a file so it can be used in other parts of the workshop.
write.csv(s_a, "~/Desktop/stems_ABG_clean.csv", row.names = FALSE)