Writing R package documentation

2020-05-30

There’s already a tonne of stuff on how to write R packages, see here , here , here and here . Part of the reason for the breadth of articles is that there are many different workflows for how to write them. Here I’m only going to share my thoughts on writing package documentation, because that’s the area where I didn’t find one complete resource that answered all of my questions and provided a workflow I liked, when I was writing my first serious package.

To briefly explain the basic structure of my package, I took advice from Hadley and kept functions in my package inside thematic files, like biomass.R and taxonomy.R, with each of these files holding multiple functions. It’s somewhere between keeping all functions in one file and keeping each function in its own file. I think both of these extremes ignore the natural sorting which can come from keeping a tidy directory structure. I found it more intuitive to find a particular function based on its theme when I used these thematic files.

I used roxygen2 to store the documentation for each package function alongside the code for that function in my R/*.R files. For example, my convenience function for concatenating genus and species names to one string (picked as an example purely because it’s short):

#' Combine genus and species character vectors to a species name
#'
#' @param x vector of genus names
#' @param y corresponding vector of species names
#'
#' @return vector of genus and species
#'
#' @export
#'
combineSpecies <- function(x, y) {
  vec <- speciesFormat(data.frame(genus = x, species = y))
  vec <- paste(vec[[1]], vec[[2]])
  return(vec)
}

This function has the @export tag, meaning that when my package is loaded with library() by a user, this function can be accessed without prefixing with the package name. Functions with @export are automatically written into the package manual when I compile it with devtools::document(). I have a tonne of functions in this package that are not useful to the average user however, mostly functions which check the contents of a particular column in the standardised datasets used by this package. These functions are purposely not written into the package manual with the @noRd tag, which stops a .Rd file being written for that function and therefore keeps it out of the manual. These functions also have the @keywords internal, which means that the function can only be accessed by the user with package:::function(), but can still be accessed by other functions in the package with function(). This means that the user can still use the function if they need to, but are discouraged from doing so, normally because that function is better implemented in a higher-level wrapper function which provides checks or preprocessing. As an example, my function genus() checked whether genus names are formatted sensibly, but is only meant to be called from within colValCheck(), which wraps a bunch of column checking functions in a neater interface:

#' Check validity of stem genus column
#'
#' @param x vector of stem genera
#'
#' @return vector of class "character"
#' @keywords internal
#' @noRd
#'
genus <- function(x, ...) {
  x <- fillNA(x)
  x <- coerce_catch(x, as.character, ...)
  na_catch(x, warn = TRUE, ...)
  if (any(!grepl("^[[:alpha:]]+$", x[!is.na(x)]))) {
    stop("Non-letter characters found in genus")
  }
  else if (any(!grepl("^[A-Z]", x[!is.na(x)]))) {
    stop("Genera must start with a capital letter [A-Z]")
  }
  else if (any(grepl("[A-Z]", substring(x[!is.na(x)], 2)))) {
    stop("Genera must not have multiple capital letters")
  }
  structure(x, class = "character")
}

It’s nice to have a package level description at the start of a package manual before launching into the technicalities of the function definitions. To do this, I added a roxygen entry like the one below (cut for brevity), which has the object NULL and uses the key tags: @docType package and @name packagename-package.

#' silvR: Clean and analyse SEOSAW style data
#'
#' The \code{silvr} package facilitates three important activities:
#' \itemize{
#'   \item{Checking and cleaning new data for the SEOSAW dataset}
#'   \item{Manipulating the SEOSAW dataset to provide informative summary data}
#'   \item{Analysing the SEOSAW dataset}
#' }
#'
#' @details The functions in the \code{silvr} package form a workflow for 
#'     checking data prior to ingestion into the SEOSAW database. The package 
#'     deals with 4 principle data objects:
#'     
#' 	   ...
#'  
#'     The package contains various functions for quickly creating useful 
#'     summary data objects such as abundance matrices and maps, ...
#'
#' @author The \code{silvr} package is a collaborative effort, bringing code 
#'     together from various SEOSAW members ...
#'
#' @section Key top-level functions:
#' For ingesting new data into the SEOSAW database, it is recommended to run 
#' these top level functions in this order to catch errors.
#' 
#' \itemize{
#'   \item{\code{plotTableGen()} - Checks for value and column errors and 
#'   return a clean SEOSAW style plot metadata dataframe.}
#'   \item{\code{stemTableGen()} - Checks for value and column errors and 
#'   return a clean SEOSAW style stem data dataframe.}
#'   \item{...}
#' }
#'
#' @docType package
#' @name silvr-package
NULL

This longer description takes advantage of @section and @details for structuring blocks of text in the ‘roxygen2 block.

The manual frontmatter comes mostly from the DESCRIPTION file. Most important are the package dependencies, which are also specified by minimum version number. Annoyingly, these package versions don’t get populated directly from the package dependencies in the roxygen2 function blocks. Instead they have to be written manually into the DESCRIPTION.

Roxygen2 autopopulates NAMESPACE from the @import and @importFrom tags in the function blocks. I tend to use @importFrom vegan diversity rather than @import vegan where I can, to avoid potential conflicts in function names if I start loading lots of packages, but I don’t think there is any hard rule on this.

To write a vignette, I used RMarkdown rather than Sweave. It seems to be the modern approach to vignette writing and is much more straightforward when including figures and code chunks in the document. To set this up I created a directory in the package root call vignettes/ and created a packagename.Rmd file in there. Then in the YAML frontmatter I included this:

output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Cleaning and analysing SEOSAW data}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}  

Then in my DESCRIPTION I added:

Suggests: 
	knitr (>= 1.28), 
	rmarkdown (>= 2.1)
VignetteBuilder: knitr

Which ensures the tools for building the vignette are present. I can then build the vignette with: devtools::build_vignettes().

Finally, a short R script I have sitting above my package directory contains this code to build the package:

setwd("silvr")
devtools::document()  # Generate .Rd files
devtools::build_manual()  # Generate .pdf manual
devtools::build_vignettes()  # Generate .html vignette
setwd("..")
devtools::install("silvr")  # Install the package
library(silvr)  # Load the package