Writing ggplot2 grobs in a loop to maintain data values

2019-10-15

Sometimes I encounter the need to make multiple plots in a loop and then arrange them in a grid.arrange() panel. A recent example was when I had a bunch of ANCOVA linear models in a list, each with a different predictor variable, and I wanted to create ggplot() objects for each linear model in a for loop, showing the distribution of points for the predictor and response variable and slopes for a categorical grouping variable, then I wanted to put each plot into a single grid image and export it, like the image below:

Plots of biomass across different vegetation clusters

The code to generate this involves creating a list for the plot objects, then filling the list with each ggplot() called within a for loop, where each iteration of the loop is a different linear model with a different predictor variable, then calling the list of plots with do.call("grid.arrange", c(plot_list)) to build the grid image ready to be exported. To access the data needed for the x axis of the plot, which changes with each plot/linear model, I took the variable name from the linear model it is based on with:

x_var <- rownames(summary(model_list[[i]])$coefficients)[2]

This means the ggplot() for the scatter plots can then be called like this:

ggplot() + 
	geom_point(aes(x = df[,x_var], y = df[,bchave_log])

You would think this works fine, but when the ggplot() object is called again outside the loop, x_var is nowhere to be found, as it only exists inside the loop. So instead I had to build each ggplot() into a ggplot grob inside the loop where x_var still exists before plotting it outside the loop. Building a grob from a ggplot() object simply requires wrapping it in ggplot_gtable(ggplot_build(...)). Below is an example with inbuilt data to illustrate the point:

# Example data
df <- mtcars

# Run a for loop to plot each column aginst column 1
plot_list <- list()

for(i in 1:length(df)){
  plot_list[[i]] <- ggplot() + geom_point(aes(df[,i], df[,1]))
}

do.call("grid.arrange", c(plot_list, ncol = 2, nrow = 6))

##' All the plots are the same because `df[,i]` takes the 
##' value from the final loop iteration.

# Fix the ggplot objects in the loop so they maintain variable values
for(i in 1:length(df)){
  plot_list[[i]] <- ggplot_gtable(ggplot_build(ggplot() + 
      geom_point(aes(df[,i], df[,1]))))
}

do.call("grid.arrange", c(plot_list, ncol = 2, nrow = 6))

First, ggplot_build() takes the plot object and produces an object that can be rendered as a standalone, with a list of dataframes for each plot layer (points, lines) and a panel object containing metadata for the plot like axis limits and themes.

Then, ggplot_gtable() builds a grob which can display the image and stores it as a gtable. This bit is necessary so that grid.arrange() can draw the plot.