I’ve experimented in the past with automating the building of scientific reports, mostly just using shell scripts which run every script and use pdflatex to compile the final document, in order of dependencies, but after I finished writing my most recent manuscript I vowed to learn how to use
make is a good way to handle dependencies during software compilation, but I figured it can probably be used to compile scientific research as well. A nice feature is that it will only run parts of the compilation which need to be run every time, as long as the
Makefile is set up properly to recognise dependencies.
Unfortunately a lot of the online tutorials for using
make rely on examples using C code, which isn’t something I’m familiar with, and besides, my use case is slightly different. I did find a couple of useful resources, namely This blog post by Rob J Hyndman
and a more philosophical blog post by Zachary M. Jones
. Also this blog post
which offered some inspiration on compiling large documents like a whole PhD thesis.
I found that the best way to make an efficient and fool-proof
Makefile was to modularise the pieces of the puzzle as much as possible. This meant splitting up R scripts so that each script only creates a single plot or table of the same name, and putting those scripts in directories grouped by how they will be parsed by the
Makefile, e.g. all tables in a directory called
tab/. Although I didn’t need to do it in my example instance, modularising TeX code might be useful as well if I’m working on a big document.
For my example, as a reminder to myself of best practices, I made a directory with some example files in it, in a directory tree like this:
. ├── Makefile ├── agsmnourl.bst ├── analysis │ ├── fig │ │ ├── fig_1.R │ │ └── fig_2.R │ └── tab │ └── tab_1.R ├── fig ├── tab ├── test.bib └── test.tex
At the top level I have the
test.bib which have the text and references for my report, respectively. Below that I have a directory containing R scripts called
analysis with subdirectories grouped by what the output type of the R script is. I also have currently empty directories which will eventually hold Figures (
fig) and LaTeX formatted tables (
tab) after the
Makefile has run.
This is the
Makefile I came up with:
# LaTeX Makefile # Basic TeX file prefix PROJ = test # R input paths for figures and tables RIPATH = analysis/fig RTPATH = analysis/tab # Output paths for generated figures and tables IPATH = img TPATH = tab # Gather files from input paths RIFILES = $(wildcard $(RIPATH:=/*.R)) RTFILES = $(wildcard $(RTPATH:=/*.R)) # Create paths of output .pdf files by changing suffix from .R to .pdf # and prefix from `analysis` (RPATH) to `img` (IPATH) # These files don't exist yet but the list of files in FIGS is needed # as a dependency for $(PROJ).pdf FIGS = $(subst $(RIPATH), $(IPATH), $(RIFILES:.R=.pdf)) # Create paths of output table files by changing suffixes and prefixes, # same as above TABS = $(subst $(RTPATH), $(TPATH), $(RTFILES:.R=.tex)) # Main all: $(PROJ).pdf # Create pdf $(PROJ).pdf: $(PROJ).tex $(FIGS) $(TABS) latexmk -pdf -quiet -bibtex $(PROJ).tex # Create figures $(IPATH)/%.pdf: $(RIPATH)/%.R Rscript $< # Create tables $(TPATH)/%.tex: $(RTPATH)/%.R Rscript $< # Remove generated latex files and generated figures and tables clean: latexmk -C rm -f $(IPATH)/*.pdf rm -f $(TPATH)/*.tex
PROJ = test defines the prefix for the TeX document which will be compiled.
RIPATH = analysis/fig and
RTPATH = analysis/tab define paths where R scripts are located. The reason these two are split up into different variables is that I will use suffix replacement to create
.tex files with the same names as the scripts later on, in two different target-dependency operations.
IPATH = img and
TPATH = tab define ouput paths where the generated images and tables will be stored.
RIFILES = $(wildcard $(RIPATH:=/*.R)) and
RTFILES = $(wildcard $(RTPATH:=/*.R)) create variables holding the names of R scripts which will be transformed later to variables holding paths to files which don’t yet exist as they haven’t been created, but are necessary to give as dependencies to the
FIGS = $(subst $(RIPATH), $(IPATH), $(RIFILES:.R=.pdf)) and
TABS = $(subst $(RTPATH), $(TPATH), $(RTFILES:.R=.tex)) Transform the paths generated in the previous lines (
./tab/*.tex paths, respectively.
all: $(PROJ).pdf is the top level dependency for the
Makefile, this is the goal of the
make process, to generate this file. Note that
$(PROJ) will be expanded to
$(PROJ).pdf: $(PROJ).tex $(FIGS) $(TABS) defines the dependencies for
test.pdf which are
test.tex, all the files in
$(TABS). The command for that target is
latexmk -pdf -quiet -bibtex $(PROJ).tex, which uses
latexmk to run pdflatex and bibtex enough times to solve all cross-references and create a full
One thing that took me a long time to get my head around is that it doesn’t matter what the file name is in the target section of the
Makefile, the output of the command doesn’t have to be equal to the file given in the target as the command never sees the target. It’s only a convention for making coherent
Makefiles and to mae sure that missing files trigger the
$(IPATH)/%.pdf: $(RIPATH)/%.R and the similar line to create tables takes the file list in
$(RTPATH) and uses
Rscript $< to run each script in turn, using the
$< operator, which expands to the file given in the dependencies section of the target definition. Note the use of
% to substitute
.tex. This is why it was important to create two lists of files, one for pdf images and one for tex tables.
clean: and its commands wipes the slate clean when
make clean is called from the terminal. It removes all files generated by
latexmk and removes all the