Compiling a single master.tex from a modular document

2020-04-25

I’ve recently submitted a paper to a peer-reviewed journal. During the final proofing process it became clear that the editors had taken my original paper.tex file, which calls a load of other .tex files using \input{} as well as a .bib file using \bibliography{}, and substituted in the contents of those files to form a single .tex file that compiles on its own, without the additional files originally called by paper.tex.

It seemed like this process could be automated somewhat, so I set about writing a bash script.

#!/usr/bin/env bash

# Run latexmk
latexmk "$1"

# Get filename without extension
base="${1%.*}"

# Create main.tex
cp "${base}.tex" "main.tex"

# Create \input{} filepaths array
inputs=($(grep -E '\\input\{.*\}' "$1" | sed 's/.*\\input{\([^}]*\)}.*/\1/'))

# Create \input{} line number array
inputlines=($(grep -n -E '\\input\{.*\}' "$1" | cut -f1 -d: ))

# Loop over each element of both arrays to create a sed statement
# which replaces the \input{} lines with the contents of the referenced file
for ((i=0;i<${#inputlines[@]};++i)); do
   f="${inputs[i]}"
   # \input{} may omit the .tex extension, so add it when the bare name is not a file
   [[ -f "$f" ]] || f="${f}.tex"
   printf -v s '%s%s\n%s\n' "$s" "${inputlines[i]}r ${f}" "${inputlines[i]}d"
done

# Run sed
sed -i.bak -e "$s" main.tex

# Find bibliography line number
bibline="$(grep -n '\\bibliography{.*}' main.tex | cut -f1 -d: )"

# Replace \bibliography{} with formatted contents of
# .bbl file generated by latexmk (skipped if there is no \bibliography{} line)
if [[ -n "$bibline" ]]; then
   sed -i.bak -e "${bibline}r ${base}.bbl" -e "${bibline}d" main.tex
fi

# Clean up intermediate files
rm main.tex.bak
latexmk -C
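
To check what the two arrays end up holding, here is a small sketch that runs the same kind of grep and sed pipeline against a throwaway file. The file paper.tex and the \input{} paths inside it are made up for illustration, and the patterns are my own slightly stricter variants rather than anything the journal or latexmk requires.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Work in a throwaway directory so nothing real is touched.
workdir="$(mktemp -d)"

# Hypothetical document with two \input{} lines (file names are made up).
printf '%s\n' \
   '\documentclass{article}' \
   '\begin{document}' \
   '\input{intro.tex}' \
   'Some text.' \
   '\input{sections/methods.tex}' \
   '\end{document}' > "$workdir/paper.tex"

# File paths: keep only what sits between the braces of \input{...}.
inputs=($(grep -E '\\input\{.*\}' "$workdir/paper.tex" | sed 's/.*\\input{\([^}]*\)}.*/\1/'))

# Line numbers: grep -n prefixes each match with "N:", cut keeps the N.
inputlines=($(grep -n -E '\\input\{.*\}' "$workdir/paper.tex" | cut -f1 -d:))

# Print the pairs that the loop would turn into sed commands.
for ((i = 0; i < ${#inputlines[@]}; ++i)); do
   printf 'line %s -> %s\n' "${inputlines[i]}" "${inputs[i]}"
done

rm -r "$workdir"
```

For this sample file the loop pairs line 3 with intro.tex and line 5 with sections/methods.tex. Because the arrays are built by word splitting, a path containing spaces would be broken into several elements, so this approach assumes paths without spaces.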

First the script runs latexmk to compile the intermediate files and a .pdf. Then it creates two array variables, one containing the filepath of every file called by \input{} and the other containing the line number in paper.tex of each of those \input{} lines. A for loop then builds one long multi-part sed statement, which replaces each \input{} line with the contents of the file it refers to. As a side note, this is the first time I’ve really used the sed r command, and it looks super useful.

Finally the script does something similar, but without a for loop, to replace the \bibliography{} command with the contents of paper.bbl, a TeX-formatted file containing the bibliography, which latexmk generates from the .bib file. After that there is just some clean-up to remove intermediate files, and you’re left with main.tex, which can be sent off to reviewers or used for final proofing before publication, without the hassle of handling multiple documents. While multiple documents are useful during document creation, I feel they’re less useful when you are trying to typeset the final document.
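
To show the read-then-delete trick in isolation, here is a minimal sketch of sed's r and d commands working on one line. The files paper.tex and intro.tex are hypothetical and are created in a temporary directory; the result goes to a variable rather than being edited in place.

```bash
#!/usr/bin/env bash
set -eu

# Work in a throwaway directory so nothing real is touched.
workdir="$(mktemp -d)"

# Hypothetical two-file document: line 2 of paper.tex pulls in intro.tex.
printf '%s\n' '\section{Intro}' '\input{intro.tex}' '\section{End}' > "$workdir/paper.tex"
printf '%s\n' 'First line of intro.' 'Second line of intro.' > "$workdir/intro.tex"

# "2r FILE" queues FILE to be printed after line 2; "2d" then deletes line 2 itself.
# The r command treats the rest of its line as the file name, so each command
# gets its own -e (or its own line in a multi-line sed script).
flattened="$(sed -e "2r $workdir/intro.tex" -e "2d" "$workdir/paper.tex")"

printf '%s\n' "$flattened"

rm -r "$workdir"
```

The printed result is the \section{Intro} line, the two lines of intro.tex, then the \section{End} line. One thing to keep in mind is that r silently does nothing when the file cannot be read, while d still deletes the line, so a wrong path removes the \input{} line without inserting anything.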

I got a lot of help and inspiration from several StackExchange posts, and from a question which I asked myself to help figure out the for loop bit.