I have a master .bib
file called lib.bib
, which contains all the BibTeX citations I’ve ever used. I use it as an index for organising which papers I’ve read and it means it’s easier to reuse references for new manuscripts. When I submit a manuscript to a journal however, I want to include a standalone .bib
file which only contains the references for the current manuscript. I wrote a little shell script which leverages ripgrep (rg)
to create a sorted and filtered .bib
file:
#!/usr/bin/env sh
# $1 = .tex file to find citations
# $2 = .bib file to grab entries
match=($(rg -o '\\cite.*?\{.*?\}' $1 | sed -E "s/\\\\cite.*?\{(.*)\}/\1/g" | sed 's/,\s\+/\n/g' | sort | uniq))
for i in "${match[@]}"; do
rg -N --color never --multiline --multiline-dotall "\{$i.*?^\}" $2
done
The script takes two arguments, the first is a .tex
file which is searched to find all instances or \citep{.*}
, \citealt{.*}
etc., the second is a .bib
file which is used to grab the references found in the .tex
file.
Going line by line, first I create an array variable which contains all the BibTeX reference keys from the .tex
file. It sorts these entries alphabetically and removes duplicates to create a tidy list. Next I loop over the array variable and search for each BibTeX entry in the .bib
file, which is then printed to stdout
so the user can do what they want with it, usually send to a new .bib
file.