Sort and filter .bib files

2020-08-24

I have a master .bib file called lib.bib, which contains every BibTeX citation I’ve ever used. I use it as an index for organising which papers I’ve read, and it makes it easier to reuse references in new manuscripts. When I submit a manuscript to a journal, however, I want to include a standalone .bib file that only contains the references for that manuscript. I wrote a little shell script which leverages ripgrep (rg) to create a sorted and filtered .bib file:

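#!/usr/bin/env bash

# $1 = .tex file to find citations
# $2 = .bib file to grab entries

# Collect the citation keys: match every \cite-style command (\citep, \citealt,
# optional [...] arguments), keep only the {...} argument, put each
# comma-separated key on its own line, trim spaces, then sort and de-duplicate.
match=($(rg -o -N --no-filename '\\cite[^{]*\{[^}]*\}' "$1" \
	| sed -E 's/.*[{]([^}]*)[}]/\1/' \
	| tr ',' '\n' \
	| sed -E 's/^[[:space:]]+|[[:space:]]+$//g' \
	| sort -u))

# Print each entry from "{key," up to the first line starting with "}".
# The trailing comma stops "smith2020" from also matching "smith2020a".
for i in "${match[@]}"; do
	rg -N --color never --multiline --multiline-dotall "\{$i,.*?^\}" "$2"
done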

The script takes two arguments. The first is a .tex file, which is searched for every citation command such as \citep{...} or \citealt{...}; the second is the .bib file from which the matching entries are copied.
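To see what the key-extraction stage produces on its own, here is a minimal sketch of just that step as a shell function. It swaps rg -o for grep -oE so it runs where ripgrep isn't installed (an assumption about your setup, not part of the original script), and cite_keys is a hypothetical name. Like the script, it only sees citations that fit on a single line.

```shell
#!/usr/bin/env bash
# Sketch of the key-extraction step only; grep -oE stands in for rg -o.
# cite_keys is a hypothetical helper name.
cite_keys() {
	grep -oE '\\cite[^{]*[{][^}]*[}]' "$1" |          # each \cite command, e.g. \citep[see][p.~3]{a, b}
		sed -E 's/.*[{]([^}]*)[}]/\1/' |              # keep only the text inside the braces
		tr ',' '\n' |                                   # one key per line
		sed -E 's/^[[:space:]]+|[[:space:]]+$//g' |     # trim surrounding whitespace
		sed '/^$/d' |                                   # drop empty keys, e.g. from \cite{}
		sort -u                                         # sort and remove duplicates
}
```

Running `cite_keys paper.tex` on a file containing `\citep[see][p.~3]{jones2019, smith2020}` and `\citet{smith2020}` lists jones2019 and smith2020, each once.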

Going step by step: first I create an array variable containing all the BibTeX citation keys in the .tex file, with the comma-separated keys of a single \cite command split onto separate lines. The keys are sorted alphabetically and duplicates are removed to give a tidy list. Next I loop over the array and search the .bib file for each key's entry, which is printed to stdout so the user can do whatever they want with it, usually redirecting it into a new .bib file.
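As an alternative to running rg once per key, here is a sketch that makes a single pass over the .bib file with awk and compares each entry's key exactly. This is a swapped-in technique, not what the script above does; bib_entries is a hypothetical name, and it assumes the same layout the rg pattern relies on (an entry header of the form `@type{key,` and a closing `}` at the start of a line).

```shell
#!/usr/bin/env bash
# Sketch: print the .bib entries whose keys are given one per line on stdin.
# bib_entries is a hypothetical helper name. Assumes each entry opens with
# "@type{key," and closes on a line that starts with "}".
bib_entries() {
	awk '
		NR == FNR { want[$0] = 1; next }          # first input (stdin): the wanted keys
		!printing && /^@/ {                        # an entry header such as @article{key,
			key = $0
			sub(/^@[^{]*[{][ \t]*/, "", key)       # strip "@article{"
			sub(/[ \t]*,.*$/, "", key)             # strip "," and anything after it
			if (key in want) printing = 1          # exact match, so smith2020 != smith2020a
		}
		printing { print }
		printing && /^[}]/ { printing = 0 }       # a closing brace ends the entry
	' - "$1"
}
```

For example, `printf '%s\n' smith2020 doe2018 | bib_entries lib.bib > paper.bib`. One trade-off: entries come out in the order they appear in the .bib file rather than sorted by key, since awk reads the file once from top to bottom.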