There are lots of people I would like to follow on Instagram, mostly woodworkers, bicycle people, and outdoors people. It seems to be a really good method of delivering content. Unfortunately for Instagram, there is absolutely no way I would make an account with them. I fear it would be too much of a time sink, and I’m paranoid about giving too much detail about my personal interests to Facebook.
I found a command line tool called InstaLooter that can scrape public Instagram profiles without an account and save the images to my local machine, where I can browse them at my leisure, in the spirit of RSS. This is how I implemented it.
I created a text file called .ig_subs.txt which lives in my $HOME. The file holds a list of Instagram usernames for the accounts I want to scrape:
kelsoparadiso
lloyd.kahn
exploringalternatives
barnthespoon
terrybarentsen
woodlands.co.uk
zedoutdoors
mossy_bottom
Then I made a shell script which lives in my path, called insta_dl:
#!/bin/bash
# Make the download directory if it doesn't exist
mkdir -p "$HOME/Downloads/ig"
# make newlines the only separator
IFS=$'\n'
# disable globbing
set -f
# Loop over each username in the subscription file
for i in $(cat < "$HOME/.ig_subs.txt"); do
    instalooter user "$i" "$HOME/Downloads/ig/" -n 1 -N -T {username}.{date}.{id}
done
instalooter user $i downloads photos from user i. -n 1 only downloads the most recent post, whether that post is one photo or multiple. -N only downloads images which don’t already exist in the destination directory ($HOME/Downloads/ig/), based on the filename. -T {username}.{date}.{id} sets the filename of each photo. {id} is unique for each photo on Instagram, so it uniquely identifies each downloaded file for use by -N. The filenames then look something like this:
exploringalternatives.2019-09-27.2142383070393557093.jpg
kelsoparadiso.2019-10-09.2150831532411304437.jpg
kelsoparadiso.2019-10-09.2150831532419588103.jpg
kelsoparadiso.2019-10-09.2150831532419839765.jpg
lloyd.kahn.2019-10-11.2152638264107259024.jpg
mossy_bottom.2019-10-09.2151026330651686709.jpg
terrybarentsen.2019-10-03.2146722625883638769.jpg
terrybarentsen.2019-10-03.2146722625900303797.jpg
terrybarentsen.2019-10-03.2146722625950630270.jpg
woodlands.co.uk.2019-10-11.2152273592812162360.jpg
zedoutdoors.2019-10-02.2145942922787735607.jpg
If I wanted to, I guess I could further file each image into its own directory based on username or date, but I don’t want that.
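For anyone who does want that, here is a minimal sketch of per-username sorting. It is an assumption on my part, not part of insta_dl, and it leans on the {username}.{date}.{id} template above by stripping everything from the first .2 onwards, which works as long as the date begins with a 2:
#!/bin/bash
# Hypothetical sketch: sort downloaded images into per-username directories,
# relying on the {username}.{date}.{id}.jpg filename template
for f in "$HOME/Downloads/ig/"*.jpg; do
    [ -e "$f" ] || continue         # skip if nothing matched the glob
    base="${f##*/}"                 # strip the directory
    user="${base%%.2*}"             # strip from the first ".2" (the date)
    mkdir -p "$HOME/Downloads/ig/$user"
    mv "$f" "$HOME/Downloads/ig/$user/"
done
Note this keeps usernames containing dots (like lloyd.kahn or woodlands.co.uk) intact, since the cut only happens at the date.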
I can now create a cronjob or a LaunchAgents script to automate this to run every day or every week in the background.
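On a system with cron, a crontab entry along these lines would do it (the path is an assumption; adjust to wherever insta_dl actually lives):
# Hypothetical crontab entry: run insta_dl every day at 09:00
0 9 * * * /usr/local/bin/insta_dl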
Update - 2019_10_31
I updated the insta_dl shell script so that it also grabs the caption of each Instagram post downloaded and stores it in a text file. InstaLooter can download post metadata as a JSON file by adding the -d flag (--dump-json). Then I use jq to parse the JSON file for each post to extract the full name of the account (.owner.full_name), the @username of the account (.owner.username) and the text of the post’s caption (.edge_media_to_caption[][][].text). Then I use sed to put a blank line between each caption to make it easier to read, and finally delete the original JSON files:
#!/bin/bash
DIR=$HOME/Downloads/ig
# Make the download directory if it doesn't exist
mkdir -p "$DIR"
# make newlines the only separator
IFS=$'\n'
# Loop over each username in the subscription file
for i in $(cat < "$HOME/.ig_subs.txt"); do
    instalooter user "$i" "$DIR" -v -d -n 1 -N -T {username}.{date}.{id}
done
# Extract "Full Name (username): caption" from each post's JSON dump
for i in "$DIR"/*.json ; do
    jq '(.owner.full_name + " (" + .owner.username + "): " + .edge_media_to_caption[][][].text)' "$i"
done > "$DIR/description.txt"
# Add a blank line after each caption, then remove the JSON files
sed -i 'G' "$DIR/description.txt"
rm "$DIR"/*.json
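Each line of description.txt then looks something like this (an invented example; since jq prints its result as a JSON string, each line keeps its surrounding double quotes):
"Lloyd Kahn (lloyd.kahn): Building with salvaged wood this week."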