Encrypted off-site backups made easy.
Picture the scene…
I provide managed Linux hosting for a number of my clients. For many years, my backup solution of choice had been rsnapshot-ing onto a remote VPS.
This has always served me well, but I’ve been mindful of the following drawbacks:
- It’s a little cumbersome to set up for each new VPS.
- Disk space used for the backups increases (almost) linearly over time.[1]
- As more and more VPS providers move from HDDs to SSDs, VPS disk space is not as cheap as it used to be.
- If a bad actor somehow gained access to the backup server, they would have access to backups for all the servers.
- Backups are initiated from the destination server, which makes it impossible to control backup intervals from the source server. Initiating from the source would require a separate user account per server on the destination (to prevent each server from accessing the other servers’ backups). So, basically, it’s a faff.
So, I decided to look around and see what else was available.
My vague criteria were:
- Backups should be initiated on the source server, from the command line or via cron. In other words, backups are pushed rather than pulled.
- The system should be as simple as possible to configure on any Linux server. (Anything that is complex to configure is easy to mess up!)
- The system should be efficient when transferring and storing text files. Most of the data I back up is source code, config files and database dumps, all of which are text files. So, some sort of block-level de-duping would be ideal.
- Backups should be encrypted at source, so that if a bad actor does somehow gain access to the backup store, the backups will all be useless to them.
- Each server’s backups should be independent, so that if a bad actor does somehow gain access to one of the primary servers then they will not be able to access backups for other servers.
- The system must comply with all appropriate EU Data Protection (and GDPR) legislation.
And the winner is… Tarsnap!
If you’re on the lookout for a backup solution for your Linux/Unix servers, then I highly recommend Tarsnap, especially if you’re a command-line person like me.
Some awesome things about Tarsnap:
- It uses block-level de-duplication for both data transfer and data storage. This is super-efficient and helps to keep costs down.
- It’s easy to set up, and it does encryption-at-source straight out of the box.
- Its command-line usage closely resembles that of tar, which is awesome if you’re old-school like me. (It also has the pleasing side effect that the backup script that you wrote 20 years ago to tar directories onto a tape drive can be easily modified to back up to cloud storage.)
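To see why block-level de-duplication pays off for text files, here is a simplified shell illustration. It is an assumption-laden sketch, not how Tarsnap chunks data internally: it cuts files into fixed 4 KiB blocks with `split`, hashes each block with `sha256sum` (GNU coreutils), and counts how many distinct blocks of a new version were not already stored. The function name `count_new_blocks` is hypothetical.

```shell
#!/bin/sh
# Simplified illustration of block-level de-duplication (NOT Tarsnap's
# actual chunking): split files into fixed-size blocks, hash each block,
# and count the distinct blocks of NEW that are not already in OLD.

BLOCK_SIZE=4096   # assumption: arbitrary block size chosen for the demo

# count_new_blocks OLD NEW  (hypothetical helper)
count_new_blocks() {
    workdir=$(mktemp -d) || return 1
    mkdir "$workdir/old" "$workdir/new"
    split -b "$BLOCK_SIZE" "$1" "$workdir/old/blk."
    split -b "$BLOCK_SIZE" "$2" "$workdir/new/blk."
    # Hash every block and keep only the unique digests.
    (cd "$workdir/old" && sha256sum blk.*) | cut -d' ' -f1 | sort -u > "$workdir/old.sums"
    (cd "$workdir/new" && sha256sum blk.*) | cut -d' ' -f1 | sort -u > "$workdir/new.sums"
    # Digests present only in NEW are the blocks that would need storing.
    comm -13 "$workdir/old.sums" "$workdir/new.sums" | wc -l | tr -d ' '
    rm -rf "$workdir"
}
```

For a file produced by `seq 1 20000`, with one line edited in place without changing its length, this reports a single new block, whereas file-level storage would keep a whole second copy. Real de-duplicating tools generally use content-defined chunk boundaries instead of fixed ones, so that inserting text near the top of a file doesn’t shift every following block.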
Here’s how easy it is to get up and running.
Create yourself a Tarsnap account
Head on over to tarsnap.com and set yourself up with a new account.
(The web interface is a little clunky but, believe me, you won’t care, because you’ll rarely use it.)
Add some funds to your account
The Tarsnap billing model is admittedly a little unusual, in that it’s prepaid pay-as-you-go rather than a monthly subscription.
So, you pay in as much as you want, and your usage is deducted from that balance.
To get you started, I’d suggest you pay in $2, which is more than enough to see just how awesome this service is.
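To get a feel for how far a small top-up goes, here is a rough cost estimator. The rates are assumptions that you should check against tarsnap.com: Tarsnap has long quoted its prices as 250 picodollars per byte-month of storage and 250 picodollars per byte of bandwidth. `estimate_monthly_cost` is a hypothetical helper, and "stored bytes" means the data held after compression and de-duplication, not the raw size of your files.

```shell
#!/bin/sh
# Rough monthly cost estimate for pay-as-you-go billing.
# ASSUMPTION: rates are picodollars per byte(-month); verify them against
# the current pricing on tarsnap.com before relying on the result.
STORAGE_PDOLLARS_PER_BYTE_MONTH=250
BANDWIDTH_PDOLLARS_PER_BYTE=250

# estimate_monthly_cost STORED_BYTES TRANSFERRED_BYTES  (hypothetical helper)
# Prints the estimated cost in US dollars, rounded to cents.
estimate_monthly_cost() {
    awk -v s="$1" -v t="$2" \
        -v sp="$STORAGE_PDOLLARS_PER_BYTE_MONTH" \
        -v bp="$BANDWIDTH_PDOLLARS_PER_BYTE" \
        'BEGIN { printf "%.2f\n", (s * sp + t * bp) / 1e12 }'
}
```

For example, `estimate_monthly_cost 1000000000 200000000` (1 GB stored, 200 MB transferred in the month) prints `0.30`, i.e. about 30 cents under these assumed rates.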
Install the tarsnap client on your Linux/Unix server(s)
Now that you have your Tarsnap web account, you need to set up the tarsnap command-line client on your server.
Obviously, you’ll need to install the client on every server that you want to back up. Note, however, that it’s perfectly safe to use the same web account (i.e. billing account) for all of them, as each server has its own key and cannot access the other servers’ data.
You can find, download and install the appropriate client from the Tarsnap website.
However, if you’re in a hurry and have a recent Ubuntu server (14.04 or 16.04), just cut-and-paste this little beauty into a terminal window:
sudo bash <<"EOF"
set -e
TARSNAP_VERSION=1.0.39
apt-get update
apt-get install -y gcc libc6-dev make libssl-dev zlib1g-dev e2fslibs-dev < /dev/null
# Build in /tmp; abort if it isn't there
cd /tmp || { echo "No /tmp directory?"; exit 1; }
wget "https://www.tarsnap.com/download/tarsnap-autoconf-${TARSNAP_VERSION}.tgz"
tar zxvf "tarsnap-autoconf-${TARSNAP_VERSION}.tgz"
cd "tarsnap-autoconf-${TARSNAP_VERSION}"
./configure
make all
make install
# Start from the sample config (its keyfile setting points at /root/tarsnap.key)
mv /usr/local/etc/tarsnap.conf.sample /usr/local/etc/tarsnap.conf
EOF
Right, so that’s done.
By this point, you should be able to run the tarsnap command like so:
prompt> sudo tarsnap --version
tarsnap 1.0.39
Now, create your server key and link it to the Tarsnap billing account that you created earlier (so that Tarsnap knows who to bill for the storage and bandwidth used).
sudo tarsnap-keygen --keyfile /root/tarsnap.key --user $YOUR_TARSNAP_USERNAME --machine $(uname -n)
IMPORTANT!! YOU MUST TAKE A COPY OF THE GENERATED /root/tarsnap.key and keep it safe somewhere other than the server itself, as this keyfile is the only way to access the backups if your server fails!
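One way to make that copy less error-prone is a small helper that copies the key with owner-only permissions and then checks the copy byte-for-byte. This is a sketch: `copy_keyfile` is a hypothetical name, and where the copy ends up is up to you.

```shell
#!/bin/sh
# copy_keyfile SRC DEST  (hypothetical helper)
# Copies the Tarsnap key to DEST with owner-only permissions and verifies
# the copy, so a silently truncated or corrupt copy is caught immediately.
copy_keyfile() {
    src=$1
    dest=$2
    [ -r "$src" ] || { echo "cannot read $src" >&2; return 1; }
    # Create the copy under a restrictive umask so it is never world-readable.
    (umask 077 && cp "$src" "$dest") || return 1
    # cp keeps the mode of an existing DEST, so enforce 600 explicitly.
    chmod 600 "$dest"
    # A corrupt copy is as bad as no copy: compare the two files.
    cmp -s "$src" "$dest" || { echo "copy differs from $src" >&2; return 1; }
}
```

For example, as root: `copy_keyfile /root/tarsnap.key /media/usb/$(uname -n).tarsnap.key` (the mount point is hypothetical). The copy still has to end up off the server, e.g. via `scp`, because a key that only exists on the machine is lost along with it.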
Right, with that done, you’re now ready to create some backups.
Creating backups
One of the things I love about tarsnap is that its command-line usage closely resembles the Unix tar command.
For example, to create a normal tar file on Unix, you might do something like this:
cd /tmp
sudo tar -cf backup_of_etc.`date +%Y%m%d.%H%M%S`.tar /etc
With tarsnap, it’s pretty much the same, but it stores the archive on its remote server, rather than on your local disk:
sudo tarsnap -cf backup_of_etc.`date +%Y%m%d.%H%M%S` /etc
To convince yourself that it has actually done something, try running:
sudo tarsnap --list-archives
Cool, right?
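Because backups are pushed from the source server, scheduling them is just a matter of cron. Here is a minimal sketch of a backup script, assuming the same timestamped archive names as in the command above. The helper names `archive_name` and `backup` are hypothetical, and `TARSNAP` can be pointed at a stub for a dry run.

```shell
#!/bin/sh
# Sketch of a push-style backup script, meant to be run from cron.
# ASSUMPTION: TARSNAP defaults to "tarsnap" but can be overridden
# (e.g. with a full path, or a stub command for a dry run).
TARSNAP=${TARSNAP:-tarsnap}

# archive_name PREFIX -> PREFIX.YYYYmmdd.HHMMSS  (hypothetical helper)
archive_name() {
    printf '%s.%s\n' "$1" "$(date +%Y%m%d.%H%M%S)"
}

# backup PREFIX DIR...  (hypothetical helper)
# Creates one timestamped archive containing every DIR given.
backup() {
    prefix=$1
    shift
    "$TARSNAP" -cf "$(archive_name "$prefix")" "$@"
}
```

End the script with a call such as `backup backup_of_etc /etc`, save it somewhere like the hypothetical `/usr/local/sbin/tarsnap-backup.sh`, and add a root crontab entry such as `15 3 * * * /usr/local/sbin/tarsnap-backup.sh`. Cron’s default PATH usually excludes /usr/local/bin, which is where `make install` puts tarsnap, so set `TARSNAP=/usr/local/bin/tarsnap` in the script or the crontab.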
Restoring to the local server
As you have probably guessed by now, you can use other tar-like options to view and restore archives.
To see all available archives stored on the tarsnap server under your server key, simply run:
sudo tarsnap --list-archives
Pick an archive from the list and have a look at its contents:
sudo tarsnap -tf $NAME_OF_ARCHIVE
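Since the archive names embed a `YYYYmmdd.HHMMSS` timestamp, a plain lexical sort is also a chronological one, so picking the newest archive for a given prefix can be scripted. A minimal sketch, assuming archives are named like the `backup_of_etc.` example earlier; `latest_archive` is a hypothetical helper that reads the archive list on stdin.

```shell
#!/bin/sh
# latest_archive PREFIX  (hypothetical helper)
# Reads archive names (one per line, e.g. from `tarsnap --list-archives`)
# on stdin and prints the newest one whose name starts with "PREFIX.".
# ASSUMPTION: names follow PREFIX.YYYYmmdd.HHMMSS, so sort order = time order.
latest_archive() {
    awk -v p="$1." 'index($0, p) == 1' | sort | tail -n 1
}
```

For example: `sudo tarsnap -tf "$(sudo tarsnap --list-archives | latest_archive backup_of_etc)"`.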
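To restore (or “extract”), you would use -xf instead of -tf. Obviously, BE CAREFUL when extracting backups, so that you don’t accidentally overwrite something you wanted to keep: like tar, tarsnap strips the leading / from stored paths, so files are extracted relative to your current directory, and running -xf from / will write straight over your live files.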
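A safer habit is to extract into an empty scratch directory and copy back only what you need. This sketch relies on tarsnap’s tar-style `-C` option (change directory before extracting). `restore_to_scratch` is a hypothetical helper, and because the flags mirror tar’s, setting `TARSNAP=tar` lets you rehearse it against a local .tar file.

```shell
#!/bin/sh
# restore_to_scratch ARCHIVE  (hypothetical helper)
# Extracts ARCHIVE into a brand-new temporary directory instead of over the
# live filesystem, then prints that directory's path.
# ASSUMPTION: TARSNAP defaults to "tarsnap"; TARSNAP=tar works for local tests.
TARSNAP=${TARSNAP:-tarsnap}

restore_to_scratch() {
    dest=$(mktemp -d /tmp/restore.XXXXXX) || return 1
    # -C switches into $dest before any entries are written.
    "$TARSNAP" -xf "$1" -C "$dest" || { rm -rf "$dest"; return 1; }
    printf '%s\n' "$dest"
}
```

For example, as root: `dir=$(restore_to_scratch backup_of_etc.20180508.031500)` (a made-up archive name), then `diff -r "$dir/etc" /etc` before copying anything back into place.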
Accessing your backups from elsewhere
To access the backups of one of your servers from another of your servers, you simply need the tarsnap.key file from the original server (you did back it up like I told you, didn’t you?).
Once you have that, you can use any other working tarsnap command-line client to access the archives. To do that, you simply specify which key to use.
For example, to view archives for some other server, you would do:
sudo tarsnap --keyfile=/tmp/some_other_server.tarsnap.key --list-archives
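If you keep copies of several servers’ keys on one admin machine, a loop can list each server’s archives in turn. A sketch: `list_all_servers` is hypothetical, and it assumes the keys are named `<server>.tarsnap.key`, as in the example above.

```shell
#!/bin/sh
# list_all_servers KEYFILE...  (hypothetical helper)
# For each key file, prints a header with the server name, then the
# archives that key can read.
# ASSUMPTION: TARSNAP defaults to "tarsnap" and may be overridden.
TARSNAP=${TARSNAP:-tarsnap}

list_all_servers() {
    for key in "$@"; do
        echo "== $(basename "$key" .tarsnap.key) =="
        "$TARSNAP" --keyfile="$key" --list-archives || echo "(failed for $key)" >&2
    done
}
```

For example: `sudo sh -c '. ./list_all_servers.sh; list_all_servers /tmp/*.tarsnap.key'`, where the script path is hypothetical.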
Other Backup Solutions
There are obviously other backup solutions out there and it’s entirely up to you to choose the right solution for you. In some cases, it may just come down to which one fits in your head better, and in my case that was tarsnap.
Prior to going with tarsnap, I did consider a number of other options, including:
- Dropbox
- SpiderOak One
- Amazon S3
- Backblaze B2
Everything except SpiderOak fell down because they didn’t offer encryption-at-source straight out of the box.
Despite offering encryption-at-source by default, SpiderOak fell down because each server could access the other servers’ backups. (I could have got around that with multiple accounts, but that would just be an admin headache.)
Backblaze was a close runner-up to Tarsnap, and it does have an online HOWTO on encrypting your files at source, but again that’s extra setup which just isn’t necessary with Tarsnap.
In all cases, I could have encrypted everything myself before passing it to the provider, but that felt like it was:
- An extra level of faff.
- Likely to undermine any de-duplication carried out by the provider.
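The de-duplication point is easy to demonstrate. Encrypting a file yourself with a random salt produces different ciphertext every time, even for identical input, so a provider-side de-duplicator finds nothing in common between two uploads of the same data. This illustration uses `openssl enc` (assuming OpenSSL 1.1.1 or later, for `-pbkdf2`) with a throwaway demo passphrase; `encrypt_copy` is a hypothetical name, and this is not how Tarsnap itself works.

```shell
#!/bin/sh
# encrypt_copy IN OUT  (hypothetical helper, illustration only)
# Encrypts IN to OUT with AES-256-CBC. openssl picks a fresh random salt on
# each run, so the derived key/IV (and hence the ciphertext) differ every time.
# ASSUMPTION: "demo-only" is a throwaway passphrase; never do this for real.
encrypt_copy() {
    openssl enc -aes-256-cbc -pbkdf2 -salt -pass pass:demo-only \
        -in "$1" -out "$2"
}
```

Encrypting the same file twice and comparing the outputs with `cmp` shows two completely different blobs, which is exactly what a provider would be asked to de-duplicate.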
So, tarsnap FTW!
Addendum - 08/05/2018
I was asked to clarify my view on the use of Tarsnap in the era of GDPR, so here goes:
The new General Data Protection Regulation has a whole raft of rules regarding the transfer of personal data out-with the EU.
Tarsnap is a Canadian business which uses US servers to store its data, so at first glance it may appear that it does not comply.
However, the fact that the tarsnap client uses encryption-at-source is critical here. I would strongly argue that the Tarsnap servers don’t need to comply: when using the tarsnap command-line client, by the time any “personal data” leaves your server it is no longer personal data, it is simply a stream of meaningless ones and zeroes. Without access to the original Tarsnap encryption key, it is computationally infeasible to tie that information back to any individual. Therefore, as long as the original encryption key remains in the EU, I would argue that you haven’t transferred any personal data out-with the EU. The only thing you transferred was a large blob of meaningless ones and zeroes.
[1] rsnapshot uses block-level de-duplication when transferring data, but it only uses file-level de-duplication when storing data. This means that, even if your file has only changed slightly, a whole other copy will be stored.
