Paper SSH & GPG key backups

May 20, 2015

As mentioned in my previous post A Break, I’ve recently switched to using git-annex for (some of) my backups, but more significantly for this post, I’m using gcrypt for all of my backups’ online remotes (i.e. those in git-annex and those in plain old Git). This post explains my reasons for encrypting online backups, summarises the benefits of this new set-up, explains why the change motivates an improved backup strategy for my cryptographic keys, and, finally, describes that strategy.

First, why do I encrypt my online backups? Well, they contain my credentials/passwords database, plus my accounts and tax records etc. Those are – I imagine – most, if not all, of the things someone would need to steal my identity or at the very least cause me a lot of inconvenience.

At a very high level there are two ways by which an attacker might obtain access. The first and most obvious is by breaking and entering my backups VPS while I am its tenant. The second would be by becoming the next tenant of a VPS I have used for backups and then recovering my data from the disk. I recognise that becoming the next tenant – if left to chance – is improbable, and that decent VPS providers scrub disks before allocating them to new tenants, but guarding against this case allows me peace of mind if ever I choose to use a cheaper VPS provider of unknown quality for additional redundancy.

So, what are the benefits of this new set-up? It’s actually easier to describe the problems I had with previous set-ups and then simply point out that using git(-annex) with a gcrypt remote eliminates them, so I’ll do that:

For my online backups (located on a VPS) I used to just use plain Git repos sitting on a LUKS filesystem (later I switched to an encfs filesystem, and later still to encfs over SSHFS; in all cases I was using keys derived from a password). LUKS was less than ideal in that it necessitated partitioning the disk of whatever VPS I was using, and that inevitably meant being bitten later by incorrect estimation of what partition sizes I needed. Hence the later switch to encfs, which solved the partition size estimation problem. It didn’t solve the other problem I had with LUKS: the need for me to mount and unmount the encrypted filesystem on the VPS, which firstly is just a pain to do, and secondly weakens the protection in the case of an attacker breaking and entering the VPS, as every so often the key is in memory and the plaintext filesystem is mounted. Both did however protect against the case of a subsequent disk tenant recovering data (though it’s best practice to shred the key before moving out of the VPS, and that’s something one can be more certain of having achieved with LUKS than with encfs, if the encfs is on a journalled filesystem as it was in my case).

Finally, as the terminal evolution of the encrypted filesystem approach, I adapted the encfs solution so that I stored the ciphertext encfs directly on the VPS but then mounted it on my laptop over SSHFS before mounting that with encfs to expose the plaintext view only on my laptop. This improved security in that neither the plaintext filesystem nor its plaintext key was ever exposed on the VPS (though the encrypted encfs key was of course permanently present), but it had one practical problem that I imagine would be difficult for most to foresee (it was for me): performance. It was painfully slow. I did some reading and came to understand this was an unavoidable result of using SSHFS, but I can’t remember any more detail than that.
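
Concretely, that final arrangement amounted to something like the following, with both mounts made on the laptop (hostname and paths here are hypothetical):

mkdir -p ~/mnt/cipher ~/mnt/plain
sshfs backups@vps.example.com:encfs-ciphertext ~/mnt/cipher   # ciphertext directory lives on the VPS
encfs ~/mnt/cipher ~/mnt/plain                                # password prompt and plaintext stay local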

Having finally met all of my security goals with the encfs over SSHFS approach, what I now wanted was to maintain that level of security but regain the performance I’d enjoyed with the LUKS and encfs approaches. Enter gcrypt, as one of two means by which I ensure that all data I back up is already encrypted before it reaches the filesystem*. For my needs the performance with gcrypt remotes is equivalent to LUKS/encfs, while the security is better than my encfs over SSHFS approach because neither the plaintext repo nor its plaintext key is ever exposed on the VPS; though the shared key used for the symmetric encryption of the actual data is stored on the VPS (see the description of hybrid encryption here), it is encrypted using my GPG key rather than a passphrase as encfs’s symmetric key had been.
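
For anyone who hasn’t used gcrypt, creating such a remote is only a couple of commands; a minimal sketch, with a hypothetical URL and key ID:

git remote add backups gcrypt::ssh://vps.example.com/~/repos/data.git
git config remote.backups.gcrypt-participants 6C27DE4C   # GPG key(s) the repo is encrypted to
git push backups master                                  # everything pushed is encrypted client-side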

Now, with the motivation for and results of the changes to my data backup strategy described, we can get to how those changes motivate improved cryptographic key backups.

Until I made the switch to this approach the only thing I needed in order to restore data from backups was the passphrase used for LUKS/encfs key derivation, which I memorised. That meant that I could store my SSH key within the backups (I didn’t have a GPG key) and that would have been sufficient to recover from loss of everything except one (offline) copy of my backups (recovery from only an online copy would not have been possible because loss of all local data would mean I’d have lost my SSH key – which I would have required in order to access any VPS hosting the online backups). One might then question the usefulness of online backups if they are dependent on having an offline backup. My answer to that is that online backups are more convenient, so I push to them every time I commit changes to my data**, whereas offline backups are less convenient (and I’d argue that there’s a greater risk of accidental loss if the medium is too close to hand).

Having switched to the gcrypt approach described above I do now have a GPG key, which is of course required for me to access any of my online backups. Naturally I still require my SSH key to access the VPS in the first place, so both are essential if I am to recover data from my online backups. Now, I could have kept both the SSH key and the GPG key within the backups and I’d have been no worse off than before (in that recovery from only an online copy would not have been possible, because loss of all local data would mean I’d have lost my SSH and GPG keys). Of course, I didn’t choose to keep only flash copies of my keys, because if I had done so the title of this post wouldn’t be Paper SSH & GPG key backups. I chose to create paper backups of my SSH and GPG keys because by doing so I can treat them like Horcruxes, keeping them less close to hand than I do my actual offline data backups and therefore hopefully minimising the risk of accidental loss. The key point (pun really not intended) is that my keys should change a lot less often than my data, so their being less close to hand is not really any inconvenience***.

With all that background given, I’m now at the core of the post – a description of the approach I’ve settled on for backing up my cryptographic keys.

I wanted to use paper – as opposed to flash drives – for my key backups because:

So what exactly is my approach?

For SSH what I back up is the raw content of ~/.ssh/id_rsa – mine is already password protected (GNOME Keyring takes care of automatically loading it for me):

-----BEGIN RSA PRIVATE KEY-----
Proc-Type: 4,ENCRYPTED
DEK-Info: AES-128-CBC,initialvectorinitialvector

base64database64database64database64data...
-----END RSA PRIVATE KEY-----

For GPG what I back up is the output of gpg --export-secret-keys -a – again, my GPG key is password protected and that protection does remain on the exported output:

-----BEGIN PGP PRIVATE KEY BLOCK-----
Version: GnuPG v1

base64database64database64database64data...
-----END PGP PRIVATE KEY BLOCK-----

I initially thought I would print just the text in a font suitable for OCR (e.g. OCR-A), but I did a test recovery of a printed copy using Tesseract and found that it failed to recognise the em-dashes/en-dashes/hyphens correctly. Although, to be fair, I’ve just shown that the difference between the three isn’t obvious to me either. I recognised that I could have corrected such errors manually on recovery, but the failure added to my existing reservation that OCR really requires a scanner (I have one now, but will I have easy access to one in a worst-case scenario?). So, I moved on to matrix codes.
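
A test recovery of this shape is what I mean (filenames are illustrative):

tesseract key-scan.png key-ocr   # OCR the scanned printout; writes key-ocr.txt
diff key-ocr.txt ~/.ssh/id_rsa   # misrecognised characters show up as differences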

The obvious choice was QR codes. For my SSH key it was simple. For GPG keys, less so. As per http://blog.qr4.nl/page/QR-Code-Data-Capacity.aspx the maximum binary capacity of a QR code is 2953 bytes (though I found that qrencode wouldn’t take more than 2948 – I’m not sure why), while the size of the GPG key data I wanted to encode was 3573 bytes:

gpg --export-secret-keys -a | wc -c
3573

Hence the need to either shrink the GPG key data for backup to 2948 bytes or fewer, or to split it across several matrix codes. I knew from the start that I’m not the first person to think of using matrix codes for key backups, and so when I encountered this problem I was easily able to find one solution in the form of Paperkey, which takes the shrinking route: it strips out everything except the secret portion of the key, on the assumption that the public part can be recovered from elsewhere (e.g. a keyserver). I chose not to try Paperkey because I have not published the public part of my GPG key anywhere and so do want a backup of it, and because I’d rather keep creation and restoration as simple as possible – Paperkey would be an unnecessary complexity for me. I’m acknowledging Paperkey here because from what I can tell it is a popular solution, and if you’re reading this but are not aware of Paperkey you might want to give it a go instead of following my lead.
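
For reference, the Paperkey round trip is roughly as follows (key ID hypothetical); note that the restore step needs a copy of the public key, which is exactly what ruled it out for me:

gpg --export-secret-keys 6C27DE4C | paperkey --output to-print.txt          # secret bits only, as printable text
paperkey --pubring pubring.gpg --secrets from-scan.txt --output secret.gpg  # reconstruct the full secret key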

Having confirmed my desire to back up both parts of my GPG key and the resulting need to split it across matrix codes, I’ll now describe how I’m doing that.

Putting aside the maximum number of bytes that can be written to a QR code, if the key backups were to be of any use there was another important bound: the maximum number of bytes that could easily be read back from a code printed on A4 paper and held in front of my laptop’s webcam. Through a small amount of trial, error and improvement I found that 2048 bytes could just about be read directly from the webcam by zbarcam, but strangely enough they couldn’t be read from scanned images (which are surely of higher quality) by zbarimg. It simply failed to identify the QR codes in the images, and I didn’t investigate further.
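
A readability trial of that shape is easy to reproduce (2048 bytes being the size that just about worked; filenames are illustrative):

head -c 2048 /dev/urandom | base64 | head -c 2048 > chunk.txt
qrencode -o chunk.png < chunk.txt   # encode the chunk as a single QR code, then print it
zbarcam --raw                       # hold the printout up to the webcam
zbarimg --raw scan-of-chunk.png     # or try a scanned image of the printout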

This would have been unfortunate, in that QR codes were looking more promising than Data Matrix codes, but then they turned out not to be anyway. qrencode – the canonical tool for encoding QR codes – supports structured append, which automates the process of splitting data across up to 16 QR codes (so around 16 * 3000 bytes; more than enough). However, I found that the two ZBar tools mentioned above do not support structured append, and though the CLI of the Java SE version of ZXing was at least able to output the individual parts for me to stitch back together, ZXing’s CLI is nowhere near as simple to use as ZBar’s (scan/capture an image of each QR code, download the two required JARs and then invoke Java with the right classpath, main class and args, vs. apt-get install zbar-tools, run zbarcam, hold the QR code in front of the webcam).
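
For comparison, decoding one part with ZXing’s CLI looks roughly like this (JAR names vary by version):

java -cp core.jar:javase.jar com.google.zxing.client.j2se.CommandLineRunner part1.png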

At this point I turned to Data Matrix codes (which have similar per-code capacity and include support for structured append in the latest standard) in the hope that the tools would be better for my needs. They are. Though I did again run into a problem, in that neither the read nor the write command of the canonical tool package appears to support structured append (I’m referring to the dmtxread and dmtxwrite commands provided by the dmtx-utils package).

So the deciding factor in the end was the tooling, not the standards; dmtxread is able to successfully decode key parts from scanned images, whereas zbarimg doesn’t seem able to. And since neither the QR nor the Data Matrix tooling has complete support for structured append, I’m just manually splitting the data into chunks of 1556 bytes or fewer (1556 bytes being the maximum binary capacity of a single Data Matrix code) and then creating a standalone Data Matrix code for each part.

To actually create my backups, I do the following in a temporary directory.

Copying my SSH known hosts into the directory:

cp ~/.ssh/known_hosts known_hosts.txt

Splitting a copy of my SSH key into parts in the directory:

split -a 1 --additional-suffix=.txt -b 1556 -d ~/.ssh/id_rsa id_rsa.

Exporting and splitting my GPG key into parts in the directory:

gpg --export-secret-keys -a | split -a 1 --additional-suffix=.txt -b 1556 -d - secring.gpg.
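
Before printing anything it’s worth a sanity check that the parts concatenate back to the originals; something like:

cat id_rsa.*.txt | diff - ~/.ssh/id_rsa && echo "SSH key parts OK"
cat secring.gpg.*.txt | diff - <(gpg --export-secret-keys -a) && echo "GPG key parts OK"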

Printing the key part texts (really just as cover sheets for the matrix codes, so it’s clearer which part is which and what the concatenation order should be for restores, but theoretically these can also serve as last-resort copies in case a Data Matrix code is somehow unreadable):

a2ps -4 -A fill known_hosts.txt id_rsa.*.txt; a2ps -4 -A fill secring.gpg.*.txt

Creating Data Matrix code PNGs of all key parts:

find . -name "*.txt" -exec dmtxwrite -o {}.png {} \;

Printing the key part PNGs (this just opens GNOME’s Image Viewer – you’ll then need to request that it prints them. I did try several ways of printing the matrix codes directly from Bash (having dmtxwrite output a PDF and passing that to lpr, but also using ImageMagick’s convert command to convert the PNGs to PostScript and then passing those to lpr, and finally using a2ps to do the equivalent [it just delegates to ImageMagick]), but all resulted in the bottom edge of the matrix codes being cropped by my printer – probably something specific to my machine, but anyway – that’s why I settled on Eye of GNOME, even if it is less slick):

eog *.txt.png

Restoring a key is as simple as scanning the Data Matrix codes, running each image through dmtxread, and then concatenating the decoded parts together. The cover sheets allow the ordering to be confirmed manually. I found adding the --shrink=2 option essential – dmtxread just didn’t want to recognise anything when run on full-resolution scans. I didn’t investigate further because at this point I’d spent more time than I wanted to on the task, and I do at least now have an adequate solution.
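
Put together, a restore amounts to something like this (assuming the scans are named so that they sort in part order):

for part in id_rsa.*.png; do dmtxread --shrink=2 "$part"; done > id_rsa   # dmtxread adds no trailing newline, so plain concatenation is safe
chmod 600 id_rsa
for part in secring.gpg.*.png; do dmtxread --shrink=2 "$part"; done | gpg --import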

*My versioned files (i.e. everything important that lives on my laptop) go in Git repos (some with annexes); the only other important data is that from my email and web servers, which is backed up by Duplicity jobs that encrypt the output files in situ before they are pulled to the same VPS that holds my versioned data backups.
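
Such a job is essentially of this shape (source path, target and key ID hypothetical):

duplicity --encrypt-key 6C27DE4C /var/vmail file:///var/backups/mail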

**And I have cron jobs initiate the pulling of backups from my web and email servers.

***Of course, I will need to destroy and replace the paper backups periodically, because the key data encoded on them is that of the password-encrypted keys. Naturally I change my passwords from time to time, and when I do I’ll need to destroy and replace my paper key backups.