The online home of Toby Wilkinson

## Improving on backing up with rsync

Recently my backups have been in rather a state of flux. After restoring a site that I’d had cron religiously back up daily with rsync, I realised that I was accumulating rather a lot of cruft in my backups: old versions of CMSs, temporary copies of uploaded images that were subsequently resized, and pages put up briefly for special offers, maintenance or whatever. All of it needed pruning when I restored the site. I moved over to taking backups with rsync --delete, which removes every file in the backup that is no longer on the live site. This was a good improvement, giving me a true snapshot of the site as it was when the backup was taken. However, it wasn’t long until there was an awkward conversation along the lines of “I deleted some images from our blog a few days ago, can you get them back?”

Why don’t you just check out from your own source control?

I was asked this a few times, and while it’s true that I’d love full control of what was on the sites in question, people these days expect more than static sites – they want to be able to update the content of their sites, run a blog and have external agencies work on the content of their sites. At the same time, they expect them to be backed up.

I wanted something that would provide backups with historical changes, so that I could restore sites to the way they were at a particular point in the past.

Research proved surprisingly fruitless. Many people documented how they back up with some of the more esoteric options of rsync and/or tar to get incremental backups, then use scripts to rebuild historical states of the files. Other people were using version control systems written for maintaining source code, such as SVN, along with lengthy scripts to undo much of the work done by these systems (which generally need explicit instructions about the files that have come and gone since the last commit).

By chance, I finally found a comparison of backup tools on an Arch Linux page, which listed several very hopeful-looking bits of software. I settled on rdiff-backup as it seems mature, is well documented and is available in the standard Debian repositories. Best of all, its familiar syntax meant that it was almost a drop-in replacement for rsync, with the added benefit of historical versions being saved. There are several examples on the project site which should get anyone interested up and running pretty quickly. In my backup scripts, though, it was a simple exercise to translate

rsync -avz --del user@webserver.tld:/srv/websites/project1/* /srv/site_backup/project1

to

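rdiff-backup user@webserver.tld::/srv/websites/project1 /srv/site_backup/project1

(Note the double colon in the latter. Note also that the source is the directory itself rather than a /* glob: rdiff-backup expects a single source and a single destination.)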

I can now copy from the backup directory in the normal manner, but I also have the option to delve back in history using something like this:

rdiff-backup -r 10D /srv/site_backup/project1/index.php ~/site_restore/index.php

to recover the index file to the way it was ten days ago.
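Before restoring, it helps to see which points in time the backup actually holds. The following is a minimal sketch rather than anything from my real scripts: the function names and the RDIFF variable are invented for illustration, while --list-increments and -r are options documented in the rdiff-backup man page. It assumes the backup directory from the example above.

```shell
#!/bin/sh
# RDIFF is a hypothetical override: set RDIFF=echo to preview the
# commands instead of running them.
RDIFF=${RDIFF:-rdiff-backup}

# Show the points in time held for a backup directory.
# Usage: list_increments <backup-dir>
list_increments() {
    $RDIFF --list-increments "$1"
}

# Restore one path as it was at a given time (e.g. 10D for ten days ago).
# The last argument is the path the restored copy should be written to.
# Usage: restore_as_of <time> <path-in-backup> <restore-target>
restore_as_of() {
    $RDIFF -r "$1" "$2" "$3"
}
```

With these in place, list_increments /srv/site_backup/project1 shows what is available, and restore_as_of 10D /srv/site_backup/project1/index.php ~/site_restore/index.php pulls back one file.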

All well and good: this was a solid step forward. I then noticed that there is a Windows version available. What a coup, being able to back up my servers and my desktop with the same tool! I downloaded and unpacked the Windows version and added the directory in which I’d put it to the path. (To modify the path on a Windows 7 computer, go to Start -> Computer -> System Properties -> Advanced System Settings -> Environment Variables, select “Path” and then click the “Edit…” button.) I fired up a command line, entered an rdiff-backup command and was met with this error:

Couldn't start up the remote connection by executing

ssh -C user@server.local rdiff-backup --server

Remember that, under the default settings, rdiff-backup must be installed in the PATH on the remote system. See the man page for more information on this. This message may also be displayed if the remote version of rdiff-backup is quite different from the local version (1.2.8).

Oh dear, all did not seem well. I knew I had ssh facilities on the computer, as I use PuTTY many times a day; from the command line, however, there was indeed no way to simply type ssh to open a remote connection. Some searching brought me to this blog, which fairly quickly got me up and running. I added the PuTTY directory to my Path and created a keypair so that I could log straight into the server I was using for backup. It was then a fairly simple matter to supply the correct arguments to rdiff-backup to ensure it used plink, the PuTTY command line utility. What I have running now is as follows:

rdiff-backup --remote-schema "plink.exe -i C:\path\to\key.ppk %s rdiff-backup --server" "C:/Users/Me/Documents" user@server.local::/srv/desktop-backup/my-desktop
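The --remote-schema string is the command rdiff-backup runs to reach the other machine, with %s standing in for the host part of the remote address (the bit before the double colon). The default schema is the ssh -C %s rdiff-backup --server line seen in the error message earlier. Purely as an illustration, the shell function below mimics that substitution so a schema can be previewed before use. expand_schema is a made-up helper name, not part of rdiff-backup, and it assumes the host contains no characters that are special to sed.

```shell
#!/bin/sh
# expand_schema is a hypothetical helper, not an rdiff-backup command.
# It prints the command line that results from replacing the first %s
# in a remote schema with the given host.
# Usage: expand_schema <schema> <host>
expand_schema() {
    printf '%s\n' "$1" | sed "s|%s|$2|"
}
```

For example, expand_schema 'ssh -C %s rdiff-backup --server' user@server.local prints the same ssh command that appeared in the error message.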

Be warned: subjectively, this seems noticeably slower than rsync the first time it is run, but subsequent backups seem to be reasonably speedy.

I hope this helps someone else out there to improve the depth of information held in their backups. Just remember, always check your logs to make sure your backups actually run ;-).
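On that last point, one way to make the logs worth checking is to have the backup script record whether each run succeeded. This is a minimal sketch under my own assumptions, not a feature of rdiff-backup: logged_backup and BACKUP_LOG are invented names, and the default log path is only a placeholder.

```shell
#!/bin/sh
# Hypothetical log location; export BACKUP_LOG to override it.
BACKUP_LOG=${BACKUP_LOG:-/var/log/site-backup.log}

# Run a backup command, append its output to the log, and add a
# timestamped line saying whether it succeeded or failed.
# Usage: logged_backup <label> <command> [args...]
logged_backup() {
    label=$1
    shift
    if "$@" >>"$BACKUP_LOG" 2>&1; then
        status=0
        echo "$(date '+%Y-%m-%d %H:%M:%S') $label OK" >>"$BACKUP_LOG"
    else
        status=$?
        echo "$(date '+%Y-%m-%d %H:%M:%S') $label FAILED (exit $status)" >>"$BACKUP_LOG"
    fi
    # Pass the command's exit status back to the caller (e.g. cron).
    return "$status"
}
```

A cron script could then call logged_backup project1 followed by the usual rdiff-backup command and its arguments, and a quick grep for FAILED in the log shows any runs that need attention.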