Monday, 21 October 2019

Rsync Backup System [1]: Introduction

In February of last year I was looking for a new backup solution (I had used rsync up to that point but found some issues with it) and tried a few different things. I have been using rdiff-backup since then (it comes as part of Debian), but this has its issues too.

At the moment I can't recall what the issues were with rsync, and it's looking more attractive as a new option. I think the reason I couldn't make it work satisfactorily before was that I used it over a network (Samba) share, which it doesn't work well over. But Linux has SSH as a much better option for network syncing; rsync has built-in support for it, and it should work a lot better than a Samba share.

The reason I want to ditch rdiff-backup is that I want a system that will do incremental backups by copying whole files, and onto a different disk volume. rdiff-backup works on the basis that you are backing up both full and incremental backups to the same volume, and an incremental will contain only the actual changed parts of files. Unfortunately this creates an incremental that is dependent on the full backup: if you lose the full version of a file, the incremental is of no use to you, since you only have the fragments that changed, not the whole changed files.

The reason I chose rdiff-backup in the first place (over options like Borg, which I could not make work easily) was that it copies a file structure (directories and files) instead of storing the files and folders inside an archive file. If for some reason an archive became corrupted, it could be very difficult to recover the files inside it. Archives do have the advantage of being compressed, but another option is to set up your backup disk to compress at disk level. I tried doing this by formatting my backup disks as Btrfs, but then the support for this changed in Buster and I am unsure if there are still issues with it. I will have another look at this, especially if there is any option for compression support in ext4.

The idea at the moment is I would use rsync over SSH to back up each computer using a dedicated backup computer with a full backup about every three months and incremental backups about weekly in between. I have 2 TB disks for the full backups and 1 TB disks for the incrementals. Currently a WD Blue 2 TB disk suitable for a backup costs $105, while a 1 TB is $78. I'd need to get some more of the Raidon drive caddies for extra disks and the leather bags that they are stored in.
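As a rough sketch of what the two kinds of run could look like, the Python snippet below only builds the rsync command lines and prints them; it does not run anything. The host name, user and mount points are made up for illustration. It assumes rsync's `--compare-dest` option is used for the incrementals, which tells rsync to copy only files that differ from the given directory, so the incremental disk ends up holding whole changed files rather than file fragments. This is one possible approach, not a tested setup.

```python
import shlex


def build_rsync_command(source, dest, full_backup_dir=None, dry_run=False):
    """Return an rsync argument list for a full or an incremental run.

    source and dest are ordinary rsync paths; a "user@host:/path" source
    makes rsync pull the files over SSH. All paths used below are
    hypothetical examples.
    """
    # --archive keeps permissions, times, symlinks etc.; --rsh=ssh makes the
    # transport explicit (modern rsync already defaults to ssh).
    cmd = ["rsync", "--archive", "--rsh=ssh"]
    if full_backup_dir is not None:
        # Incremental run: skip files that are identical in the full backup,
        # so only whole changed or new files land in dest. Use an absolute
        # path here, since a relative one is taken relative to dest.
        cmd.append(f"--compare-dest={full_backup_dir}")
    if dry_run:
        # Show what would be transferred without copying anything.
        cmd.append("--dry-run")
    cmd += [source, dest]
    return cmd


# Hypothetical host and mount points, for illustration only.
full = build_rsync_command("user@desktop:/home/", "/mnt/full/desktop/")
incr = build_rsync_command(
    "user@desktop:/home/",
    "/mnt/incr/desktop/2019-10-28/",
    full_backup_dir="/mnt/full/desktop/",
)

print(shlex.join(full))
print(shlex.join(incr))
```

Note that with this approach the full-backup disk has to be mounted while an incremental runs, because rsync compares against it, and that files deleted on the source are not recorded in the incremental.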

Rsync, or anything else that copies files and folders as they are, is a better and more transparent option than a tool that uses its own structure and archives, which leaves you dependent on that tool continuing to be supported and developed. Rsync is very well optimised for syncing, and the idea is that it can do incrementals on top of existing full backups. rdiff-backup does the same sort of thing, except that it keeps the earlier version of a changed file recoverable, whereas rsync will overwrite it (when resyncing to the same backup destination).

There are a few possible schemes for backup. One is to put the backup computer in a remote location (not really possible in a house) and have it sync unattended for a month at a time, then swap the disk for another one and let it run for another month; it would be copying all three computers' data onto the same disk. Another option is to use two disks for the incrementals, exchanged weekly. This is the more likely scenario. Each disk would then probably hold 6 incrementals, taken 2 weeks apart (the full backups happen every 3 months).
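The arithmetic for the two-disk scheme can be checked with a small Python sketch. It assumes a 13-week gap between full backups, one incremental per week in the weeks between, and two disks labelled A and B that alternate; the start date and the labels are just placeholders.

```python
from datetime import date, timedelta


def rotation_schedule(full_backup_date, weeks_between_fulls=13, disks=("A", "B")):
    """Return (date, disk) pairs for the weekly incrementals between two fulls.

    Week 0 is the full backup itself, so the incrementals run in weeks
    1 .. weeks_between_fulls - 1, alternating between the given disks.
    """
    schedule = []
    for week in range(1, weeks_between_fulls):
        run_date = full_backup_date + timedelta(weeks=week)
        disk = disks[(week - 1) % len(disks)]
        schedule.append((run_date, disk))
    return schedule


# Placeholder start date: the date of this post.
schedule = rotation_schedule(date(2019, 10, 21))
for run_date, disk in schedule:
    print(run_date.isoformat(), "disk", disk)
```

Under these assumptions there are 12 incrementals between full backups, so each disk ends up with 6 of them, two weeks apart.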

Anyway, there is a bit of detail to work out, and probably some more disks, trays etc. to purchase to get this scenario all set up. Hopefully it will come together soon.