[Info-vax] BACKUP, rsync, Time Machine (was: Re: Re; Spiralog, RMS Journaling...)
lawrencedo99 at gmail.com
Mon Jun 20 06:13:41 EDT 2016
On Monday, June 20, 2016 at 9:41:36 PM UTC+12, Paul Sture wrote:
> Please explain.
rsync basically answers the question “given a source directory «src» and a destination directory «dst», what is the minimum that needs to be done to the contents of «dst» to turn it into an exact copy of «src»?”
Adding and removing files and subdirectories is fairly straightforward. The clever part of rsync is that it can also compare two versions of a file residing on different nodes on the network, *without having to copy the entire file across*, to figure out which parts have changed and which parts haven’t. That algorithm was a key part of Andrew Tridgell’s PhD thesis. He could have patented it, but he chose not to.
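To make the idea concrete, here is a much-simplified Python sketch of that kind of block matching. It is not rsync's actual code or wire format. The side holding the old file checksums each fixed-size block; the side holding the new file slides a window over its data and emits either "reuse block N" or literal bytes. The names `weak_sum`, `roll`, `block_signatures`, `make_delta` and `apply_delta`, the tiny block size, and the use of SHA-256 as the strong checksum are all assumptions made for illustration.

```python
import hashlib

BLOCK = 4        # tiny block size so the example is easy to follow
M = 1 << 16      # modulus for the weak checksum

def weak_sum(data):
    """Cheap checksum of a block, returned as the pair (a, b)."""
    a = sum(data) % M
    b = sum((len(data) - i) * byte for i, byte in enumerate(data)) % M
    return a, b

def roll(a, b, out_byte, in_byte, size):
    """Slide the window one byte to the right without rescanning it."""
    a = (a - out_byte + in_byte) % M
    b = (b - size * out_byte + a) % M
    return a, b

def strong_sum(data):
    """Expensive checksum, only computed when the weak one matches."""
    return hashlib.sha256(data).digest()

def block_signatures(old):
    """Holder of the old file: checksum every full block."""
    table = {}
    for offset in range(0, len(old) - BLOCK + 1, BLOCK):
        block = old[offset:offset + BLOCK]
        table.setdefault(weak_sum(block), {})[strong_sum(block)] = offset // BLOCK
    return table

def make_delta(new, table):
    """Holder of the new file: block indexes (int) plus literal runs (bytes)."""
    delta, literal, pos = [], bytearray(), 0
    weak = None  # weak checksum of new[pos:pos + BLOCK]; None means recompute
    while pos + BLOCK <= len(new):
        window = new[pos:pos + BLOCK]
        if weak is None:
            weak = weak_sum(window)
        candidates = table.get(weak)
        index = candidates.get(strong_sum(window)) if candidates else None
        if index is not None:
            if literal:                      # flush pending literal bytes
                delta.append(bytes(literal))
                literal.clear()
            delta.append(index)              # "reuse block `index` of the old file"
            pos += BLOCK
            weak = None
        else:
            literal.append(new[pos])         # no match: send this byte as-is
            if pos + BLOCK < len(new):
                weak = roll(*weak, new[pos], new[pos + BLOCK], BLOCK)
            pos += 1
    literal.extend(new[pos:])                # tail shorter than one block
    if literal:
        delta.append(bytes(literal))
    return delta

def apply_delta(old, delta):
    """Holder of the old file: rebuild the new file from old blocks and literals."""
    out = bytearray()
    for item in delta:
        out += old[item * BLOCK:(item + 1) * BLOCK] if isinstance(item, int) else item
    return bytes(out)

old = b"the quick brown fox jumps over the lazy dog"
new = b"the quick red fox jumps over the lazy dog!"
delta = make_delta(new, block_signatures(old))
assert apply_delta(old, delta) == new
```

Only the checksums and the delta would need to cross the network; the bulk of the unchanged data never does. The weak checksum is what makes it affordable to test a window at every byte offset, since it can be updated as the window slides instead of being recomputed.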
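rsync is a wonderful general file-copying tool. You can use it for huge copies that might take hours or days. If a link goes down and the operation aborts, you can simply re-run the same rsync command once things come back up, and it will skip whatever has already been transferred and carry on from there. (By default that restart works per file; add --partial if you want an interrupted file kept so that it can be continued rather than started over.)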
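Part of why re-running the same command is cheap is rsync's default "quick check": a file whose size and modification time already match at the destination is not transferred again. A rough Python sketch of that behaviour follows. The names `sync_tree` and `looks_unchanged` are hypothetical, and the sketch ignores deletions, symlinks, ownership and partially written files.

```python
import os
import shutil

def looks_unchanged(a, b):
    """Quick check: same size and same modification time (to the second)."""
    sa, sb = os.stat(a), os.stat(b)
    return (sa.st_size == sb.st_size
            and sa.st_mtime_ns // 10**9 == sb.st_mtime_ns // 10**9)

def sync_tree(src, dst):
    """Copy regular files from src to dst, skipping those already up to date.

    Returns the relative paths that were actually copied, so a second run
    after an interruption only reports (and copies) what was still missing.
    """
    copied = []
    for root, _dirs, files in os.walk(src):
        rel_root = os.path.relpath(root, src)
        target_dir = os.path.normpath(os.path.join(dst, rel_root))
        os.makedirs(target_dir, exist_ok=True)
        for name in sorted(files):
            source = os.path.join(root, name)
            target = os.path.join(target_dir, name)
            if os.path.isfile(target) and looks_unchanged(source, target):
                continue                      # already transferred earlier
            shutil.copy2(source, target)      # copy2 also preserves the mtime
            copied.append(os.path.normpath(os.path.join(rel_root, name)))
    return copied
```

Calling `sync_tree(src, dst)` twice in a row should copy everything the first time and nothing the second, which is the property that makes "just run it again" a workable recovery strategy.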
It can also create incremental backups that can be browsed and retrieved from as though they were full backups. The procedure goes something like this:
Initial backup:
rsync --archive «src» «backup1»
Next backup:
rsync --archive --link-dest=«backup1» «src» «backup2»
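Basically, any file that hasn’t changed in «src» since the creation of «backup1» is hard-linked from that previous backup into its place in «backup2», instead of being copied again. The net result is that «backup2» behaves like a full backup, without consuming the extra storage space (or network traffic). One thing to watch: a relative --link-dest path is interpreted relative to the destination directory, so giving it as an absolute path is the safer choice.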
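The hard-linking idea itself is simple enough to sketch in a few lines of Python. This is an illustration of the principle, not of how rsync implements --link-dest. The names `snapshot` and `looks_unchanged` are hypothetical, "unchanged" is approximated by matching size and modification time, and symlinks, ownership and deletions are ignored.

```python
import os
import shutil

def looks_unchanged(a, b):
    """Treat two files as identical if size and mtime (to the second) match."""
    sa, sb = os.stat(a), os.stat(b)
    return (sa.st_size == sb.st_size
            and sa.st_mtime_ns // 10**9 == sb.st_mtime_ns // 10**9)

def snapshot(src, prev_backup, new_backup):
    """Fill the (fresh) directory new_backup with a full-looking copy of src.

    Files that look unchanged relative to prev_backup are hard-linked from
    there; everything else is copied from src.
    """
    for root, _dirs, files in os.walk(src):
        rel_root = os.path.relpath(root, src)
        os.makedirs(os.path.normpath(os.path.join(new_backup, rel_root)),
                    exist_ok=True)
        for name in files:
            rel = os.path.normpath(os.path.join(rel_root, name))
            source = os.path.join(src, rel)
            previous = os.path.join(prev_backup, rel)
            target = os.path.join(new_backup, rel)
            if os.path.isfile(previous) and looks_unchanged(source, previous):
                os.link(previous, target)     # same inode, no extra data blocks
            else:
                shutil.copy2(source, target)  # new or changed: store a real copy
```

Every snapshot directory produced this way can be browsed on its own, and removing an old snapshot only frees the data that no other snapshot still links to. The flip side is that linked files share their contents, so a backup tree like this should be treated as read-only: editing a file in place inside one snapshot changes it in every snapshot that links to it.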
And the nice thing is, there is nothing special about the backup format: it is just a regular filesystem volume obeying standard POSIX semantics--no funny tricks like Apple pulls with Time Machine. No special metadata required, no special file-retrieval software required--just the same standard file-manipulation commands you already use every day.
That matters because restoring from a backup will likely happen in a high-stress situation: the user or the company has lost some important files, and you have to get them back NOW. Screw up, and you can say goodbye to your customer or your job, and maybe even face legal consequences. The fewer extra mechanisms that are required to access the backups, the less chance there is for something to go wrong.