[Info-vax] BACKUP, rsync, Time Machine (was: Re: Re; Spiralog, RMS Journaling...)

Paul Sture nospam at sture.ch
Mon Jun 20 11:38:42 EDT 2016


On 2016-06-20, lawrencedo99 at gmail.com <lawrencedo99 at gmail.com> wrote:
> On Monday, June 20, 2016 at 9:41:36 PM UTC+12, Paul Sture wrote:
>
>> Please explain.
>
> rsync basically answers the question “given a source directory «src»
> and a destination directory «dst», what is the minimum that needs to
> be done to the contents of «dst» to turn it into an exact copy of
> «src»?”

Let's back up to the bit you snipped:

Paul:
>> Time Machine uses a File System Event store to track changed directories.

Lawrence: 

> rsync doesn’t need one.

You underestimate the size of the problem.  When not much has changed, an hourly
Time Machine run will complete in a minute or two.

rsync cannot possibly scan all the directories on a system for changes in
such a short time, at least not on the spinning rust I have here.

Want timings?

find /Applications | wc -l # get total number of files in /Applications

995716 files, elapsed time 33 seconds

And for just the directories in there (I'm assuming 'find' is reasonably
efficient here):

find /Applications -type d | wc -l # get total number of directories
                                   # in /Applications

145437 directories, elapsed time 5 minutes 37 seconds

That's just the OS X GUI applications.  I've got many more files
spanning several disks.  Think half a million directories in total,
not including backups.  All of them would have to be scanned to detect
changes, unless you have a faster method of targeting modified
directories.
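
If you want to repeat the measurement on your own disks, a minimal sketch
along these lines should do.  The function names are mine, and the root
directory is whatever you pass in (I used /Applications above):

```shell
# count_entries ROOT: print the number of entries of every type under ROOT.
# ROOT is a placeholder argument; the timings above used /Applications.
count_entries() {
    # 2>/dev/null hides "permission denied" noise; tr strips the
    # padding that some wc implementations put around the number.
    find "$1" 2>/dev/null | wc -l | tr -d ' '
}

# count_dirs ROOT: print the number of directories under ROOT.
count_dirs() {
    find "$1" -type d 2>/dev/null | wc -l | tr -d ' '
}
```

In bash, `time count_dirs /Applications` gives the elapsed time.  Expect a
second run over the same tree to be quicker, since the file system cache
will be warm by then.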

The hint provided by the File System Event store cuts that down to a
manageable level.  If you read the description at pondini.org, you'll
discover the precautionary measures taken in the event of a system
crash, when the Event store might be incomplete.
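
To illustrate why the hint matters, here is a rough sketch of the idea,
not of how Time Machine actually does it.  It assumes a hypothetical
text file listing the directories reported as changed (one per line)
and a hypothetical stamp file whose modification time marks the previous
backup run; neither is Time Machine's real format.

```shell
# changed_since_hinted STAMP HINTS
#   STAMP: file whose mtime marks the previous backup run (hypothetical)
#   HINTS: text file, one changed directory per line (hypothetical
#          stand-in for an event store)
# Prints files directly inside the hinted directories newer than STAMP.
changed_since_hinted() {
    stamp=$1
    hints=$2
    while IFS= read -r dir; do
        [ -d "$dir" ] || continue    # the directory may be gone by now
        find "$dir" -maxdepth 1 -type f -newer "$stamp"
    done < "$hints"
}

# changed_since_full STAMP ROOT
# The same check with no hints: every directory under ROOT is walked.
changed_since_full() {
    find "$2" -type f -newer "$1"
}
```

The hinted version only touches the directories named in the list, so its
cost follows the amount of change rather than the size of the disk.  The
full version has to walk everything, which is the half a million
directories problem described above.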

> Adding/removing files and subdirectories would seem to be fairly
> straightforward. The clever part of rsync is that it can also compare
> two versions of a file residing on different nodes on the network,
> *without having to copy the entire file across*, to figure out which
> parts have changed and which parts haven’t. That was the key part of
> Andrew Tridgell’s PhD thesis on rsync. It was an algorithm he could
> have patented, but he chose not to.

Ah yes.  Do you realise you are sounding like a PR type acting on
Tridgell's behalf?

> rsync is a wonderful general file-copying tool. You can use it to do
> huge copies, that might take hours or days. If a link goes down and
> the operation aborts, you can simply re-execute the same rsync command
> after things come back up, and it will resume from where it left off.

Yes, I'm aware of its good points, thank you, but it is not a direct
replacement for Time Machine.

<snip>

> Because restoring from a backup will likely happen in a high-stress
> situation: the user or the company has lost some important files, and
> you have to get them back NOW. Screw up, and say goodbye to your
> customer, or your job, maybe even face legal consequences. The fewer
> extra mechanisms that are required to access the backups, the less
> chance there is for something to go wrong.

Yes, we know all that stuff thanks. :-)

-- 
There are two hard things in computer science, and they are cache invalidation,
naming, and off-by-one errors.


