[Info-vax] BACKUP, rsync, Time Machine (was: Re: Re; Spiralog, RMS Journaling...)
Paul Sture
nospam at sture.ch
Mon Jun 20 11:38:42 EDT 2016
On 2016-06-20, lawrencedo99 at gmail.com <lawrencedo99 at gmail.com> wrote:
> On Monday, June 20, 2016 at 9:41:36 PM UTC+12, Paul Sture wrote:
>
>> Please explain.
>
> rsync basically answers the question “given a source directory «src»
> and a destination directory «dst», what is the minimum that needs to
> be done to the contents of «dst» to turn it into an exact copy of
> «src»?”
Let's back up to the bit you snipped:
Paul:
>> Time Machine uses a File System Event store to track changed directories.
Lawrence:
> rsync doesn’t need one.
You underestimate the size of the problem. When not much has changed, an hourly
Time Machine run will complete in a minute or two.
rsync cannot possibly scan all the directories on a system for changes in
such a short time, at least not on the spinning rust I have here.
Want timings?
find /Applications | wc -l # get total number of entries (files and directories) in /Applications
995716 entries, elapsed time 33 seconds
for the directories in there (I'm assuming 'find' is reasonably efficient
here):
find /Applications -type d | wc -l # get total number of directories
# in /Applications
145437 directories, elapsed time 5 minutes 37 seconds
That's just the OS X GUI applications. I've got many more files
spanning several disks. Think half a million directories in total,
not including backups. All to be scanned to detect changes, unless
you have a faster method of targeting modified directories.
The hint provided by the File System Event store cuts that down to a
manageable level. If you read the description at pondini.org, you'll
discover the precautionary measures taken in the event of a system
crash, when the Event store might be incomplete.
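A rough sketch of why the hint matters, assuming a plain text file of
changed directory paths (one per line) stands in for the Event store.
That hints file and the function names full_scan and hinted_scan are
made up for illustration; this is not how Time Machine actually reads
the Event store. The stamp file is simply a file whose modification
time marks the previous backup.

```shell
#!/bin/sh
# Full scan: walk every directory under root ($1) and list regular files
# modified more recently than the stamp file ($2).
full_scan() {
    find "$1" -type f -newer "$2"
}

# Hinted scan: read directory paths, one per line, from the hints file ($1)
# (a hypothetical stand-in for an event store) and look only inside those
# directories, without recursing, for files newer than the stamp file ($2).
hinted_scan() {
    while IFS= read -r dir; do
        if [ -d "$dir" ]; then
            find "$dir" -maxdepth 1 -type f -newer "$2"
        fi
    done < "$1"
}

# Example use (paths are placeholders):
#   full_scan /Applications /var/tmp/last-backup.stamp
#   hinted_scan /var/tmp/changed-dirs.txt /var/tmp/last-backup.stamp
```

The full scan has to visit every directory no matter how little has
changed, whereas the cost of the hinted scan grows only with the number
of directories named in the hints file.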
> Adding/removing files and subdirectories would seem to be fairly
> straightforward. The clever part of rsync is that it can also compare
> two versions of a file residing on different nodes on the network,
> *without having to copy the entire file across*, to figure out which
> parts have changed and which parts haven’t. That was the key part of
> Andrew Tridgell’s PhD thesis on rsync. It was an algorithm he could
> have patented, but he chose not to.
Ah yes. Do you realise you are sounding like a PR type acting on
Tridgell's behalf?
> rsync is a wonderful general file-copying tool. You can use it to do
> huge copies, that might take hours or days. If a link goes down and
> the operation aborts, you can simply re-execute the same rsync command
> after things come back up, and it will resume from where it left off.
Yes, I'm aware of its good points, thank you, but it is not a direct
replacement for Time Machine.
<snip>
> Because restoring from a backup will likely happen in a high-stress
> situation: the user or the company has lost some important files, and
> you have to get them back NOW. Screw up, and say goodbye to your
> customer, or your job, maybe even face legal consequences. The fewer
> extra mechanisms that are required to access the backups, the less
> chance there is for something to go wrong.
Yes, we know all that stuff thanks. :-)
--
There are two hard things in computer science, and they are cache invalidation,
naming, and off-by-one errors.