[Info-vax] OpenVMS x64 Atom project

Stephen Hoffman seaohveh at hoffmanlabs.invalid
Mon Jun 7 09:59:46 EDT 2021


On 2021-06-06 06:02:43 +0000, Phillip Helbig (undress to reply) said:

> In article <s9h2t7$1qvs$1 at gioia.aioe.org>, Arne Vajhøj
> <arne at vajhoej.dk> writes:
> 
>>> Yes, there is little point in doing a backup if you don't test the 
>>> restore.  But imagine, say, a database of several hundred terabytes. 
>>> Even if you can restore it, you can't necessarily tell if the data are 
>>> somehow corrupt.  Yes, checksums and so on will catch some things, but 
>>> not all.

At the scale some of our apps are operating at now, silent Ethernet 
checksum failures are to be expected.

>> Traditional BACKUP only works well on a system with no activity. 
>> BACKUP/IGNORE=INTERLOCK does not solve the problem.
>> 
>> To get a consistent backup of a large database without significant 
>> downtime, one needs a snapshot capability where updates after time 
>> T do not change what is being backed up.
> 
> Presumably with a database one would do a database backup, e.g. 
> RMU/BACKUP, which gives a consistent result.

That's an older approach, as is the analogous RMS journaling, and it 
does get a consistent backup, at the cost of blocking activity.

Basically, the quiesce function got moved from the app to the database, 
and better tuned to app activity. But it's still present.
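
To make the quiesce idea concrete, here is a minimal Python sketch. It is 
not RMU or RMS code, and the QuiesceGate class and its method names are 
made up for illustration. Writers pass through a gate; the backup closes 
the gate, waits for in-flight writes to drain, copies the state, and then 
reopens the gate.

```python
import threading


class QuiesceGate:
    """Hypothetical gate: blocks new writers while a backup copies the state."""

    def __init__(self):
        self._cond = threading.Condition()
        self._backup_lock = threading.Lock()  # one backup at a time
        self._active_writers = 0
        self._quiesced = False
        self.data = {}

    def write(self, key, value):
        with self._cond:
            # New writes stall here for as long as a backup holds the gate.
            while self._quiesced:
                self._cond.wait()
            self._active_writers += 1
        try:
            self.data[key] = value
        finally:
            with self._cond:
                self._active_writers -= 1
                self._cond.notify_all()

    def backup(self):
        with self._backup_lock:
            with self._cond:
                self._quiesced = True
                # Wait for writes already in flight to finish.
                while self._active_writers:
                    self._cond.wait()
            try:
                # Writers are blocked, so this copy is consistent.
                return dict(self.data)
            finally:
                with self._cond:
                    self._quiesced = False
                    self._cond.notify_all()
```

The longer the copy inside backup() takes, the longer every writer stays 
blocked, which is the cost being described here.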

RMS journaling being a frequent winner of the most-forgotten LP award.

Newer app approaches tend not to use that design, for performance reasons.

Both BACKUP and RMU get into trouble with the amount of data involved, 
and how long that task takes, and how much then gets blocked or 
deferred.
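
A back-of-the-envelope calculation shows why the elapsed time matters. 
The 300 TB size stands in for the "several hundred terabytes" mentioned 
upthread, and the sustained throughput figures are assumptions picked 
for illustration, not measurements of BACKUP or RMU.

```python
def backup_hours(size_tb: float, throughput_mb_per_s: float) -> float:
    """Hours needed to stream size_tb terabytes at a sustained rate (decimal units)."""
    size_mb = size_tb * 1_000_000  # 1 TB = 1,000,000 MB
    return size_mb / throughput_mb_per_s / 3600


# Assumed sustained rates in MB/s; real rates depend on the storage and the tool.
for rate in (200, 1000, 5000):
    print(f"{rate:>5} MB/s: {backup_hours(300, rate):7.1f} hours")
```

At an assumed 1000 MB/s, 300 TB takes roughly 83 hours, and anything 
blocked or deferred during the backup waits that long.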

The BACKUP design has ~reached its theoretical I/O performance limits, 
and I'd expect the RMU design is close to those same limits.

For obvious reasons, SSD helps (massively) here. SSDs can mask a whole 
lot of latent OS and app algorithm-performance messes.

On OpenVMS, an app quiesce and app cache flush and host-based volume 
shadowset split is (vastly) faster than BACKUP or RMU /BACKUP.
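
The sequence can be sketched with a toy in-memory mirror. This is not 
HBVS and uses no OpenVMS commands; MirrorSet and quiesce_flush_split are 
hypothetical names, and the caller is assumed to have paused the app 
already.

```python
class MirrorSet:
    """Toy RAID-1 set: every write goes to all current members."""

    def __init__(self, member_count=2):
        self.members = [dict() for _ in range(member_count)]

    def write(self, block, value):
        for member in self.members:
            member[block] = value

    def split(self):
        """Detach one member; it stays frozen at the split point."""
        if len(self.members) < 2:
            raise RuntimeError("cannot split the last member")
        return self.members.pop()


def quiesce_flush_split(mirror, app_cache):
    """Flush the app cache to the volume, then detach one mirror member."""
    for block, value in app_cache.items():
        mirror.write(block, value)
    app_cache.clear()
    return mirror.split()  # fast: no data is copied at split time


mirror = MirrorSet()
mirror.write(0, "a")
cache = {1: "b"}
frozen = quiesce_flush_split(mirror, cache)
mirror.write(0, "c")  # the app resumes; the detached member is unaffected
```

The app only waits for the flush and the split. The detached member can 
then be copied to archive at leisure. Restoring redundancy afterwards 
(adding a member back and copying to it) is not modeled here.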

Host-based volume shadowing being the all-time winner for LPs 
overlooked while searching for distributed software RAID-1 features.

Which then leads to designs with live spare servers directly updated 
(RAIS, etc), and to controller-level analogs to HBVS / RAID-1 splits.

Journaling right into a secondary server, which can write a 
non-volatile backup for recovery and/or flush to SSD or HDD archives, 
or can be a live, running, and current failover server.
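
A rough sketch of that journaling arrangement, with the network and the 
storage replaced by plain Python objects. Primary, Secondary, and 
recover are hypothetical names for illustration, not any product's API.

```python
class Secondary:
    """Hypothetical standby: archives each journal record and replays it."""

    def __init__(self):
        self.archive = []  # stands in for non-volatile journal storage
        self.state = {}    # live copy, ready to take over on failover

    def receive(self, record):
        self.archive.append(record)
        key, value = record
        self.state[key] = value


class Primary:
    def __init__(self, secondary):
        self.state = {}
        self.secondary = secondary

    def put(self, key, value):
        # Ship the journal record first, then apply the update locally.
        self.secondary.receive((key, value))
        self.state[key] = value


def recover(archive):
    """Rebuild state by replaying an archived journal from the start."""
    state = {}
    for key, value in archive:
        state[key] = value
    return state


secondary = Secondary()
primary = Primary(secondary)
primary.put("a", 1)
primary.put("a", 2)
primary.put("b", 3)
```

After these three updates the secondary's live state matches the 
primary's, and replaying the archived journal rebuilds the same state 
without ever blocking the primary for a backup.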

And leads to in-memory designs (with archiving), as more than a few of 
our databases fit into server memory—q.v. SAP HANA, etc—and as writing 
to SSDs is, well, slow.



-- 
Pure Personal Opinion | HoffmanLabs LLC 



