[Info-vax] Transient anal/disk errors
Stephen Hoffman
seaohveh at hoffmanlabs.invalid
Thu Dec 12 11:31:44 EST 2013
On 2013-12-12 15:51:16 +0000, tadamsmar said:
> On Thursday, December 12, 2013 10:07:28 AM UTC-5, Stephen Hoffman wrote:
>> There's also the infamous BACKUP /IGNORE=INTERLOCK command, which some
>> folks think is an online BACKUP. It's not. Worse, it allows silent
>> data corruptions in the output savesets. If you have control over the
>> applications involved, that's where the BACKUP support needs to reside,
>> particularly if your applications are writing clumps of updates to
>> disk. Various relational databases on VMS include application-internal
>> backup tools, and always use those in preference to using the OpenVMS
>> BACKUP command. Alternatively, quiesce the applications or the disks
>> or the systems, and then use the standard BACKUP tools. Or quiesce the
>> environment and yank a disk from the shadowset, and backup that.
>
> You think I was recently working on my backup strategy? I was just
> working on those persistent ANAL/DISK problems.
Yes, so was I. In my admittedly skewed view of the world,
investigations of persistent disk errors are always secondary to having
good and verified backups. Preserve the most current data first, then
study the disks and the errors.
> But I probably do need to work on my backup strategy. I have been
> yanking out a disk without quiescing and backing up the yanked disk,
> and I have not done any deliberate recovery testing, just defacto when
> I had to recover a file or compress a disk. Just yanking a disk is
> easy, I just have to run command procedures, but as you point out, it
> might not have optimal reliability.
You're hot-plugging active disks, and probably in an environment
without a quiesce function on the storage controller?
Don't do that.
You've probably been causing some of the errors and corruptions here.
> What's the easiest way to quiesce and yank?
Depending on the bus and the target, via the DISMOUNT command. Some
storage controllers support a quiesce function, and others expect you
to shut down. I'm guessing your gear probably lacks one of those
controllers; that feature usually only exists on outboard storage
controllers. It's not a feature usually found with host-based JBOD
SCSI controllers, nor even necessarily on some of the host-based SCSI
RAID controllers.
But that's not how I'd do the backups I'm referring to. I'd DISMOUNT
the disk from the shadowset, and MOUNT /NOWRITE the disk privately, and
back up from there. There are minimerge and minicopy bitmaps that were
discussed here in massive detail when Phillip Helbig was trying to
understand how all that worked, so I'm not going to bother reposting
all of that here. Those features will help bring the
temporarily-removed disk back to current within the shadowset more
quickly. Search for threads with minicopy or minimerge or related
keywords via Google Groups, and start reading. Or check the current
volume shadowing manual in the OpenVMS documentation set. Or both.
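For what it's worth, here is a rough DCL sketch of that sequence. The
device names (DSA1: for the shadow set, $1$DKA100: for the member), the
volume label DATA1, and the tape drive MKA500: are all made up for
illustration. The /POLICY=MINICOPY qualifiers assume a VMS version and
patch level that supports minicopy, which an older release may not.
Treat this as a starting point to check against the shadowing manual,
not as a tested procedure.

```dcl
$! Hypothetical device names and label; substitute your own.
$! Quiesce the applications first if the files must be consistent.
$!
$! Remove one member from the shadow set, asking for a write bitmap
$! so the member can be brought back with a minicopy later.
$ DISMOUNT /POLICY=MINICOPY $1$DKA100:
$!
$! Mount the former member privately and read-only.
$ MOUNT /NOWRITE $1$DKA100: DATA1
$!
$! Back up the now-static disk to tape, verifying the saveset.
$ MOUNT /FOREIGN MKA500:
$ BACKUP /IMAGE /VERIFY $1$DKA100: MKA500:DATA1.BCK /SAVE_SET
$ DISMOUNT MKA500:
$!
$! Release the private mount and return the disk to the shadow set.
$ DISMOUNT $1$DKA100:
$ MOUNT /SYSTEM DSA1: /SHADOW=($1$DKA100:) DATA1 /POLICY=MINICOPY
```

Without minicopy support, drop the /POLICY qualifiers and expect a full
shadow copy when the member returns.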
> The only way I am sure of is to shutdown, boot with a CD, yank, then
> reboot normally.
That's the best way, the official way, and the only supported way, if
you need to reconfigure a SCSI bus and lack a storage controller with a
quiesce function.
> I am not sure that there is a console command that will yank a disk
> from a shadowset, but I seem to recall one that will disable shadowing.
Allow me to translate "I don't recall" as "which manual should I read
to learn more about the fundamental operations of the server?". That'd
be the volume shadowing manual. <http://www.hp.com/go/openvms/doc>,
select the VMS documentation shelf, and search for "shadowing", and
skim that manual. You'll definitely need to be more familiar with it
if (when?) you decide to implement minicopy or minimerge. (Though your
VMS version is ancient, and there were definitely various patches made
available in this and related areas of OpenVMS.)
> I have noticed that sometimes a yanked disk will not run ANAL/DISK
> clean. This also seems to be transient.
Yeah. Sometimes yanking the disk just silently corrupts the file data
on that disk, depending on the timing. I wouldn't assume other disks
on the SCSI bus would be entirely immune from problems or corruptions,
either. Not without quiescing the bus, or shutting down, or
dismounting the disks on that bus.
--
Pure Personal Opinion | HoffmanLabs LLC