[Info-vax] Transient anal/disk errors

Stephen Hoffman seaohveh at hoffmanlabs.invalid
Thu Dec 12 11:31:44 EST 2013


On 2013-12-12 15:51:16 +0000, tadamsmar said:

> On Thursday, December 12, 2013 10:07:28 AM UTC-5, Stephen Hoffman wrote:
>> There's also the infamous BACKUP /IGNORE=INTERLOCK command, which some
>> folks think is an online BACKUP.  It's not.  Worse, it allows silent
>> data corruptions in the output savesets.  If you have control over the
>> applications involved, that's where the BACKUP support needs to reside,
>> particularly if your applications are writing clumps of updates to
>> disk.  Various relational databases on VMS include application-internal
>> backup tools, and always use those in preference to using the OpenVMS
>> BACKUP command.  Alternatively, quiesce the applications or the disks
>> or the systems, and then use the standard BACKUP tools. Or quiesce the
>> environment and yank a disk from the shadowset, and backup that.
> 
> You think I was recently working on my backup strategy?  I was just 
> working on those persistent ANAL/DISK problems.

Yes, so was I.  In my admittedly skewed view of the world, 
investigations of persistent disk errors are always secondary to having 
good and verified backups.  Preserve the most current data first, then 
study the disks and the errors.

> But I probably do need to work on my backup strategy.  I have been 
> yanking out a disk without quiescing and backing up the yanked disk, 
> and I have not done any deliberate recovery testing, just de facto when 
> I had to recover a file or compress a disk.  Just yanking a disk is 
> easy, I just have to run command procedures, but as you point out, it 
> might not have optimal reliability.

You're hot-plugging active disks, and probably in an environment 
without a quiesce function on the storage controller?

Don't do that.

You've probably been causing some of the errors and corruptions here.

> What's the easiest way to quiesce and yank?

Depending on the bus and the target, via the DISMOUNT command.  Some 
storage controllers support a quiesce function, and others expect you 
to shut down.  I'm guessing your gear probably lacks one of those 
controllers; that feature usually only exists on outboard storage 
controllers.  It's not a feature usually found with host-based JBOD 
SCSI controllers, nor even necessarily on some of the host-based SCSI 
RAID controllers.

But that's not how I'd do the backups I'm referring to.  I'd DISMOUNT 
the disk from the shadowset, and MOUNT /NOWRITE the disk privately, and 
back up from there.  There are minimerge and minicopy bitmaps that were 
discussed here in massive detail when Phillip Helbig was trying to 
understand how all that worked, so I'm not going to bother reposting 
all of that here.  Those features will help bring the 
temporarily-removed disk back to current within the shadowset more 
quickly.   Search for threads with minicopy or minimerge or related 
keywords via Google Groups, and start reading.  Or check the current 
volume shadowing manual in the OpenVMS documentation set.  Or both.
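
A rough sketch of that sequence in DCL follows.  The device names 
(DSA1:, $1$DKA100:, MKA500:), the volume label DATA1 and the saveset 
name are placeholders, and the /POLICY=MINICOPY qualifiers assume a VMS 
version and patch level with minicopy support.  Check the shadowing 
manual for your version before trusting any of it.

```dcl
$! Sketch only.  All device names, the label and the saveset name are
$! hypothetical.  Quiesce the applications writing to DSA1: first.
$!
$! Remove one member from the shadowset; the remaining members stay
$! mounted.  /POLICY=MINICOPY asks for a write bitmap and assumes
$! minicopy is available.  Without it, drop the qualifier.
$ DISMOUNT /POLICY=MINICOPY $1$DKA100:
$!
$! Mount the removed member privately and read-only.
$ MOUNT /NOWRITE $1$DKA100: DATA1
$!
$! Image backup to tape, with a verification pass.
$ MOUNT /FOREIGN MKA500:
$ BACKUP /IMAGE /VERIFY $1$DKA100: MKA500:DATA1.BCK /SAVE_SET
$ DISMOUNT MKA500:
$ DISMOUNT $1$DKA100:
$!
$! Return the member to the shadowset.  With a valid bitmap this
$! should be a minicopy; otherwise expect a full copy.
$ MOUNT /SYSTEM DSA1: /SHADOW=($1$DKA100:) /POLICY=MINICOPY DATA1
```

Until the copy back into the shadowset completes, the shadowset is 
running with one fewer member.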

> The only way I am sure of is to shutdown, boot with a CD, yank, then 
> reboot normally.

That's the best way, the official way, and the only supported way, if 
you need to reconfigure a SCSI bus, and lack a storage controller with 
a quiesce function.

> I am not sure that there is a console command that will yank a disk 
> from a shadowset, but I seem to recall one that will disable shadowing.

Allow me to translate "I don't recall" as "which manual should I read 
to learn more about the fundamental operations of the server?".  That'd 
be the volume shadowing manual.    <http://www.hp.com/go/openvms/doc>, 
select the VMS documentation shelf, and search for "shadowing", and 
skim that manual.  You'll definitely need to be more familiar with it 
if (when?) you decide to implement minicopy or minimerge.  (Though your 
VMS version is ancient, and there were definitely various patches made 
available in this and related areas of OpenVMS.)

> I have noticed that sometimes a yanked disk will not run ANAL/DISK 
> clean. This also seems to be transient.

Yeah.  Sometimes yanking the disk just silently corrupts the file data 
on that disk, depending on the timing.  I wouldn't assume other disks 
on the SCSI bus would be entirely immune from problems or corruptions, 
either.  Not without quiescing the bus, or shutting down, or 
dismounting the disks on that bus.



-- 
Pure Personal Opinion | HoffmanLabs LLC



