[Info-vax] 8.4 freespace-drift problem?

Stephen Hoffman seaohveh at hoffmanlabs.invalid
Fri May 22 21:51:59 EDT 2015


On 2015-05-22 21:47:40 +0000, David Froble said:

> Ok, I'm going to ask.  Normally I would not do so, because I have 
> problems believing anyone would ever do this.
> 
> Do you, every time, do a proper shutdown of VMS?

Yes.  And when the server is going idle for a while, I shut it down
with the power-off option.

Why?  The shutdown notifies the other nodes in the cluster, rather than
forcing those hosts to slog through a timer-based cluster transition.
It avoids the system disk and secondary disk rebuilds at reboot and at
next mount (see below).  It avoids having to perform shadow merges or
shadow copies for shadowed disks.  And it can perform the site-specific
operations that cleanly shut down the applications that might be
running.

> Or do you get in a hurry and just hit the power or reset switch?

Generally no.   Is the box on fire?  Then yes.

> Also, mount /norebuild is a very dangerous option, and should only be 
> used in emergencies.  My opinion.  YMMV

Skipping the rebuild works just fine and is entirely safe AFAIK.  In
the typical case, it trades some disk free space for speed.

If you need a box to boot faster, then using MOUNT /NOREBUILD and
setting the ACP_REBLDSYSD system parameter to defer the rebuild wastes
some free space until a rebuild is eventually run, but is otherwise
harmless.  If you need disks to mount faster, then MOUNT /NOREBUILD.
If you want to relocate where the rebuild runs (moving it from a
satellite with a slower I/O connection and lower performance to one of
the core servers in a cluster with a much faster I/O connection, for
instance), it's very common to defer the rebuild everywhere, then
SET VOLUME /REBUILD on a server that has local (fast) access and that's
not very busy.
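
As a rough sketch of that deferral (the device name DKA100: and the
volume label DATA1 are made up for illustration, and the qualifiers you
need at your site may well differ):

```
$ ! Assumed entry in SYS$SYSTEM:MODPARAMS.DAT, applied via AUTOGEN,
$ ! to skip the system disk rebuild at boot:
$ !     ACP_REBLDSYSD = 0
$ !
$ ! Mount a data disk without rebuilding it (hypothetical device/label)
$ MOUNT /SYSTEM /NOREBUILD DKA100: DATA1
$ !
$ ! Later, from a lightly loaded node with fast access to the disk
$ SET VOLUME /REBUILD DKA100:
```

Until the SET VOLUME /REBUILD runs, the volume simply reports less free
space than it really has.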

If you do choose to perform the rebuild when you boot (system disk) and
when you MOUNT the other disks, then the next boot after a crash or a
hard halt is slower, and the I/O load of the rebuild can land on a
lower-performance host.  That combination can end up getting a whole
cluster booting in (slower) lock-step behind some slow node that gets
control first and sets about rebuilding all of your disks.

As disks get bigger, particularly as they are filled with more data,
and as more disks are configured, the rebuild operations take longer,
too.  Since the rebuild generally just frees up the space held in the
allocation caches and by files that were marked for delete before the
hard halt or the crash, it's not usually a critical operation, either.
(This is because RMS always tries to use "careful write ordering".  Not
all applications do that.)

Now if the critical last write I/O operations generated within your
heavily I/O-active application environment didn't all make it to disk
because you hard-halted the box prior to reboot, all bets are off.
BTW, power failures can play havoc with writes that are in flight out
on the storage shelves, too.

Related: <http://labs.hoffmanlabs.com/node/1078>

-- 
Pure Personal Opinion | HoffmanLabs LLC



