[Info-vax] 8.4 freespace-drift problem?
Stephen Hoffman
seaohveh at hoffmanlabs.invalid
Fri May 22 21:51:59 EDT 2015
On 2015-05-22 21:47:40 +0000, David Froble said:
> Ok, I'm going to ask. Normally I would not do so, because I have
> problems believing anyone would ever do this.
>
> Do you, every time, do a proper shutdown of VMS?
Yes. When the server is going idle for a while, with the power-off option.
Why? The shutdown notifies other nodes in the cluster rather than
forcing those hosts to slog through the cluster transition based on the
timers, and the shutdown avoids system disk and secondary disk rebuild
at reboot and at next mount (see below), and the shutdown avoids having
to perform shadow merges or shadow copies for shadowed disks, and the
shutdown can perform the site-specific operations that cleanly shut
down the applications that might be running.
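A rough sketch of that last point, assuming the usual arrangement where
@SYS$SYSTEM:SHUTDOWN offers to invoke the site-specific procedure
SYS$MANAGER:SYSHUTDWN.COM. The application procedure name below is
made up for illustration; substitute whatever stops your application
cleanly.

```dcl
$! SYS$MANAGER:SYSHUTDWN.COM (site-specific shutdown procedure)
$! APP$MANAGER:APP_SHUTDOWN.COM is a hypothetical name, not a real product file.
$ IF F$SEARCH("APP$MANAGER:APP_SHUTDOWN.COM") .NES. "" THEN -
      @APP$MANAGER:APP_SHUTDOWN.COM
$ EXIT
```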
> Or do you get in a hurry and just hit the power or reset switch?
Generally no. Is the box on fire? Then yes.
> Also, mount /norebuild is a very dangerous option, and should only be
> used in emergencies. My opinion. YMMV
Skipping the rebuild works just fine and is entirely safe AFAIK. In
the typical case, it trades some disk free space for speed.
If you need a box to boot faster, then using MOUNT /NOREBUILD and
setting the ACP_REBLDSYSD system parameter to defer the rebuild wastes
some free space, but is otherwise harmless. If you need disks to mount
faster, then MOUNT /NOREBUILD. If you want to relocate where the
rebuild runs (moving it from a satellite with a slower I/O connection
and lower performance to one of the core servers in a cluster with a
much faster I/O connection, for instance), it's very common to defer
the rebuild everywhere, then SET VOLUME /REBUILD on a server that has
local (fast) access and isn't very busy.
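The deferred-rebuild sequence described above might look roughly like
this. The device name DKA100: and the label DATA1 are placeholders, and
/SYSTEM is only one of several possible mount qualifiers; treat this as
a sketch rather than a tested procedure.

```dcl
$! Hypothetical device and label; substitute your own.
$! On the slower host (for example in the startup procedure): mount without rebuilding.
$ MOUNT /SYSTEM /NOREBUILD DKA100: DATA1
$!
$! Later, on a lightly loaded server with fast access to the same volume:
$ SET VOLUME /REBUILD DKA100:
$!
$! To defer the system disk rebuild at boot, one option is to add this line
$! to SYS$SYSTEM:MODPARAMS.DAT and then rerun AUTOGEN:
$!     ACP_REBLDSYSD = 0
```

Until the SET VOLUME /REBUILD runs, the volume is usable but reports
less free space than it really has.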
If you do choose to perform the rebuild when you boot (system disk) and
when you MOUNT the other disks, then the next boot after a crash or a
hard halt is slower, and the I/O load of the rebuild can land on a
lower-performance host. That combination can leave a cluster booting in
(slower) lock-step behind some slow node that gets control first and
ends up rebuilding all of your disks, too.
As disks get bigger, and particularly as they fill with more data and
as more disks are configured, the rebuild operations take longer, too.
Since this rebuild generally just recovers the free space held in the
allocation caches and by the files that were marked for delete before
the hard halt or the crash, it's not usually a critical operation,
either. (This is because RMS tries to always use "careful write
ordering". Not all applications do that.)
Now if the critical last write I/O operations from your heavily
I/O-active application environment didn't all make it to disk because
you hard-halted the box prior to reboot, all bets are off.
BTW, power failures can play havoc with writes that are in flight out
on the storage shelves, too.
Related: <http://labs.hoffmanlabs.com/node/1078>
--
Pure Personal Opinion | HoffmanLabs LLC