[Info-vax] SHADDETINCON, SHADOWING detects inconsistent state
Phillip Helbig---remove CLOTHES to reply
helbig at astro.multiCLOTHESvax.de
Thu Jan 1 06:56:50 EST 2009
My hobbyist cluster currently consists of:
   VAX 4000-105A
   VAXstation 4000-90A
   DEC 3000 - M600
Each system has a 2- or 3-member shadow set as its system disk. There
are some non-shadowed disks (including CD-ROMs) and some 2-member shadow
sets distributed among the nodes (each member has a direct connection to
only one machine). In particular, DISK$USER has members on each of the
VAXes. I haven't changed much in 2 or 3 years.
Starting several weeks ago, and more frequently in the last couple of
weeks, the VAX 4000-105A has been rebooting spontaneously. Even though
SHADOW_MBR_TMO is set to 10 minutes and MVTIMEOUT to one hour
(SHADOW_SYS_TMO is 2 minutes but that isn't relevant here), after such a
reboot everything looks OK on the VAX 4000-105A but on (usually just one
of) the other machines, the system-disk shadow set and the CD-ROM on the
VAX 4000-105A and the DISK$USER shadow set have gone into mount-verify
timeout. This has always happened during the night, so I don't know how
long the spontaneous reboot takes. I can just dismount and remount the
system-disk shadow set and the CD-ROM on the VAX 4000-105A from the
other nodes, but since DISK$USER has gone into mount-verify timeout, I
have to reboot the corresponding node. (Note that SYSUAF etc. are all on
DISK$USER.) I can't dismount it since it contains open files. I
haven't tried DISMOUNT/ABORT in such a situation. Should I? With
DISK$USER inaccessible, various applications will fail. A reboot is
probably quicker than getting everything going again by hand. (If it is
the VAXstation 4000-90A which needs to be rebooted, then I can dismount
and remount the member of DISK$USER on it from the Alpha, so that I get
just a minicopy when the VAXstation 4000-90A comes back up.)
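
For reference, the timeout settings mentioned above can be read back on
each node with SYSGEN. This is only a sketch of the check, and no
captured output is shown. As far as I know, all three parameters are
expressed in seconds, so with the settings described they would be
expected to read 600, 3600 and 120.

```dcl
$! Display the active values of the shadowing and mount-verification
$! timeouts. SHOW only reads the parameters; nothing is changed here.
$ RUN SYS$SYSTEM:SYSGEN
SYSGEN> USE ACTIVE
SYSGEN> SHOW SHADOW_MBR_TMO
SYSGEN> SHOW MVTIMEOUT
SYSGEN> SHOW SHADOW_SYS_TMO
SYSGEN> EXIT
```

Running this on every node would show whether one of them has a
different value from the others.
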
Note that every time this has happened, DISK$USER was in the shadow-copy
state, copying from the member on the VAXstation 4000-90A to the member
on the VAX 4000-105A---even if DISK$USER as a shadow set isn't
accessible to the VAXstation 4000-90A and its members show up only as
remote shadow members.
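
The copy state described above is the kind of thing SHOW DEVICE reports.
A minimal sketch, assuming the DISK$USER logical name resolves to the
shadow-set virtual unit on the node where the command is issued. The
exact wording of the display depends on the VMS version, so no output
is reproduced here.

```dcl
$! Full display for the shadow set, including its members and any
$! copy in progress. DISK$USER is the volume logical name used above.
$ SHOW DEVICE/FULL DISK$USER:
$! Brief listing of every device whose name starts with DSA, i.e. the
$! shadow-set virtual units known to this node.
$ SHOW DEVICE DSA
```
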
I doubt it is possible to avoid these problems without creating more as
long as the spontaneous reboots are happening. However, I want to get
rid of the spontaneous reboots. ANALYZE/CRASH says:
   OpenVMS (TM) VAX System dump analyzer
   Dump taken on 1-JAN-2009 06:04:26.14
   SHADDETINCON, SHADOWING detects inconsistent state
HELP/MESSAGE says:
   SHADDETINCON, SHADOWING detects inconsistent state
   Facility:     BUGCHECK, System Bugcheck
   Explanation:  The volume shadowing software reached an irrecoverable or
                 inconsistent state because a shadow set failed an internal
                 consistency check.
   User Action:  Note the conditions leading to the error and contact a Compaq
                 support representative. If the system is configured to produce
                 a memory dump, retain the dump file.
I don't see how I can "Note the conditions leading to the error".
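
One way to at least record the state at the time of the crash is to pull
the basics out of the dump with SDA and keep a copy of the dump. A
sketch, assuming the dump was written to the default
SYS$SYSTEM:SYSDUMP.DMP; the two SHADDETINCON file names are made up for
illustration.

```dcl
$! Assumption: the dump is in the default system dump file.
$! SHADDETINCON.DMP and SHADDETINCON.LIS are hypothetical file names.
$ ANALYZE/CRASH_DUMP SYS$SYSTEM:SYSDUMP.DMP
SDA> COPY SHADDETINCON.DMP
SDA> SET OUTPUT SHADDETINCON.LIS
SDA> SHOW CRASH
SDA> SHOW STACK
SDA> SHOW SUMMARY
SDA> EXIT
```

COPY saves the dump under another name so that the next crash does not
overwrite it, and the listing file holds the bugcheck details, the stack
and the process summary, which is probably the first thing anyone
looking at the problem would ask for.
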
Since the hardware setup hasn't changed in years, and since I'm not
seeing any additional errors, my assumption is that the VAX 4000-105A
is acting up. Fortunately, I have an identical spare (thanks Hans!), so
I plan to swap the machines today. If the problem goes away, then
presumably there was a fault with the machine, but who knows what it
could be.
Actually, I can't swap everything: all the memory I have for the VAX
4000-105A (128 MB) is in the machine currently in the cluster, so I
will move it to the spare. I don't think the memory is the problem.
Any further suggestions?