[Info-vax] SHADDETINCON, SHADOWING detects inconsistent state
Richard B. Gilbert
rgilbert88 at comcast.net
Thu Jan 1 09:40:02 EST 2009
Phillip Helbig---remove CLOTHES to reply wrote:
> My hobbyist cluster currently consists of:
>
> VAX 4000-105A
> VAXstation 4000-90A
> DEC 3000 - M600
>
> Each system has a 2- or 3-member shadow set as its system disk. There
> are some non-shadowed disks (including CD-ROMs) and some 2-member shadow
> sets distributed among the nodes (each member has a direct connection to
> only one machine). In particular, DISK$USER has members on each of the
> VAXes. I haven't changed much in 2 or 3 years.
>
> Starting several weeks ago, and becoming more frequent in the last
> couple of weeks, the VAX 4000-105A spontaneously reboots. Even though
> SHADOW_MBR_TMO is set to 10 minutes and MVTIMEOUT to one hour
> (SHADOW_SYS_TMO is 2 minutes but that isn't relevant here), after such a
> reboot everything looks OK on the VAX 4000-105A but on (usually just one
> of) the other machines, the system-disk shadow set and the CD-ROM on the
> VAX 4000-105A and the DISK$USER shadow set have gone into mount-verify
> timeout. This has always happened during the night, so I don't know how
> long the spontaneous reboot takes. I can just dismount and remount the
> system-disk shadow set and the CD-ROM on the VAX 4000-105A from the
> other nodes, but since DISK$USER has gone into mount-verify timeout, I
> have to reboot the corresponding node. (Note that SYSUAF etc. are all on
> DISK$USER.) I can't dismount it since it contains open files. I
> haven't tried DISMOUNT/ABORT in such a situation. Should I? With
> DISK$USER inaccessible, various applications will fail. A reboot is
> probably quicker than getting everything going again by hand. (If it is
> the VAXstation 4000-90A which needs to be rebooted, then I can dismount
> and remount the member of DISK$USER on it from the ALPHA, so that I get
> just a minicopy when the VAXstation 4000-90A comes back up.)
>
> Note that every time this has happened, DISK$USER was in the shadow-copy
> state, copying from the member on the VAXstation 4000-90A to the member
> on the VAX 4000-105A---even if DISK$USER as a shadow set isn't
> accessible to the VAXstation 4000-90A and its members show up only as
> remote shadow members.
>
> I doubt it is possible to avoid these problems without creating more as
> long as the spontaneous reboots are happening. However, I want to get
> rid of the spontaneous reboots. ANALYZE/CRASH says:
>
> OpenVMS (TM) VAX System dump analyzer
>
> Dump taken on 1-JAN-2009 06:04:26.14
> SHADDETINCON, SHADOWING detects inconsistent state
>
> HELP/MESSAGE says:
>
> SHADDETINCON, SHADOWING detects inconsistent state
>
> Facility: BUGCHECK, System Bugcheck
>
> Explanation: The volume shadowing software reached an irrecoverable or
> inconsistent state because a shadow set failed an internal
> consistency check.
>
> User Action: Note the conditions leading to the error and contact a Compaq
> support representative. If the system is configured to produce
> a memory dump, retain the dump file.
>
> I don't see how I can "Note the conditions leading to the error".
>
> Since the hardware setup hasn't changed in years, and since I'm not
> seeing any additional errors, my assumption is that the VAX 4000-105A
> is acting up. Fortunately, I have an identical spare (thanks Hans!), so
> I plan to swap the machines today. If the problem goes away, then
> presumably there was a fault with the machine, but who knows what it
> could be.
>
> Actually, I can't swap out everything since I put all the memory for the
> VAX 4000-105A I have (128 MB) in the one currently in the cluster, so I
> will remove it and put it in the spare. I don't think this is a problem
> with the memory.
>
> Any further suggestions?
>
Do you have a "power conditioner" or, better, a UPS? A "blink" by the
power company that you might not even notice could be causing the reboots.