[Info-vax] SHADDETINCON, SHADOWING detects inconsistent state

Richard B. Gilbert rgilbert88 at comcast.net
Thu Jan 1 09:40:02 EST 2009


Phillip Helbig---remove CLOTHES to reply wrote:
> My hobbyist cluster currently consists of:
> 
>    VAX 4000-105A
>    VAXstation 4000-90A
>    DEC 3000 - M600
> 
> Each system has a 2- or 3-member shadow set as its system disk.  There
> are some non-shadowed disks (including CD-ROMs) and some 2-member shadow
> sets distributed among the nodes (each member has a direct connection to
> only one machine).  In particular, DISK$USER has members on each of the
> VAXes.  I haven't changed much in 2 or 3 years. 
> 
> Starting several weeks ago, and becoming more frequent in the last
> couple of weeks, the VAX 4000-105A spontaneously reboots.  Even though
> SHADOW_MBR_TMO is set to 10 minutes and MVTIMEOUT to one hour
> (SHADOW_SYS_TMO is 2 minutes but that isn't relevant here), after such a
> reboot everything looks OK on the VAX 4000-105A but on (usually just one
> of) the other machines, the system-disk shadow set and the CD-ROM on the
> VAX 4000-105A and the DISK$USER shadow set have gone into mount-verify
> timeout.  This has always happened during the night, so I don't know how
> long the spontaneous reboot takes.  I can just dismount and remount the
> system-disk shadow set and the CD-ROM on the VAX 4000-105A from the
> other nodes, but since DISK$USER has gone into mount-verify timeout, I
> have to reboot the corresponding node.  (Note that SYSUAF etc. are all on
> DISK$USER.)  I can't dismount it since it contains open files.  I
> haven't tried DISMOUNT/ABORT in such a situation.  Should I?  With
> DISK$USER inaccessible, various applications will fail.  A reboot is 
> probably quicker than getting everything going again by hand.  (If it is 
> the VAXstation 4000-90A which needs to be rebooted, then I can dismount 
> and remount the member of DISK$USER on it from the ALPHA, so that I get 
> just a minicopy when the VAXstation 4000-90A comes back up.)
> 
> Note that every time this has happened, DISK$USER was in the shadow-copy 
> state, copying from the member on the VAXstation 4000-90A to the member 
> on the VAX 4000-105A, even if DISK$USER as a shadow set isn't 
> accessible to the VAXstation 4000-90A and its members show up only as 
> remote shadow members.
> 
> I doubt it is possible to avoid these problems without creating more as 
> long as the spontaneous reboots are happening.  However, I want to get 
> rid of the spontaneous reboots.  ANALYZE/CRASH says:
> 
>       OpenVMS (TM) VAX System dump analyzer
>    
>    Dump taken on  1-JAN-2009 06:04:26.14
>    SHADDETINCON, SHADOWING detects inconsistent state
> 
> HELP/MESSAGE says:
> 
>  SHADDETINCON,  SHADOWING detects inconsistent state
> 
>   Facility:     BUGCHECK, System Bugcheck
> 
>   Explanation:  The volume shadowing software reached an irrecoverable or
>                 inconsistent state because a shadow set failed an internal
>                 consistency check.
> 
>   User Action:  Note the conditions leading to the error and contact a Compaq
>                 support representative. If the system is configured to produce
>                 a memory dump, retain the dump file.
> 
> I don't see how I can "Note the conditions leading to the error".
> 
> Since the hardware setup hasn't changed in years, and since I'm not 
> seeing any additional errors, my assumption is that the VAX 4000-105A 
> is acting up.  Fortunately, I have an identical spare (thanks Hans!), so 
> I plan to swap the machines today.  If the problem goes away, then 
> presumably there was a fault with the machine, but who knows what it 
> could be.
> 
> Actually, I can't swap out everything: all the VAX 4000-105A memory I
> have (128 MB) is in the machine currently in the cluster, so I will
> remove it and put it in the spare.  I don't think this is a problem 
> with the memory.
> 
> Any further suggestions?
> 

Do you have a "power conditioner" or, better, a UPS?  A "blink" by the 
power company that you might not even notice could be causing the reboots.


