[Info-vax] SHADDETINCON, SHADOWING detects inconsistent state

Phillip Helbig---remove CLOTHES to reply helbig at astro.multiCLOTHESvax.de
Thu Jan 1 06:56:50 EST 2009


My hobbyist cluster currently consists of:

   VAX 4000-105A
   VAXstation 4000-90A
   DEC 3000 - M600

Each system has a 2- or 3-member shadow set as its system disk.  There
are some non-shadowed disks (including CD-ROMs) and some 2-member shadow
sets distributed among the nodes (each member has a direct connection to
only one machine).  In particular, DISK$USER has members on each of the
VAXes.  I haven't changed much in 2 or 3 years. 

Starting several weeks ago, and more frequently in the last couple of
weeks, the VAX 4000-105A has been rebooting spontaneously.  Even though
SHADOW_MBR_TMO is set to 10 minutes and MVTIMEOUT to one hour
(SHADOW_SYS_TMO is 2 minutes, but that isn't relevant here), after such a
reboot everything looks OK on the VAX 4000-105A itself.  On the other
machines, however (usually just one of them), the VAX 4000-105A's
system-disk shadow set, its CD-ROM, and the DISK$USER shadow set have gone
into mount-verify timeout.  This has always happened during the night, so I don't know how
long the spontaneous reboot takes.  I can just dismount and remount the
system-disk shadow set and the CD-ROM on the VAX 4000-105A from the
other nodes, but since DISK$USER has gone into mount-verify timeout, I
have to reboot the corresponding node.  (Note that SYSUAF etc. are all on
DISK$USER.)  I can't dismount it since it contains open files.  I
haven't tried DISMOUNT/ABORT in such a situation.  Should I?  With
DISK$USER inaccessible, various applications will fail.  A reboot is 
probably quicker than getting everything going again by hand.  (If it is 
the VAXstation 4000-90A which needs to be rebooted, then I can dismount 
and remount the member of DISK$USER on it from the ALPHA, so that I get 
just a minicopy when the VAXstation 4000-90A comes back up.)
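For reference, the three timeouts above are SYSGEN parameters, all
expressed in seconds.  Assuming they are maintained through AUTOGEN (how
they were actually set isn't stated here), the corresponding entries in
SYS$SYSTEM:MODPARAMS.DAT would look something like this sketch:

   SHADOW_MBR_TMO = 600    ! 10 minutes: how long a member may be unreachable before removal
   MVTIMEOUT = 3600        ! 1 hour: mount-verification timeout for the disks
   SHADOW_SYS_TMO = 120    ! 2 minutes: system-disk shadow set timeout

The values actually in effect on a running node can be checked with
SYSGEN's SHOW command (for example, SHOW SHADOW_MBR_TMO).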

Note that every time this has happened, DISK$USER was in the shadow-copy
state, copying from the member on the VAXstation 4000-90A to the member
on the VAX 4000-105A, even when the DISK$USER shadow set was not
accessible to the VAXstation 4000-90A and its members showed up there
only as remote shadow members.

As long as the spontaneous reboots keep happening, I doubt I can avoid
these problems without creating new ones.  What I really want is to get
rid of the reboots themselves.  ANALYZE/CRASH says:

      OpenVMS (TM) VAX System dump analyzer
   
   Dump taken on  1-JAN-2009 06:04:26.14
   SHADDETINCON, SHADOWING detects inconsistent state

HELP/MESSAGE says:

 SHADDETINCON,  SHADOWING detects inconsistent state

  Facility:     BUGCHECK, System Bugcheck

  Explanation:  The volume shadowing software reached an irrecoverable or
                inconsistent state because a shadow set failed an internal
                consistency check.

  User Action:  Note the conditions leading to the error and contact a Compaq
                support representative. If the system is configured to produce
                a memory dump, retain the dump file.

I don't see how I can "Note the conditions leading to the error".

Since the hardware setup hasn't changed in years, and since I'm not 
seeing any additional errors, my assumption is that the VAX 4000-105A 
is acting up.  Fortunately, I have an identical spare (thanks Hans!), so 
I plan to swap the machines today.  If the problem goes away, then 
presumably there was a fault with the machine, but who knows what it 
could be.

Actually, I can't swap out everything: all the VAX 4000-105A memory I
have (128 MB) is in the machine currently in the cluster, so I will
remove it and put it in the spare.  I don't think the memory is the
problem.

Any further suggestions?
