[Info-vax] XPERR analysis

Wed Dec 9 15:18:42 EST 2009

On 9 Dec, 15:11, GerMarsh <marsh.fam... at tirhir.com> wrote:
> On 8 Dec, 17:37, Volker Halle <volker_ha... at hotmail.com> wrote:
>
> > On 8 Dec, 15:12, Steve <etmsr... at yahoo.co.uk> wrote:
>
> > > On 8 Dec, 08:28, Volker Halle <volker_ha... at hotmail.com> wrote:
>
> > > > On 7 Dec, 18:34, Steve <etmsr... at yahoo.co.uk> wrote:
>
> > > > > On Dec 2, 3:10 pm, GerMarsh <marsh.fam... at tirhir.com> wrote:
>
> > > > > > On 1 Dec, 18:13, Volker Halle <volker_ha... at hotmail.com> wrote:
>
> > > > > > > An XQPERR is most likely caused by a software inconsistency in the
> > > > > > > Files-11 sub-system (F11BXQP). This kind of bugcheck is an INLINE
> > > > > > > bugcheck, initiated by a BUGW macro instruction. This is a TRAP
> > > > > > > instruction, so the updated PC points to the NEXT instruction. That
> > > > > > > instruction is totally unrelated, but it provides important crash
> > > > > > > footprint information.
>
> > > > > > > An XQPERR crash with the next instruction being a 'PUSHL R2' is a known
> > > > > > > footprint. Are you running any defragmentation tools on this node?
> > > > > > > Make sure to get the most recent F11BXQP.EXE from the latest V7.1
> > > > > > > build from HP.
>
> > > > > > > Volker.
>
> > > > > > Thank you for that response, Volker - I did notice that the disk has
> > > > > > more than its fair share of badly fragmented files. Some have many
> > > > > > extension headers.
>
> > > > > > I'll see if there's a patch for the ancient F11BXQP.
>
> > > > > Gerald and I have found that although one of the clustered nodes
> > > > > doesn't have the latest one, the system that keeps crashing DOES have
> > > > > the latest F11X patch - VAXF11X06_071.
> > > > > :o(
> > > > > Since the system was reduced to a single CPU last weekend it's been
> > > > > stable though (touch wood!)
> > > > > Steve
>
> > > > Steve,
>
> > > > this crash is most likely in routine RES_SEQ_MISMATCH with a source
> > > > code comment of:
>
> > > > Found a stale referenced or nondirectory FCB in FCB queue
>
> > > > The last F11BXQP.EXE from VAXF11X06_071 is from 13-OCT-2009.
>
> > > > Crashes with this footprint have been seen on V6.2, V7.1 and even
> > > > V7.2. VMS engineering has supplied the most recent F11BXQP image in
> > > > those cases. If you have (prior version !) support, you need to
> > > > escalate this problem to HP and ask for the most recent F11BXQP.EXE
> > > > from the last build for the remedial stream for V7.1.
>
> > > > If the crash only happens when a SECOND CPU is enabled, this may very
> > > > well be a synchronization bug within the XQP, which gets triggered
> > > > when running on an SMP system.
>
> > > > Volker.
>
> > > Hi Volker,
>
> > > Did you mean 2009 or 1999?  If 1999 then we have it installed.  If
> > > 2009...
>
> > > Steve
>
> > Steve,
>
> > OpenVMS VAX V7.1 is more than 10 years old. The last XQP patch for
> > V7.1 was released on 13-JUN-2000! The F11BXQP.EXE file had a
> > link date of 13-OCT-1999 - this should be the version you're running.
>
> > You would need to contact HP if you want a newer F11BXQP.EXE, which
> > may fix those crashes.
>
> > Volker.
>
> Does the instruction stream point to the suspect module?...
>
> SDA> ex @pc-10:@pc+10/instr
> %SDA-W-INSKIPPED, unreasonable instruction stream - 2 bytes skipped
> F11BXQP+02C95:  TSTL    42(R2)
> F11BXQP+02C98:  BEQL    F11BXQP+02C9F
> F11BXQP+02C9A:  TSTW    18(R2)
> F11BXQP+02C9D:  BEQL    F11BXQP+02CA3
> F11BXQP+02C9F:  BUGW   #05CC
> F11BXQP+02CA3:  PUSHL   R2
> F11BXQP+02CA5:  CALLS   #01,F11X$INIT_XQP+002FD
> F11BXQP+02CAC:  PUSHL   R2
> F11BXQP+02CAE:  CALLS   #01,F11BXQP+01546
> F11BXQP+02CB5:  PUSHL   R2
> SDA>

Yes, this is the code stream in REQ_SEQ_MISMATCH; R2 points to an FCB.
The source code statement looks like this:

  2928  2  IF .PRIM_FCB[FCB$L_DIRINDX] EQLA 0 OR .PRIM_FCB[FCB$W_REFCNT] NEQU 0
  2929  2  THEN
  2930  2      BUG_CHECK(XQPERR, FATAL, 'Found a stale referenced or non
                                         directory FCB in FCB queue');
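
Read against the disassembly quoted above, the test can be modelled
roughly as follows. This is only an illustrative Python sketch, not XQP
code: the function name is made up, and the two arguments stand in for
the FCB$L_DIRINDX and FCB$W_REFCNT fields, which the TSTL 42(R2) and
TSTW 18(R2) instructions appear to be testing.

```python
def would_bugcheck(dirindx: int, refcnt: int) -> bool:
    """Hypothetical model of the BLISS condition shown above.

    The bugcheck fires if the directory index is zero (EQLA 0)
    or if the reference count is nonzero (NEQU 0).
    """
    return dirindx == 0 or refcnt != 0


# Only an FCB with a nonzero directory index and a zero reference
# count gets past the check without a bugcheck.
print(would_bugcheck(dirindx=5, refcnt=0))  # False
print(would_bugcheck(dirindx=0, refcnt=0))  # True
print(would_bugcheck(dirindx=5, refcnt=1))  # True
```

In the instruction stream this corresponds to the first BEQL branching
to the BUGW and the second BEQL branching around it.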

But this does not help you. It IS a bug in the XQP and only OpenVMS
engineering can help you. If you see this bugcheck when a second CPU
is added to the system (if I understand your post correctly), then
you may at least have found a trigger for this bugcheck.

There have been escalations to OpenVMS engineering with this bugcheck
footprint in the past. I don't have any information on whether this
problem has ever been definitively solved. You probably need a couple of
crashdumps with this footprint and a knowledgeable XQP engineer to try
to diagnose this problem. But as this is a V7.1 crash, you'll also
need prior version support to be able to escalate this crash to HP
OpenVMS engineering.
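
If you collect several dumps, comparing their footprints by hand gets
tedious. The following is a minimal Python sketch, assuming the SDA
instruction display has been saved as text in the format quoted earlier
in this thread. The function name and the regular expression are my own
assumptions, not part of SDA. It finds the BUGW and the instruction that
follows it, which is the footprint.

```python
import re

# Matches lines such as "F11BXQP+02C9F:  BUGW   #05CC", optionally
# prefixed by mail quoting ("> "). The format is assumed from the
# SDA output quoted in this thread.
LINE_RE = re.compile(r"^[>\s]*(\w[\w$]*)\+([0-9A-Fa-f]+):\s+(\S+)\s*(.*)$")


def bugcheck_footprint(sda_text):
    """Return ((module, offset, opcode, operands) for the BUGW,
    the same tuple for the following instruction), or None."""
    rows = []
    for line in sda_text.splitlines():
        m = LINE_RE.match(line)
        if m:
            module, offset, opcode, operands = m.groups()
            rows.append((module, int(offset, 16), opcode, operands.strip()))
    # Pair each decoded instruction with its successor.
    for current, following in zip(rows, rows[1:]):
        if current[2] == "BUGW":
            return current, following
    return None


sample = """\
%SDA-W-INSKIPPED, unreasonable instruction stream - 2 bytes skipped
F11BXQP+02C9A:  TSTW    18(R2)
F11BXQP+02C9D:  BEQL    F11BXQP+02CA3
F11BXQP+02C9F:  BUGW   #05CC
F11BXQP+02CA3:  PUSHL   R2
"""

bug, nxt = bugcheck_footprint(sample)
print(f"{nxt[2]} {nxt[3]} at {nxt[0]}+{nxt[1]:05X}")  # PUSHL R2 at F11BXQP+02CA3
```

Running this over the text of each dump should make it easy to see
whether they all show the same 'PUSHL R2' footprint at the same offset.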

What you may want to learn from this: if you're running an old and
unsupported version of an operating system, you should be very careful
when making changes to the underlying hardware, as this may trigger
problems that may not be solvable in the context of an old operating
system version.

Volker.