[Info-vax] hung program location

Stephen Hoffman seaohveh at hoffmanlabs.invalid
Tue Feb 19 12:50:02 EST 2013


On 2013-02-19 17:35:37 +0000, Tom Adams said:

> There is only one AST programmed in. It's a resource wait AST that
> only fires when the system is telling the process to shutdown, so it
> did not cause the problem.

Um, you're *debugging*.  Have you *proved* that it only fires once, and 
when you expect it?

> None of the QIOs or QIOWs use ASTs.

So you have asynchronous code.

> My theory is that the process got stuck in a LIB$WAIT call, but I
> don't know why that would happen.  But it could be that the program
> gets into HIB somewhere else in the code processing, and there is the
> possibility that I am overlooking some bug that would screw up a
> LIB$WAIT.

You're *really* fixated on that call.

Are you sure that lib$wait call isn't masking a deeper problem, for instance?

> One odd thing is that the same process hung on three different
> Alphas.   But I don't know if they all got hung at the same time.  The
> three processes would all be trying to get (or had or lost) a network
> connection to the same IP address of the same analyzer.  The analyzer
> is turned off and on and moved around to different physical connection
> points.   The code has been stable for a long time, but the practice
> of moving devices around like this is kind of a new practice.

Even apparently stable code is not necessarily free of latent errors.

There was a bug latent in RMS in VMS (hardly unused code) for the better
part of thirty years before it was found.

Debug.  Desk reviews.  Add diagnostics and detailed logging, as appropriate.
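As one illustration of the sort of diagnostics meant here, a minimal portable C sketch: log a timestamped breadcrumb immediately before and after the call you suspect, and flush each line, so that the last entry written by a hung process tells you whether it ever came back.  The names trace_log, breadcrumb and suspect_wait are hypothetical, and POSIX nanosleep() merely stands in for the suspected blocking call (LIB$WAIT in this thread); no OpenVMS API is shown or implied.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <time.h>

/* Hypothetical trace destination; NULL disables output but still counts. */
static FILE *trace_log = NULL;
static unsigned long breadcrumb_count = 0;

/* Record where the program is, what it is doing, and when. */
static void breadcrumb(const char *where, const char *what)
{
    char stamp[32] = "unknown-time";
    time_t now = time(NULL);
    struct tm *tm_now = localtime(&now);

    if (tm_now != NULL)
        strftime(stamp, sizeof stamp, "%Y-%m-%d %H:%M:%S", tm_now);
    breadcrumb_count++;
    if (trace_log != NULL) {
        fprintf(trace_log, "%s #%lu %s: %s\n",
                stamp, breadcrumb_count, where, what);
        /* Flush the stdio buffer so this line is not left sitting in
           the process's memory if the next call never returns. */
        fflush(trace_log);
    }
}

/* Stand-in for the suspected blocking call; seconds must be >= 0. */
static int suspect_wait(double seconds)
{
    struct timespec ts;

    ts.tv_sec = (time_t)seconds;
    ts.tv_nsec = (long)((seconds - (double)ts.tv_sec) * 1e9);
    breadcrumb("suspect_wait", "entering");
    int rc = nanosleep(&ts, NULL);
    breadcrumb("suspect_wait", rc == 0 ? "returned" : "interrupted");
    return rc;
}
```

Point trace_log at a file opened with fopen() and bracket each suspect call this way.  If the last line in the file from a hung process reads "entering", the theory about that call gains support; if it reads "returned", the hang is somewhere else, and the theory is disproved rather than assumed.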

Theorize.  Prove.  Or disprove.  But don't assume.


-- 
Pure Personal Opinion | HoffmanLabs LLC



