[Info-vax] hung program location
Stephen Hoffman
seaohveh at hoffmanlabs.invalid
Tue Feb 19 12:50:02 EST 2013
On 2013-02-19 17:35:37 +0000, Tom Adams said:
> There is only one AST programmed in. It's a resource wait AST that
> only fires when the system is telling the process to shutdown, so it
> did not cause the problem.
Um, you're *debugging*. Have you *proved* that it only fires once, and
when you expect it?
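
One cheap way to prove it rather than assume it is to count deliveries
inside the handler and log the count. A minimal sketch in standard C,
using a signal handler as a stand-in for the AST routine; SIGTERM and the
names shutdown_ast, install_shutdown_ast and ast_deliveries are
assumptions for illustration, not anything from the program in question:

```c
#include <signal.h>

/* Hypothetical counter: bumped once per delivery of the handler. */
static volatile sig_atomic_t ast_count = 0;

/* Stand-in for the AST routine. It only counts; the real work stays
   wherever it already is. */
void shutdown_ast(int signo)
{
    ast_count++;
    /* Re-arm, in case the platform resets the handler on delivery. */
    signal(signo, shutdown_ast);
}

/* Install the handler; returns 0 on success, -1 on failure. */
int install_shutdown_ast(void)
{
    return signal(SIGTERM, shutdown_ast) == SIG_ERR ? -1 : 0;
}

/* Number of deliveries seen so far; log this at interesting points. */
int ast_deliveries(void)
{
    return (int)ast_count;
}
```

If the logged count is ever anything other than what you expect at that
point in the run, the "only fires at shutdown" assumption is disproved.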
> None of the QIOs or QIOWs use ASTs.
So you have asynchronous code.
> My theory is that the process got stuck in a LIB$WAIT call, but I
> don't know why that would happen. But it could be that the program
> gets into HIB somewhere else in the code processing, and there is the
> possibility that I am overlooking some bug that would screw up a
> LIB$WAIT.
You're *really* fixated on that call.
Are you sure that lib$wait call isn't masking a deeper problem, for instance?
> One odd thing is that the same process hung on three different
> Alphas. But I don't know if they all got hung at the same time. The
> three processes would all be trying to get (or had or lost) a network
> connection to the same IP address of the same analyzer. The analyzer
> is turned off and on and moved around to different physical connection
> points. The code has been stable for a long time, but the practice
> of moving devices around like this is kind of a new practice.
Even apparently stable code is not necessarily free of latent errors.
There was a bug latent in RMS in VMS (hardly unused code) for most of
thirty years before it was found.
Debug. Desk reviews. Add diagnostics and detailed logging, as appropriate.
Theorize. Prove. Or disprove. But don't assume.
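
As one example of the kind of diagnostics meant here, bracket each
blocking call with timestamped enter/leave records, flushed immediately.
After a hang, the last line in the log tells you which wait never
returned. This is a rough sketch in portable C; log_event, logged_wait
and sample_wait are hypothetical names, and sample_wait is only a
placeholder for the real wait (LIB$WAIT, a QIOW, or whatever else can
block):

```c
#include <stdio.h>
#include <time.h>

/* Write one timestamped line and flush it at once, so the record
   survives even if the process hangs right afterwards. */
int log_event(FILE *log, const char *where, const char *what)
{
    char stamp[32];
    time_t now = time(NULL);
    struct tm *tm = localtime(&now);

    if (tm == NULL ||
        strftime(stamp, sizeof stamp, "%Y-%m-%d %H:%M:%S", tm) == 0)
        stamp[0] = '\0';            /* no timestamp available */
    if (fprintf(log, "%s %s: %s\n", stamp, where, what) < 0)
        return -1;
    return fflush(log) == 0 ? 0 : -1;
}

/* Placeholder for the real blocking call; returns 1 as a success value. */
int sample_wait(void)
{
    return 1;
}

/* Bracket any blocking call with enter/leave records and pass its
   status back to the caller unchanged. */
int logged_wait(FILE *log, const char *where, int (*wait_fn)(void))
{
    int status;

    log_event(log, where, "enter");
    status = wait_fn();
    log_event(log, where, "leave");
    return status;
}
```

An "enter" with no matching "leave" points at the call that is stuck; if
every wait has both, the hang is somewhere else and the theory is
disproved.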
--
Pure Personal Opinion | HoffmanLabs LLC