[Info-vax] hung program location
Tom Adams
w.tom.adams at gmail.com
Tue Feb 19 12:35:37 EST 2013
On Feb 19, 10:29 am, Stephen Hoffman <seaoh... at hoffmanlabs.invalid>
wrote:
> On 2013-02-19 14:48:44 +0000, Tom Adams said:
>
> > There are no direct $hiber or $wait calls, but I use lib$wait to cause
> > brief pauses.
>
> I've found that programs with asynchronous logic that also include
> brief pauses can be excellent indicators of latent race conditions; of
> latent bugs that some previous programmer hadn't directly resolved.
>
> > Can't think of where other hidden $hiber's could be, unless they happen
> > in QIO calls.
>
> That's probably not the best approach when working with sys$hiber and
> sys$wake <http://labs.hoffmanlabs.com/node/829>, and irrespective of
> whether the code you're working on includes asynchronous logic. Given
> the complexity of a typical application and the possibility that some
> other programmer somewhere might decide to add sys$hiber or sys$wake or
> sys$schdwk calls (to some of your application code, to some library
> you're calling, or some system or compiler or application library
> you're using — the hibernation scheduling state is process-wide, after
> all), it's usually best to always plan for the arrival of spurious
> $wake calls.
>
> > The process does QIO calls to establish network and/or serial links.
>
> There are two flavors; sys$qio, and sys$qiow. The former is a rich
> source of asynchronous activity and quite ripe for introducing latent
> programming bugs. Failure to specify IOSBs, failure to correctly
> specify IOSBs that are and will remain valid over the lifetime of the
> asynchronous calls, failure to properly manage all memory and all
> variables that are shared between AST
> <http://labs.hoffmanlabs.com/node/617> and non-AST routines, event flag
> <http://labs.hoffmanlabs.com/node/613> collisions, etc.
>
> Any number of ways to go off the rails here, too.
>
> Caveat: simply waiting in the context of an AST routine is also
> something best avoided.
>
> > It most likely hung under conditions where it was supposed to be
> > retrying to establish a link for weeks on end, because we only hook up
> > the device it's trying to link to about once a month.
>
> Or maybe a garden-variety bug. But this "most likely hung" is a
> theory, one worth verification, but far from a certainty. Add
> application-level debugging, as a starting point.
>
> The ways of asynchronous programming on OpenVMS wizardry can be quite
> subtle, and sometimes quick to anger.
>
> If there is asynchronous code here (eg: sys$qio calls or other non-W
> calls, or asynch or synch calls with AST completion routines specified,
> and not necessarily with sys$qiow or other synchronous calls), then
> you're in the deep end of the pool here, too. Familiarity with what's
> documented in the OpenVMS Programming Concepts is likely necessary
> here, and you may need to become familiar with memory synchronization
> <http://labs.hoffmanlabs.com/node/407>, and with the synchronization
> chapters in the Programming Concepts manual.
>
> --
> Pure Personal Opinion | HoffmanLabs LLC

There is only one AST programmed in. It's a resource wait AST that
only fires when the system is telling the process to shut down, so it
did not cause the problem. None of the QIOs or QIOWs use ASTs.

My theory is that the process got stuck in a LIB$WAIT call, but I
don't know why that would happen. It could also be that the program
gets into HIB somewhere else in the code, or that I am overlooking
some bug that would break a LIB$WAIT.
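
For reference, the "plan for spurious $wake calls" advice quoted above
amounts to never treating a return from hibernation as proof that there
is work to do. Below is a minimal sketch of that loop in Python rather
than a VMS language, purely to show the shape. The names
wait_for_work, ScriptedProcess, hibernate and work_pending are all
hypothetical; hibernate stands in for sys$hiber and does not call any
real system service.

```python
def wait_for_work(hibernate, work_pending):
    """Block until work_pending() is true; return how many wakes were spurious.

    hibernate    -- hypothetical stand-in for sys$hiber; returns on any wake
    work_pending -- predicate that says whether there is really work to do
    """
    spurious = 0
    while not work_pending():
        hibernate()                # may return because of an unrelated wake
        if not work_pending():
            spurious += 1          # woken with nothing to do: go back to sleep
    return spurious


class ScriptedProcess:
    """Test double: delivers a fixed number of stray wakes, then real work."""

    def __init__(self, stray_wakes):
        self.stray_wakes = stray_wakes
        self.flag = False          # set only when real work arrives

    def hibernate(self):
        # Returns immediately, as if some wake had just been issued.
        if self.stray_wakes == 0:
            self.flag = True       # this wake carries real work
        else:
            self.stray_wakes -= 1  # this wake is a stray

    def work_pending(self):
        return self.flag
```

The point of the design is that the predicate, not the wake itself,
decides when the loop exits, so a stray wake from a library or another
piece of code costs one extra pass and nothing more.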

One odd thing is that the same process hung on three different
Alphas, but I don't know whether they all hung at the same time. The
three processes would all be trying to get (or had, or had lost) a
network connection to the same IP address of the same analyzer. The
analyzer is turned off and on and moved around to different physical
connection points. The code has been stable for a long time, but
moving devices around like this is a fairly new practice.
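
As a starting point for the application-level debugging suggested
above, the retry loop could log every attempt, so that the last line
written shows where the process stopped. A rough sketch, again in
Python and not the real code: retry_link and its parameters are
hypothetical, and connect is assumed to be a callable that enforces
its own timeout and raises OSError on failure (the real program would
be issuing QIOs here).

```python
import time


def retry_link(connect, attempts, delay_s, sleep=time.sleep, log=print):
    """Call connect() up to `attempts` times, logging each try.

    Returns the 1-based attempt number that succeeded, or 0 if none did.
    """
    for attempt in range(1, attempts + 1):
        started = time.monotonic()
        try:
            connect()              # assumed to time out rather than block forever
        except OSError as exc:
            elapsed = time.monotonic() - started
            log(f"attempt {attempt} failed after {elapsed:.1f}s: {exc}")
            sleep(delay_s)         # brief pause before the next try
            continue
        log(f"attempt {attempt} connected")
        return attempt
    return 0
```

With something like this in place, a process that stops logging in the
middle of an attempt points at the connect call, while one that stops
after a "failed" line points at the pause between retries.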