[Info-vax] hung program location
Tom Adams
w.tom.adams at gmail.com
Tue Feb 19 09:48:44 EST 2013
On Feb 19, 8:54 am, Stephen Hoffman <seaoh... at hoffmanlabs.invalid>
wrote:
> On 2013-02-19 13:26:44 +0000, Tom Adams said:
>
> > The PC is 80141918 (as shown on show proc/cont) The process is in HIB
> > when it's at that address.
>
> > The code is well controlled in CMS so it's easy to produce link maps.
>
> > I restarted the hung processes. This hanging is a rare event that I
> > don't know how to reproduce. But the process does pause at that PC
> > in HIB during a normal operation mode.
>
> Build with full machine-code listings and with full maps, and start
> instrumenting the code.
>
> As a guess directed at the error...
>
> Look specifically at the handling of $hiber and $wake calls in the
> source code, as code that uses $hiber can easily be broken in various
> ways, and the end result is either a spurious $wake cycle — which the
> code should always expect — or the code gets stuck in a $hiber quite
> possibly because one or more $wake calls got coalesced into one $wake
> somewhere; it's not really a lost $wake call, but it seems like it.
>
> The gloriously ugly work-around for these problems is adding a $schdwk
> call into the code, and deliberately inducing a periodic spurious $wake.
>
> The best approach is to figure out where the $wake got lost, and
> to review the asynchronous portions of the code for errors.
>
> --
> Pure Personal Opinion | HoffmanLabs LLC
There are no direct $hiber or $wait calls, but I use lib$wait to cause
brief pauses.
Can't think of where other hidden $hiber's could be, unless they
happen in QIO calls.
The process does QIO calls to establish network and/or serial links.
It most likely hung under conditions where it was supposed to be
retrying to establish a link for weeks on end, because we only hook up
the device it's trying to link to about once a month.
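
For readers following the quoted advice about coalesced $wake calls, here is a minimal sketch of that failure mode. It is written in portable Python, not against the VMS system services, and it is not taken from the code under discussion. All names (WakeFlag, post, naive_consumer, robust_consumer) are hypothetical. A threading.Event stands in for a single wake-pending flag, and the timeout on hiber() stands in for the periodic scheduled wake that the quoted reply suggests as a work-around.

```python
import threading
from collections import deque


class WakeFlag:
    """Toy model of a single wake-pending flag (an analogy, not the VMS API)."""

    def __init__(self):
        self._event = threading.Event()

    def wake(self):
        # Setting an already-set flag changes nothing, so wakes coalesce.
        self._event.set()

    def hiber(self, timeout=None):
        # Block until a wake is pending, then consume it.
        # Returns False if the timeout expired with no wake pending.
        woke = self._event.wait(timeout)
        self._event.clear()
        return woke


def post(flag, queue, items):
    """Queue each item and issue one wake per item."""
    for item in items:
        queue.append(item)
        flag.wake()


def naive_consumer(flag, queue, max_wait):
    """Handles one item per wake, so coalesced wakes leave work stranded."""
    handled = []
    while queue:
        if not flag.hiber(timeout=max_wait):
            break  # with no timeout, this is where the process would sit forever
        handled.append(queue.popleft())
    return handled


def robust_consumer(flag, queue, max_wait):
    """Treats every wake (or timeout) as a hint and drains all pending work."""
    handled = []
    while queue:
        flag.hiber(timeout=max_wait)  # timeout plays the role of a periodic wake
        while queue:
            handled.append(queue.popleft())
    return handled


if __name__ == "__main__":
    flag, queue = WakeFlag(), deque()
    post(flag, queue, ["a", "b", "c"])   # three wakes collapse into one
    print(naive_consumer(flag, queue, 0.05), list(queue))

    flag, queue = WakeFlag(), deque()
    post(flag, queue, ["a", "b", "c"])
    print(robust_consumer(flag, queue, 0.05), list(queue))
```

The design point is the one made in the quoted reply: the waiting code should treat a wake as a hint to recheck its state rather than as a count of events, and a periodic timeout only bounds how long a missed wake can go unnoticed.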