[Info-vax] A 5-minute hang during the early stage of a shutdown.

Tue Jan 22 20:12:18 EST 2013

In article <kdn3dn$u4p$1 at news.albasani.net>, Jan-Erik Soderholm <jan-erik.soderholm at telia.com> writes:
>VAXman- @SendSpamHere.ORG wrote 2013-01-22 21:42:
>> In article <kdkilb$p8k$1 at news.albasani.net>, Jan-Erik Soderholm <jan-erik.soderholm at telia.com> writes:
>>> Hi.
>>> This isn't a show-stopper but I thought I'd ask if anyone
>>> has seen anything like this before.
>>>
>>> $ tcpip sh ver
>>>
>>>    HP TCP/IP Services for OpenVMS Alpha Version V5.5 - ECO 1
>>>    on a COMPAQ AlphaServer DS20E 666 MHz running OpenVMS V8.2
>>>
>>> $
>>>
>>> During the early stage of a normal shutdown (with automatic
>>> reboot), the system hangs for more or less exactly 5 minutes.
>>>
>>> The detached shutdown process is in QUEMAN.EXE and in LEF during
>>> this period. The queues are "stopped" as far as I could see.
>>> There was one queue in "stopped pending". This batch queue
>>> had one job in MUTEX (the reason for the reboot). But, I have
>>
>> Did you bother to explore which MUTEX???  That might shed more light on
>> your problem than posting here receiving myriad suppositional replies.
>>
>
>Yes I tried to find out, but never got any data from the system
>that I managed to understand. And on the other side, I do think
>I know what causes the MUTEX (multiple processes doing telnet
>delete and/or create on the same TNA port at the same time) so
>I'm more focused on rewriting the app startup procs to
>avoid this.
>
>Note that this is not the kind of stuff the customer asks about;
>there are application and user projects that are pushed harder.
>
>This *could* be QUEMAN waiting for the batch queue with the MUTEX
>batch job to terminate.
>
>Anyway, we are also going from 8.2->8.4 and latest TCPIP soon, so
>I will not dig further into this at the moment.

If you *should* encounter it again, look in PCB[PCB$L_EFWM] of the process
in the MUTEX using SDA.  That will be the address of the MUTEX.  If you can
reconcile which MUTEX (READ/EXEC, EXAMINE address in the EFWM) or where it
resides (MAP address in the EFWM), you might have some ammunition to defeat
your problem.
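Reconciling the raw EFWM value comes down to two lookups: find the nearest known symbol at or below the address (roughly what you get from EXAMINE once READ/EXEC has loaded the executive symbols), and find the image or region whose range contains it (roughly what MAP tells you). The following is a minimal Python sketch of that idea only, assuming you have already dumped symbols and image ranges as plain name/address pairs. It is not how SDA is implemented, and every symbol name, image name, and address in it is hypothetical.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Region:
    """A loaded image or memory region; end is exclusive."""
    name: str
    start: int
    end: int


def nearest_symbol(symbols: Dict[str, int], address: int) -> Optional[Tuple[str, int]]:
    """Return (symbol, offset) for the highest symbol at or below address."""
    ordered = sorted(symbols.items(), key=lambda item: item[1])
    addresses = [addr for _, addr in ordered]
    index = bisect_right(addresses, address) - 1
    if index < 0:
        return None  # address lies below every known symbol
    name, base = ordered[index]
    return name, address - base


def containing_region(regions: List[Region], address: int) -> Optional[Region]:
    """Return the region whose [start, end) range contains address, if any."""
    for region in regions:
        if region.start <= address < region.end:
            return region
    return None


if __name__ == "__main__":
    # Hypothetical values for illustration only; not real OpenVMS symbols or addresses.
    symbols = {"HYPOTHETICAL_MUTEX_A": 0x8000_1000, "HYPOTHETICAL_MUTEX_B": 0x8000_1040}
    regions = [Region("HYPOTHETICAL_EXEC_IMAGE", 0x8000_0000, 0x8001_0000)]
    efwm = 0x8000_1048  # value read from the waiting process's PCB$L_EFWM
    print(nearest_symbol(symbols, efwm))  # ('HYPOTHETICAL_MUTEX_B', 8)
    region = containing_region(regions, efwm)
    print(region.name if region else "not in any known region")
```

A small nonzero offset from a symbol usually means the address points into that structure rather than at an unrelated one; an address that falls in no known region is itself a clue worth chasing.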

I've never been one to reboot as a way to fix VMS.  Understanding why there
is a problem usually yields a non-reboot fix.  A major bank crashed one of
their systems last week because they unloaded some software when one of the
processes appeared wedged in MUTEX.  They caused more problems than they
fixed.  Give a hoot; don't reboot. ;)

Here's an old theorem: Shit happens.
VAXman's corollary to: Shit happens for a reason!

-- 
VAXman- A Bored Certified VMS Kernel Mode Hacker    VAXman(at)TMESIS(dot)ORG

Well I speak to machines with the voice of humanity.