[Info-vax] A 5 minutes hang during early stage of a shutdown.
Stephen Hoffman
seaohveh at hoffmanlabs.invalid
Tue Jan 22 09:31:34 EST 2013
On 2013-01-22 13:50:28 +0000, Jan-Erik Soderholm said:
> OK, I guess that is simply defining SHUTDOWN$VERIFY, right?
Edit the file(s) involved and stick the proverbial SET VERIFY at the
top, if you have to. Revert when done. This is VMS and not rocket
science, after all.
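Something along these lines, as an untested sketch; check the comments
in the shutdown procedures themselves for the exact handling of the
logical name on your version:

$! Have the shutdown procedures run verified:
$ DEFINE/SYSTEM SHUTDOWN$VERIFY TRUE
$! Or temporarily edit the site-specific procedure, e.g.
$! SYS$MANAGER:SYSHUTDWN.COM, and add as the first line:
$ SET VERIFY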
>> Sort out the particular trigger for the mutex, as those can cause various
>> secondary problems. <http://labs.hoffmanlabs.com/node/231>
>
> This has been "up" in at least one thread before. I never managed
> to find any information. Today I'm confident that this happens
> if one tries to run "telnet delete" or "telnet create" against
> a TNA device that is already in "waiting for delete".
>
> After a "telnet delete", it takes 10-15 seconds before the TNA
> device is removed.
Confidence is not equivalent to a reproducer, and a reproducer will
usually gain the attention of support folks: it is much harder to deny
there's an issue when you are handed a reproducer, and it's much easier
to test a fix with one. Whether you get anywhere with that reproducer
is another matter and another discussion; the (mis)behavior
demonstrated by the reproducer could still be declared a "feature" and
not a "bug", for instance.
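If that's what is going on here, the reproducer might be as short as
the following; the TELNET qualifier spellings and the TNA unit name
are assumptions from memory of TCP/IP Services, so verify them against
your kit:

$! Create a network terminal device, then race the teardown with a
$! second operation while the unit is still "waiting for delete":
$ TELNET /CREATE_SESSION target-host 23    ! creates a TNAn: unit
$ TELNET /DELETE_SESSION TNA1:
$ TELNET /DELETE_SESSION TNA1:   ! issued inside the 10-15 second window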
If you're doing a whole lot of that, and OpenVMS and TCP/IP aren't
reacting quickly enough or appropriately, and HP support isn't
providing you with a solution sufficient for your needs, then write
some tools to manage and allocate your application TN device access
yourself.
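A minimal sketch of that sort of tool, given the 10-15 second teardown
window described above: serialize the deletes, and poll with F$GETDVI
until the unit is actually gone before touching it again. The unit
name and the TELNET qualifier here are illustrative:

$ DEV = "TNA1:"
$ TELNET /DELETE_SESSION 'DEV'
$WAIT_GONE:
$ IF .NOT. F$GETDVI(DEV,"EXISTS") THEN GOTO GONE
$ WAIT 00:00:01                ! poll once per second
$ GOTO WAIT_GONE
$GONE:
$ EXIT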
>> As a complete WAG, I've seen vaguely similar SHUTDOWN-time hangs with bogus
>> hosts listed in the queue database, and cases where the old host name is
>> latent in the queue database after a host has been renamed.
>
> Yes, that does correspond with the fact that the shutdown process
> is in QUEMAN.EXE.
Yes, you've mentioned this several times.
> We should probably either clean up the DECnet database or
> not start DECnet at all...
The distributed queue manager uses SCS-level communications, and not DECnet.
Specifically, the queue manager uses the SCA$TRANSPORT SCS SYSAP, and
not DECnet, and not IP.
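If you want to see what the queue manager is actually talking over,
SDA can display the SCS connections and their SYSAP names; the exact
output varies by OpenVMS version, but the display is:

$ ANALYZE/SYSTEM
SDA> SHOW CONNECTIONS
SDA> EXIT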
This is the ugly, gnarly sausage factory of VMS. Do some local DCL
debugging. Figure out where things went sideways.
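For the stale-host-name case mentioned earlier, a first pass is to
dump what the queue manager thinks it knows and search for the old
node name; these are stock commands, and OLDNODE is a placeholder:

$ SHOW QUEUE /MANAGER /FULL
$ SHOW QUEUE /FULL /ALL_JOBS /OUTPUT=QUEUES.LIS
$ SEARCH QUEUES.LIS "OLDNODE"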
————
ps: Standard caveat whenever mixing Internet discussions of clustering
and DECnet in the same posting or same IRC chat: DECnet is not the
underpinnings of clustering. DECnet is not related to clustering.
Clustering does not use DECnet. Yes, there are clustering-related
services that can use DECnet, such as the use of MOP within the
satellite boot process, and the MONITOR VPM server, but there are
alternatives for these and other uses. The transport underneath
clustering is not DECnet, and you can cluster without DECnet installed.
This might not be your intent and you may well know the disparate
nature of DECnet and clustering, but I'm decidedly cautious when I
encounter discussions of tweaking DECnet when clustering is centrally
involved.
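If you want to convince yourself of this on a live system, SHOW CLUSTER
displays the cluster members and circuits with no DECnet involved, and
the LAN cluster transport is controlled through SYSGEN rather than
through the DECnet databases; for instance, NISCS_LOAD_PEA0 is the
parameter that controls loading of PEDRIVER, the LAN SCS port driver:

$ SHOW CLUSTER
$ MCR SYSGEN
SYSGEN> SHOW NISCS_LOAD_PEA0
SYSGEN> EXIT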
--
Pure Personal Opinion | HoffmanLabs LLC