[Info-vax] startup hangs during TCPIP

Stephen Hoffman seaohveh at hoffmanlabs.invalid
Sun May 17 21:46:02 EDT 2015


On 2015-05-17 16:28:03 +0000, David Froble said:

> Phillip Helbig (undress to reply) wrote:
>> I have a satellite which I boot every day or two.  A couple of days ago,
>> the startup started hanging.  STARTUP.LOG and console output indicate
>> that this is always somewhere in the TCPIP startup.  Control-P and 
>> booting again was successful, though in one of three or four times I 
>> had to do it twice.
>> 
>> No, I haven't changed anything.  This is still a 7.3-2 node.

I'd get everything to V8.4, and as expeditiously as possible.   There's 
little point in chasing weirdness with ancient and under-patched 
versions, after all.

>> Once it boots up, everything is fine.
>> 
>> Something I noticed around the same time, which wasn't present before,
>> is that a Smart-Array card (which I have never used) now fails to 
>> initialize, but I don't see how that could be relevant.

You'll want to figure that out, as the problem might be related.

>> Also, during the startup I mount the system-disk shadow sets on other 
>> boot nodes.  The first one mounts, the second one doesn't, and no other 
>> shadow sets after the second system-disk one mount either.  (Of course, 
>> its own system disk is mounted.)  Strange, but again I don't see how it 
>> could be related.

I'd stop looking for reasons stuff is not related, and solve the 
visible errors.  This is for the same reason that fixing compiler 
diagnostics is a Good Thing.  Visible hardware errors can point to 
other hardware errors, and those can cause what you are seeing.

>> All network is on the LAN (100 Mb/s, full duplex) and I don't see any 
>> other problems with it.

So the error logs are clear?  So there are no errors logged?  (We 
already know you don't see any problems; we'd not be having this 
discussion, otherwise.)  So you have a managed switch and can verify 
that the settings are correct?  Dumb switches are notorious for 
misnegotiating with hardware as ancient as you are using here.  Per an 
HP rep, OpenVMS Alpha V7.3-2 and later and OpenVMS I64 should generally 
be set to autonegotiate.  But dumb switches don't always get that 
correct, and without a managed switch there's no visibility into the 
switch settings.  But here, misnegotiations would toss errors, and a 
misnegotiated configuration would usually either lock up solid, or 
would run very slowly.
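
As a rough sketch of where to look for logged errors and for the 
negotiated link settings: EWA0 below is only an assumed device name, 
so substitute whatever SHOW DEVICE reports for the NIC, and use the 
older error log tools if ELV is not present on your version.

```
$ SHOW ERROR                                  ! per-device error counts since boot
$ ANALYZE/ERROR_LOG/ELV TRANSLATE             ! formatted dump of the error log
$ MCR LANCP SHOW DEVICE EWA0/CHARACTERISTICS  ! speed, duplex, autonegotiation
$ MCR LANCP SHOW DEVICE EWA0/COUNTERS         ! collisions, CRC and framing errors
```

Climbing collision or CRC counts on a link that is supposed to be full 
duplex are the usual signature of a duplex mismatch.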

> Move some of the startup to a batch job.  Keep only required things in 
> the regular startup.  This way, you'll have VMS up and running, and can 
> look at things.  You'll also have a batch log file to see what happened.

David: Phillip appears to be using startup logging, which means that 
startup debugging will provide more details.  For additional details, 
see STARTUP_P1 and STARTUP_P2 in 
<http://labs.hoffmanlabs.com/node/192>, or in the OpenVMS documentation.
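
For reference, a minimal sketch of turning on fuller startup logging 
through SYSMAN, which sets STARTUP_P2 for you.  This assumes you run it 
on (or against) the satellite in question, and that you check the 
current settings first and put them back afterwards.

```
$ RUN SYS$SYSTEM:SYSMAN
SYSMAN> STARTUP SHOW OPTIONS
SYSMAN> STARTUP SET OPTIONS/VERIFY=FULL/OUTPUT=FILE/CHECKPOINTING
SYSMAN> EXIT
```

With /OUTPUT=FILE the output lands in SYS$SPECIFIC:[SYSEXE]STARTUP.LOG, 
and the checkpoint messages show which startup phase was running when 
things stopped.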

Phillip: If you can't sort it with that startup verification or with an 
added SET VERIFY / SET NOVERIFY, then force a crash from the console, 
reboot, and post the CLUE CRASH data here.
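
A sketch of both steps, under some assumptions: that TCP/IP Services is 
started from your site startup procedure with the usual 
SYS$STARTUP:TCPIP$STARTUP.COM, and that the satellite has a usable dump 
file in the default location.  Adjust both to match your configuration.

```
$! In the site-specific startup, around the TCP/IP startup call:
$ SET VERIFY
$ @SYS$STARTUP:TCPIP$STARTUP.COM
$ SET NOVERIFY
$!
$! If it still hangs: Control-P at the console, then at the >>> prompt
$! enter CRASH.  After the reboot:
$ ANALYZE/CRASH_DUMP SYS$SYSTEM:SYSDUMP.DMP
SDA> CLUE CRASH
```

Called procedures can turn verification off themselves, so the last 
line echoed in the log is a hint at where the hang is, not proof.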

On no evidence, startup hangs are quite often secondary to a 
longstanding bug in the queue manager, when a remote host is not 
reachable or otherwise misconfigured.  This causes the queue start to 
hang.  Forever.   There's no timeout, here.   TCP/IP Services does use 
the queue manager, so this is quite possibly in play.
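
If you want to look at the queue manager side, something like the 
following from a node that is already up is a reasonable start.  This 
is only a sketch; if the queue manager really is wedged, these commands 
may themselves stall.

```
$ SHOW QUEUE/MANAGERS/FULL    ! queue manager state, node list, database location
$ SHOW QUEUE/ALL/FULL         ! look for queues pointing at unreachable remote hosts
```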



-- 
Pure Personal Opinion | HoffmanLabs LLC



