[Info-vax] startup hangs during TCPIP
Stephen Hoffman
seaohveh at hoffmanlabs.invalid
Sun May 17 21:46:02 EDT 2015
On 2015-05-17 16:28:03 +0000, David Froble said:
> Phillip Helbig (undress to reply) wrote:
>> I have a satellite which I boot every day or two. A couple of days ago,
>> the startup started hanging. STARTUP.LOG and console output indicate
>> that this is always somewhere in the TCPIP startup. Control-P and
>> booting again was successful, though in one of three or four times I
>> had to do it twice.
>>
>> No, I haven't changed anything. This is still a 7.3-2 node.
I'd get everything to V8.4, and as expeditiously as possible. There's
little point in chasing weirdness with ancient and under-patched
versions, after all.
>> Once it boots up, everything is fine.
>>
>> Something I noticed around the same time, which wasn't present before,
>> is that a Smart-Array card (which I have never used) now fails to
>> initialize, but I don't see how that could be relevant.
You'll want to figure that out, as the problem might be related.
>> Also, during the startup I mount the system-disk shadow sets on other
>> boot nodes. The first one mounts, the second one doesn't, and no other
>> shadow sets after the second system-disk one mount either. (Of course,
>> its own system disk is mounted.) Strange, but again I don't see how it
>> could be related.
I'd stop looking for reasons why stuff is not related, and solve the
visible errors. This is for the same reason that fixing compiler
diagnostics is a Good Thing. Visible hardware errors can point to
other hardware errors, and those can cause what you are seeing.
>> All network is on the LAN (100 Mb/s, full duplex) and I don't see any
>> other problems with it.
So the error logs are clear? So there are no errors logged? (We
already know you don't see any problems; we'd not be having this
discussion otherwise.) So you have a managed switch and can verify
that the settings are correct? Dumb switches are notorious for
misnegotiating with hardware as old as what you are using here. Per an
HP rep, OpenVMS Alpha V7.3-2 and later and OpenVMS I64 should generally
be set to autonegotiate. But dumb switches don't always get that right,
and without a managed switch there's no visibility into the switch
settings. That said, a misnegotiation would log errors, and a
misnegotiated link would usually either lock up solid or run very
slowly.
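As an illustration only, this is roughly how to check the error log and the LAN adapter's negotiated settings and error counters from DCL. The device name EWA0 is hypothetical (substitute the satellite's actual LAN adapter), and the /SINCE value is just an example. Non-zero error counters on the adapter would point toward a speed or duplex mismatch.

```dcl
$! Look for recently logged hardware errors (the /SINCE value is an example)
$ ANALYZE/ERROR_LOG/SINCE=YESTERDAY
$! Show negotiated speed/duplex, then the error counters, for the LAN
$! adapter; EWA0 is a hypothetical device name
$ MCR LANCP SHOW DEVICE EWA0/CHARACTERISTICS
$ MCR LANCP SHOW DEVICE EWA0/COUNTERS
```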
> Move some of the startup to a batch job. Keep only required things in
> the regular startup. This way, you'll have VMS up and running, and can
> look at things. You'll also have a batch log file to see what happened.
David: Phillip appears to be using startup logging, which means that
turning on startup verification will provide more details. For additional details,
see STARTUP_P1 and STARTUP_P2 in
<http://labs.hoffmanlabs.com/node/192>, or in the OpenVMS documentation.
Phillip: If you can't sort it with that startup verification or with an
added SET VERIFY / SET NOVERIFY, then force a crash from the console,
reboot, and post the CLUE CRASH data here.
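A minimal sketch of the SET VERIFY approach and the follow-up dump analysis. It assumes TCP/IP Services is started from SYS$MANAGER:SYSTARTUP_VMS.COM through SYS$STARTUP:TCPIP$STARTUP.COM, and that the dump is in the default SYS$SYSTEM:SYSDUMP.DMP. Both placements are assumptions about this particular site. Many system-supplied procedures turn verification off internally with F$VERIFY, so the echoed output may stop at that procedure boundary.

```dcl
$! In SYS$MANAGER:SYSTARTUP_VMS.COM (placement assumed): echo each
$! command around the TCP/IP Services startup
$ SET VERIFY
$ @SYS$STARTUP:TCPIP$STARTUP.COM
$ SET NOVERIFY
$!
$! After the forced crash and the reboot, summarize the dump
$ ANALYZE/CRASH_DUMP SYS$SYSTEM:SYSDUMP.DMP
SDA> CLUE CRASH
```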
Speculating, with no evidence in hand: startup hangs are quite often
secondary to a longstanding bug in the queue manager, triggered when a
remote host is not reachable or is otherwise misconfigured. This causes
the queue start to hang. Forever. There's no timeout here. TCP/IP
Services does use the queue manager, so this is quite possibly in play.
--
Pure Personal Opinion | HoffmanLabs LLC