[Info-vax] Loosing all LAT connections on one machine in DECNet Network
johnwallace4 at yahoo.co.uk
Wed Apr 15 16:36:00 EDT 2009
On Apr 15, 3:08 pm, JCamCMKRNL <jcam90... at earthlink.net> wrote:
> Answers to many of the questions:
> ----------------------------
> >Or is the VAX connected via AUI to the DELNI and the decserver connected
> >directly to the DELNI?
>
> Yes. In the Data Center all 5 systems, VAX & PDPs plus one DECServer
> are all connected to one DELNI.
>
> >what version of VMS do you have ?
>
> V5.5-2
>
> >Another thing to try is, from VAX: SET HOST/LAT <vax> -or- SET HOST/LAT <pdp>
>
> Good idea, but outgoing LAT is currently disabled. I will enable it
> and try again, but that requires Change Configuration paperwork.
>
> > MC LATCP SHOW SERVICES
>
> I will check this out if/when this problem occurs again.
>
> >What kind of VAX are you running?
>
> Micro VAX 4000-400
>
> >you mention plant, is this a Fuji site?
>
> No. We are a medical manufacturer.
>
> >Is it possible to connect a DEMPR or DESPR to the DELNI and run thinwire between the VAX and the DELNI?
>
> We have no 10-Base2 equipment or media.
>
> >Does LAT report errors?
>
> We have not experienced the problem since the last reboot, and the
> counters currently show no errors, but this was interesting from the
> MCR LATCP SHOW LINK/COUNT
> Transmit Errors -
> Excessive Collisions: No
> Carrier Check Failure: Yes
> Short Circuit: Yes
> Open Circuit: Yes
> Frame Too Long: Yes
> Remote Failure To Defer: No
> Transmit Underrun: Yes
> Transmit Failure: No
>
> >What happens if you put an AUI loopback plug in the global port of the DELNI, does the LAT service return immediately?
>
> Good suggestion. I would like to try this if/when the problem returns,
> but then the rest of the plant would lose their LAT connections to
> the PDP-11s thus affecting production, so it would not be possible.
>
> >Is LAT running on the PDP-11's, even if they run RSX-11M(+) the DECserver 200's may be used for reverse LAT. Furthermore, outbound LAT is not enabled by default.
>
> Yes, PDP-11s are running LAT.
> Yes, we have reverse LAT ports on PDP-11 and VAX.
> No, outbound LAT is not enabled on any hosts.
>
> >It doesn't have to be hardware. DECnet to/from the VAX still seems to work and - I assume - it is using the same ethernet interface.
>
> That is correct, same interface.
>
> >Looking at the DECserver counters may also provide some more insight: SHOW COUNTER, SHOW NODE {vax} COUNT, SHOW SERVICE {vax}
>
> I don't think the counters on the DECServers will yield much
> information, because all DECServers throughout the plant lose their
> LAT connections to the VAX when this happens.
>
> -----------------------
> I will keep you all informed.
> I appreciate all of your input.
>
> Jeff Cameron
The output from LATCP> SHOW LINK /COUNT suggests you have dodgy
hardware (or cabling) somewhere, and the resulting errors cause the
lightweight LAT protocol to fail. Exactly as JF suggested, LAT won't
retry; it fails visibly, whereas DECnet retries invisibly and thus
survives the disruption.
If the underlying problem were a simple "network storm" of valid but
unwanted traffic you wouldn't expect to see the "carrier check
failure", "short circuit", and "open circuit" errors (unless of course
you'd been doing enough tinkering to cause those errors yourself!).
At least one of the DEC network card/transceiver combos was actually
able to report the distance to a "cable fault" using a sort of built
in "time domain reflectometer"; I can't remember whether the VAX 4000
network card/chip had that feature (iirc it was LANCE based and didn't
do it), or how it was presented inside NCP. It might also be worth
checking whether the DECserver reports anything about it.
The "NCP> Show Known Line Count" display should have similar counters,
but these are actual counts rather than just "yes/no". They can also
be zeroed, so you can start from a known baseline at a given time.
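If it helps, here is a minimal Python sketch of the baseline-and-compare idea: capture the counter display twice and list only the counters that grew in between. It assumes each counter appears on its own line as "value, then description"; the sample text and counter names below are invented for illustration and are not real output from your system.

```python
import re

# Assumption: every counter line looks like "<value>  <description>".
# A leading ">" (used when a counter has saturated) is dropped, so a
# saturated counter is treated as its displayed value.
COUNTER_LINE = re.compile(r"^\s*>?(\d+)\s+(\S.*?)\s*$")

def parse_counters(text):
    """Return {description: value} for every line that looks like a counter."""
    counters = {}
    for line in text.splitlines():
        match = COUNTER_LINE.match(line)
        if match:
            counters[match.group(2)] = int(match.group(1))
    return counters

def counter_deltas(before, after):
    """Return only the counters that grew between two snapshots."""
    return {
        name: after[name] - before.get(name, 0)
        for name in after
        if after[name] > before.get(name, 0)
    }

# Invented sample snapshots, purely hypothetical.
BASELINE = """
      120  Seconds since last zeroed
     4000  Data blocks sent
        0  Send failure
"""
LATER = """
     3720  Seconds since last zeroed
    91000  Data blocks sent
        7  Send failure
"""

deltas = counter_deltas(parse_counters(BASELINE), parse_counters(LATER))
for name, growth in deltas.items():
    print(f"{growth:>8}  {name}")
```

Anything in the error counters that grows between two snapshots taken either side of an outage is a candidate for closer attention.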
If you can get the DECserver counters easily (TSM or just NCP> CONNECT
NODE etc, or a laptop/terminal emulator physically plugged in to the
server) please post them; they may not add anything to the picture,
but they might. At least it's non-disruptive.
You probably already suspected dodgy hardware. Your next task is to
narrow down the possible culprits. You've already said you don't have
access to 10Base2 kit so this is going to be next to impossible
without some disruption (which will be hard to arrange, as this is a
working factory, right?). Back in the days of yellow or
orange coax, repeaters or bridges would normally be used to isolate
faulty kit in different network segments, though of course introducing
this extra kit brought its own risks - extra active components to
fail.
You have spares for all the active network kit, right? E.g.
DECservers, DELNIs, H400x, DELQA? You could swap out one or two bits
at a time and see if the fault recurs (though if the fault only
occurs maybe once a week, it's going to be a slow process).
Same goes for the MicroVAX itself. Do you have a spare network card
(DELQA) you could plug in? It's not absolutely essential that it has
the right front panel for the 4000/400, it's just essential that it's
a known good card.
Has anybody installed any significant new electrical kit on the
factory floor recently? Transceiver cables are particularly
vulnerable to RF interference. On a related tack, is there any visible
pattern to the network failure times?
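One rough way to look for such a pattern, sketched in Python under the assumption that you have (or can start keeping) a simple list of outage times: tally them by weekday and by hour and see whether anything clusters. The timestamps below are invented for illustration, not taken from your site.

```python
from collections import Counter
from datetime import datetime

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

# Invented outage times for illustration only; replace with the real log.
OUTAGES = [
    "2009-03-02 06:05",
    "2009-03-09 06:10",
    "2009-03-16 06:02",
    "2009-03-25 14:40",
]

def outage_pattern(stamps, fmt="%Y-%m-%d %H:%M"):
    """Count outages per weekday and per hour of day."""
    times = [datetime.strptime(stamp, fmt) for stamp in stamps]
    by_weekday = Counter(DAYS[t.weekday()] for t in times)
    by_hour = Counter(t.hour for t in times)
    return by_weekday, by_hour

by_weekday, by_hour = outage_pattern(OUTAGES)
print("Busiest weekday:", by_weekday.most_common(1))
print("Busiest hour:   ", by_hour.most_common(1))
```

If the failures bunch up around a shift change or the start of a particular machine, that points at interference or power rather than a randomly failing component.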
How much is the disruption costing you? Enough to make it worthwhile
upgrading the backbone to modern technology (managed switches, fibre
links, etc) with enough media converters to get the old kit connected
to the new backbone?
Good luck.