[Info-vax] Loosing all LAT connections (More answered questions)

JCamCMKRNL jcam90502 at earthlink.net
Fri Apr 17 11:09:33 EDT 2009


First, thanks to all who have responded. Your information has been
very valuable.
So far, we have not had another occurrence of all LAT connections
dropping on one system; just the original three occurrences in
the past three weeks. The LAT counters do seem to indicate that the
problem will occur again. It is just a matter of when.

Several of you asked some more questions about this issue, so I have
gathered the questions and the answers below. I hope I have hit all of
your queries. In particular, I think the very last question here and
its answer are very important.
----------------
> what does the current output of this show?
  MCR NCP SHOW COUNTER KNOW CIRC

  It is very clean:
Known Circuit Counters as of 17-APR-2009 06:30:10

Circuit = ISA-0

      >65534  Seconds since last zeroed
           0  Terminating packets received
           0  Originating packets sent
           0  Terminating congestion loss
           0  Transit packets received
           0  Transit packets sent
           0  Transit congestion loss
           0  Circuit down
           0  Initialization failure
           0  Adjacency down
           0  Peak adjacencies
       28945  Data blocks sent
     1447250  Bytes sent
           0  Data blocks received
           0  Bytes received
           0  Unrecognized frame destination
           0  User buffer unavailable

> Make this MC NCP SHOW KNOWN LINE COUNTERS
This is clean except for some send failures/collisions:
Known Line Counters as of 17-APR-2009 06:31:45

Line = ISA-0

      >65534  Seconds since last zeroed
     1691897  Data blocks received
       25491  Multicast blocks received
           0  Receive failure
    78211496  Bytes received
     1529460  Multicast bytes received
           0  Data overrun
     2240057  Data blocks sent
       37989  Multicast blocks sent
          87  Blocks sent, multiple collisions
         102  Blocks sent, single collision
        1173  Blocks sent, initially deferred
   107990422  Bytes sent
     1729968  Multicast bytes sent
        8030  Send failure, including:
                Carrier check failed
        8030  Collision detect check failure
           0  Unrecognized frame destination
           0  System buffer unavailable
           0  User buffer unavailable
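
To put the send failures in proportion, the line counters above can be
turned into rates. This is only a rough sketch using the figures as
posted; the variable names are made up for illustration. It works out
to about 0.36 percent of blocks sent failing the collision detect
check, while ordinary collisions and deferrals are far rarer.

```python
# Counter values copied from the NCP line counters quoted above (Line = ISA-0).
blocks_sent = 2240057
send_failures = 8030          # all listed under "Collision detect check failure"
single_collision = 102
multiple_collisions = 87
initially_deferred = 1173


def pct(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


failure_pct = pct(send_failures, blocks_sent)
collision_pct = pct(single_collision + multiple_collisions, blocks_sent)
deferred_pct = pct(initially_deferred, blocks_sent)

print(f"send failures:      {failure_pct:.3f}% of blocks sent")
print(f"collisions:         {collision_pct:.4f}% of blocks sent")
print(f"initially deferred: {deferred_pct:.3f}% of blocks sent")
```

Because the "Seconds since last zeroed" counter has saturated at
>65534, these are lifetime ratios over an unknown interval, not rates
per unit time.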

> Jeff, you write that "the counters show no errors, but this was interesting from the MCR LATCP SHOW LINK/COUNT ...etc"
> What counters show no errors?
Here is the complete output of the LAT LINK counters:
Link Name:    LAT$LINK
Device Name:  _EZA4:

Seconds Since Zeroed:            65535
Messages Received:             1693146
Multicast Msgs Received:         25517
Bytes Received:               78269314
Multicast Bytes Received:      1531020
System Buffer Unavailable:           0
Unrecognized Destination:            0

     Messages Sent:             2241710
     Multicast Msgs Sent:         38006
     Bytes Sent:              108119723
     Multicast Bytes Sent:      1730717
     User Buffer Unavailable:         0
     Data Overrun:                    0

Receive Errors -
   Block Check Error:               No
   Framing Error:                   No
   Frame Too Long:                  No
   Frame Status Error:              No
   Frame Length Error:              No

Transmit Errors -
   Excessive Collisions:        No
   Carrier Check Failure:       Yes
   Short Circuit:               Yes
   Open Circuit:                Yes
   Frame Too Long:              Yes
   Remote Failure To Defer:     No
   Transmit Underrun:           Yes
   Transmit Failure:            No

CSMACD Specific Counters
------------------------

Transmit CDC Failure:             8030

Messages Transmitted -
   Single Collision:               102
   Multiple Collisions:             87
   Initially Deferred:            1173
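
As a sanity check, the LATCP link counters can be compared with the
NCP line counters quoted earlier. The sketch below just subtracts one
set from the other, using the numbers as posted (the dictionary keys
are illustrative names, not anything NCP or LATCP prints). The error
and collision counts are identical and the traffic totals differ only
slightly, presumably because the two snapshots were taken at slightly
different times, which suggests both tools are reporting on the same
underlying device.

```python
# NCP line counters vs. LATCP link counters, both copied from this post.
ncp = {"blocks_sent": 2240057, "blocks_received": 1691897,
       "cdc_failure": 8030, "single": 102, "multiple": 87, "deferred": 1173}
latcp = {"blocks_sent": 2241710, "blocks_received": 1693146,
         "cdc_failure": 8030, "single": 102, "multiple": 87, "deferred": 1173}

# Positive delta means the LATCP snapshot shows the higher count.
diff = {name: latcp[name] - ncp[name] for name in ncp}

for name, delta in diff.items():
    print(f"{name:16s} NCP={ncp[name]:>8d} LATCP={latcp[name]:>8d} delta={delta:+d}")
```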

> The transceiver on the DELNI wasn't replaced recently was it?
No. It is the original H4000 which was installed about 6 years ago.

> You wrote that there is no 10BASE2 gear available. Is it possible to
> use RJ45 transceivers (enable heartbeat), a low speed UTP switch or
> hub with an AUI port on it?
I do have a Black Box switch with one AUI port and eight 10BASE-T RJ45
ports available, but it requires change control paperwork to connect
it to the DEC Network. I would like to avoid doing this.

> If it turns out that it is not hardware, is it possible that there is a
> PC (or other 100MB equipment) connected to the backbone somewhere?
No. At this time all equipment on the DEC Network is 100% Digital
Equipment (Not Compaq, not HP) hardware.

> If you can get the DECserver counters easily ...
> please post them; they may not add anything to the picture,
> but they might.
Here are the results from one of the many DECServers.
Node LIMS is the VAX, all the others are PDP-11s running RSX.

Local> SHOW NODE ALL COUNTERS

Node: ALICE
Seconds Since Zeroed:      1985926
Messages Received:            1262
Messages Transmitted:         1133
Slots Received:                638
Slots Transmitted:             864
Bytes Received:              17554
Bytes Transmitted:             768

Multiple Node Addresses:            0
Duplicates Received:                0
Messages Re-transmitted:            6
Illegal Messages Received:          0
Illegal Slots Received:             0
Solicitations Accepted:             0
Solicitations Rejected:             0

Node: IRV70A
Seconds Since Zeroed:      2577531
Messages Received:               0
Messages Transmitted:            0
Slots Received:                  0
Slots Transmitted:               0
Bytes Received:                  0
Bytes Transmitted:               0

Multiple Node Addresses:            0
Duplicates Received:                0
Messages Re-transmitted:            0
Illegal Messages Received:          0
Illegal Slots Received:             0
Solicitations Accepted:             0
Solicitations Rejected:             0

Node: LIMS
Seconds Since Zeroed:      2577490
Messages Received:          122179
Messages Transmitted:        88449
Slots Received:              76864
Slots Transmitted:           65984
Bytes Received:            6861709
Bytes Transmitted:          494411

Multiple Node Addresses:            0
Duplicates Received:                6
Messages Re-transmitted:            1
Illegal Messages Received:          0
Illegal Slots Received:             0
Solicitations Accepted:             0
Solicitations Rejected:             0

Node: MINNIE
Seconds Since Zeroed:      2577505
Messages Received:           69814
Messages Transmitted:        66573
Slots Received:              13149
Slots Transmitted:           13931
Bytes Received:             779022
Bytes Transmitted:           15412

Multiple Node Addresses:            0
Duplicates Received:                0
Messages Re-transmitted:            0
Illegal Messages Received:          0
Illegal Slots Received:             0
Solicitations Accepted:             0
Solicitations Rejected:             0
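
For comparison between nodes, the ratio of "Messages Re-transmitted"
to "Messages Transmitted" can be worked out from the DECserver
counters above. This is a small illustrative sketch using only the
posted figures; the function name is hypothetical. An idle node
(IRV70A) is reported as n/a rather than dividing by zero.

```python
# (messages transmitted, messages re-transmitted, duplicates received) per node,
# copied from the DECserver "SHOW NODE ALL COUNTERS" output above.
nodes = {
    "ALICE":  (1133, 6, 0),
    "IRV70A": (0, 0, 0),
    "LIMS":   (88449, 1, 6),
    "MINNIE": (66573, 0, 0),
}


def retransmit_pct(transmitted, retransmitted):
    """Percentage of transmitted messages that were re-sent; None if idle."""
    if transmitted == 0:
        return None
    return 100.0 * retransmitted / transmitted


rates = {}
for name, (tx, retx, dups) in nodes.items():
    rates[name] = retransmit_pct(tx, retx)
    shown = "n/a" if rates[name] is None else f"{rates[name]:.4f}%"
    print(f"{name:8s} retransmit rate={shown:>9s} duplicates received={dups}")
```

On these numbers ALICE has the highest ratio (6 of 1133 messages),
though the absolute counts are small for every node.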

> You have spares for all the active network kit, right?
Yes. We are planning to start by swapping out the DELNI in the Data
Center to see if this helps.

> Same goes for the MicroVAX itself. Do you have a spare network card (DELQA) you could plug in?
Yes. If the problem rears its head again after swapping out the
DELNI then we will swap out the DELQA.
If it continues to fail after that, then we will swap out the H4000
transceiver.

> Has anybody installed any significant new electrical kit on the
> factory floor recently ?
No. The last physical change to the network was performed 2 months
before the first occurrence of this VAX/LAT connection dropout problem,
and that change was just adding another DECServer 200 to another DELNI
in a remote IDF closet.

> How much is the disruption costing you? Enough to make it worthwhile upgrading the backbone to modern technology.
The VAX system is used exclusively for interactive sessions by many of
our Medical Labs for the entry of laboratory test results for QA/QC
records for the FDA. As long as the interruptions are not prolonged,
there are no serious impacts to production. The PDP-11s are
considerably involved in production. They must stay up for production
to continue but they do not use the network for production operations.
The network for the PDP-11s is simply for production monitors to see
what is going on and to pass data to the VAX. We can take the network
down for planned upgrades or changes for up to 2-3 hours without
significant impact to production.

> Forgot to say: wrt user sessions being dropped, you do know you
> can avoid that using VMS's "virtual terminal" feature, right? When the
> LAT session is dropped, the VMS session continues and can in principle
> be resumed from where it left off once the user reconnects their
> session.
I seem to remember this feature in VMS from a long time ago, but for
some reason when the LAT connections are dropped, all the interactive
processes are stopped and the users are logged out.

> When the LAT service of the VMS machine had disappeared from the
> DECservers, did you still try to do connect to the service somehow (e.g.
> SET H /LAT <VAX>)? If you did, was it successful?
I just enabled outgoing LAT on the VAX, and I can do a SET HOST/LAT
and it works now, but the problem of all LAT connections dropping has
not happened again since I enabled outgoing LAT.

> You know what they say, put a monkey in front of a keyboard and
> eventually he'll come up with something intelligent.
We have had hundreds of Monkeys using this network for about 30 years.
So far no sign of intelligence.
==========================
Thanks again for all of your input.

Jeff Cameron


