[Info-vax] hub better than switch?!

Sun Mar 4 07:33:25 EST 2012

On Mar 4, 7:14 am, hel... at astro.multiCLOTHESvax.de (Phillip Helbig---
undress to reply) wrote:
> A while back, I posted a question about a slowly (about 2 per day)
> increasing error count, mostly on PEA0: but occasionally on some disks
> connected to the same machine which shows the errors on PEA0:.  This
> stopped after my old switch died and I replaced it with a hub (actually
> this strange 10/100 Mb/s hybrid thingamajig).  I assumed that whatever
> caused the switch to fail was also responsible for these errors before
> the failure.
>
> Since I replaced this "hub" with a brand new 10/100/1000 MB/s proper
> switch a few days ago, the increasing error count is back.  :-(  It
> occurs on the one machine in the cluster which has a 100 MB/s ethernet
> card (the others have 10).  On the switch, this is running at 100 while
> on the "hub" it ran at 10 (but other things on the hub---an access point
> and another switch---ran at 100).

You're going to have fun with this, given that you have little
available information on exactly what is causing PEA0 to clock up
those errors, and on what traffic is going where in your unmanaged
switched network.

Do you know *when* the errors are occurring? I.e. regularly or randomly?
(At two a day it's hard to tell either way?)

Is there any way you can (temporarily?) acquire a proper managed
switch with a port-mirroring capability?

A possibility which I've not yet seen mentioned and would be
interested in ruling in or out goes something like this: Switches
build themselves a table of MAC addresses which they use to determine
which port should be the destination for incoming traffic (multicast
and broadcast addresses should go everywhere). This table is not of
infinite size (it varies from switch to switch) and a mechanism is
therefore needed to throw away "now unused" addresses when a
previously unseen address shows up.

You may think you've only got a handful of boxes and your switch
probably has lots of MAC addresses in comparison (you might want to
see if there is a spec somewhere for how many it can cope with), but
then you need to bear in mind that various DEC protocols often create
their own MAC addresses on the fly.
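
To make the address-table idea concrete, here is a minimal Python sketch
of a bounded learning table that throws away the least recently seen
address when a new one shows up. It is a toy model only: the class name
MacTable, the eviction policy and the example addresses are assumptions
made for illustration, not a description of how any particular switch
behaves (real switches typically also age entries out on a timer).

```python
from collections import OrderedDict


class MacTable:
    """Toy model of a switch's bounded MAC address table (hypothetical)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # MAC -> port, least recently seen first
        self.evictions = 0

    def learn(self, src_mac, port):
        # Refresh a known source address, or add a new one, evicting the
        # least recently seen entry if the table is already full.
        if src_mac in self.entries:
            self.entries.move_to_end(src_mac)
        elif len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)
            self.evictions += 1
        self.entries[src_mac] = port

    def lookup(self, dst_mac):
        # Return the learned port, or None meaning "flood to all ports".
        first_octet = int(dst_mac.split("-")[0], 16)
        if first_octet & 1:  # group bit set: multicast or broadcast
            return None
        return self.entries.get(dst_mac)
```

With a capacity of 2, learning a third address pushes out the first
one, so later unicast traffic to that first address is flooded until
the switch sees it transmit again. A steadily climbing eviction count
in a model like this is the kind of churn that extra on-the-fly MAC
addresses would produce.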

If you had access to a managed switch (or temporarily reverted to your
hub) it would be relatively simple to use something like Wireshark (or
for dinosaurs, you might even be able to use DEC's own Ethernim on VAX/
VMS on any VAX with a decent NIC, or (even less likely) DEC's
LANBridge Monitor tools, if you happen to have an appropriate bridge),
to check how many distinct MAC addresses are actually in use over
time.
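
If you can get raw frames out of a capture by whatever means, counting
the distinct source addresses per time window is only a few lines of
Python. This is a sketch under stated assumptions: the function name
distinct_sources_per_window and the input format (an iterable of
timestamp and raw Ethernet frame bytes) are made up for illustration,
and getting the frames out of Wireshark or any other tool is left to
you.

```python
from collections import defaultdict


def distinct_sources_per_window(frames, window_seconds=60.0):
    """Count distinct source MAC addresses seen in each time window.

    frames: iterable of (timestamp_seconds, raw_ethernet_frame_bytes).
    Returns {window_index: number_of_distinct_source_addresses}.
    """
    seen = defaultdict(set)
    for timestamp, frame in frames:
        if len(frame) < 14:  # shorter than an Ethernet header, skip it
            continue
        # Bytes 0-5 are the destination address, bytes 6-11 the source.
        src = frame[6:12].hex("-").upper()
        seen[int(timestamp // window_seconds)].add(src)
    return {window: len(macs) for window, macs in sorted(seen.items())}
```

Comparing the largest per-window count against the switch's stated
address-table size (if you can find a spec) would go some way towards
ruling the table overflow theory in or out.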

Actually any half decent manageable bridge/switch with an SNMP
capability (supporting the RMON MIB, as per RFC3577?) together with
any half decent SNMP tool (does VMS have any of those these days?)
ought to be able to help identify how many MAC addresses are active on
your network.

On the other hand: assuming there are no basic configuration errors
such as a duplex mismatch (*please* recheck this, as already
suggested by others), my guess, based on nothing more than your
description and my intuition, is that one or more PEA0/SCS packets
sent at 100Mbit are being dropped occasionally by the switch, not
because of the address table overflow described above, but because of
occasional port buffer overflow (too much 100Mbit traffic being sent
to 10Mbit ports, filling some buffer in the switch for some reason).
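
The buffer overflow guess can be illustrated with a crude simulation of
one burst arriving at ten frames per tick (the 100Mbit side) and
draining at one frame per tick (the 10Mbit side) through a fixed-size
buffer. Everything here is an assumption for illustration: the function
name simulate_egress_port, the tick model and the buffer sizes are
invented, and real switches manage their buffers in ways this sketch
does not capture.

```python
def simulate_egress_port(burst_frames, buffer_frames,
                         in_per_tick=10, out_per_tick=1):
    """Return (forwarded, dropped) for one burst hitting a slower port."""
    queued = forwarded = dropped = 0
    remaining = burst_frames
    while remaining or queued:
        # Frames arriving from the fast side during this tick.
        arriving = min(in_per_tick, remaining)
        remaining -= arriving
        # Accept what fits in the buffer; anything else is dropped.
        accepted = min(arriving, buffer_frames - queued)
        queued += accepted
        dropped += arriving - accepted
        # The slow side drains the queue at its own rate.
        sent = min(out_per_tick, queued)
        queued -= sent
        forwarded += sent
    return forwarded, dropped
```

In this model a short burst fits in the buffer and nothing is lost,
while a burst of 100 frames into a 32-frame buffer loses more than half
of its frames. The point is only that occasional long bursts, not the
average load, decide whether anything is dropped.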

PEA0/SCS wants acknowledgements to these things and if enough
acknowledgements are missed, errors will be logged. Some of these
cluster messages are sent as multicast, which will presumably be first
to be dropped at times of buffer space shortage. And if there is
buffer space but the packet is buffered for sufficiently long that SCS
decides there's no timely acknowledgement, the same result occurs at
cluster level. Not sure what happens if an acknowledgement turns up
outside the expected timeframe (having been unexpectedly held in
somebody's buffer); maybe that would also be seen as an error.

Again, a proper managed switch might have counters for this and other
kinds of packets which the switch has dropped. You might need PEA0 or
SCS internals knowledge (more than I have) to understand exactly
what's causing PEA0 to log these errors.

You do realise that an occasional PEA0 error isn't necessarily serious
cause for concern, right?

Comments, corrections and clarifications always welcome.

Good luck.