[Info-vax] Sudden problems with slow sftp transfers and slow disk accesses
johnwallace4 at yahoo.co.uk
Tue Feb 18 13:32:15 EST 2014
On Tuesday, 18 February 2014 13:25:38 UTC, gwil... at cfa.harvard.edu wrote:
> On Tuesday, February 18, 2014 8:16:17 AM UTC-5, Jan-Erik Soderholm wrote:
> > gwi... at cfa.harvard.edu wrote 2014-02-18 13:53:
> > > On Tuesday, February 18, 2014 6:23:22 AM UTC-5, Jim wrote:
> > >> On Monday, February 17, 2014 11:42:27 PM UTC-5, gwil... at cfa.harvard.edu wrote:
> > >>> Cluster of 8 Alpha boxes running V8.3 + patches, recently moved
> > >>> from one building to another. Since the move, we've been experiencing
> > >>> odd behaviors: very slow network access (via sftp) and slow disk IO.
> > >>>
> > >>> Disk storage is mostly on three external disk boxes (five three-member
> > >>> shadow sets). No errors are reported via SHOW DEV DSA.
> > >>>
> > >>> Network cards on all machines are set to 100 Mb/s, full duplex,
> > >>> non-autonegotiate, connected via an 8-port Gb switch.
> > >>>
> > >>> An sftp from one of our machines to a local Linux system transferred 288 KB
> > >>> of a ~ 6MB file in the first second; the current rate is now down to 5KB/s.
> > >>> MONITOR PROCESS/TOPCPU doesn't show the process getting even 1% of the
> > >>> CPU and there is nothing else running on the system. Another attempt on
> > >>> the same file transferred 1.5 MB in the first second, then dropped to
> > >>> < 50 KB/s.
> > >>>
> > >>> SHOW MEMORY doesn't show any problems.
> > >>>
> > >>> A filing operation merging two large files took a matter of seconds
> > >>> when both files were on a locally-attached disk. When both files were
> > >>> on a shadow set, the merge took 6+ minutes. MONITOR LOCK while
> > >>> running the latter test showed ENQ/DEQ rates < 1.
> > >>>
> > >>> Image activation is slow. It can take several seconds to begin
> > >>> running an .exe stored on the shadow set.
> > >>>
> > >>> MCR SCACP SHOW LAN_DEV/ALL showed numerous errors occurring over the
> > >>> past 24 hours, so this evening we replaced the switch connecting these
> > >>> 8 machines. Errors are continuing to appear.
> > >>>
> > >>> What am I missing?
> > >>>
> > >>> Gareth
> > >>
> > >> The output of the following command might be interesting
> > >>
> > >> $ MCR LANCP SHOW DEVICE/INTERNAL Exxx ! where Exxx is the suspect NIC
> > >
> > > Contrary to what DO MCR LANCP SHOW DEV EW/CHAR shows in SYSMAN
> > > (all interface cards set to Full duplex enable YES, Full duplex
> > > operational YES, 100 Mb/s), most of the interfaces carrying IP
> > > traffic display "possible duplex mismatch" when running the SHOW
> > > DEV/INT command. The driver messages all show "Link State: UP"
> > > and "Full Duplex 100base TX connection selected". I have reconfigured
> > > all interfaces, but I'm still seeing sftp issues (the 6 MB file
> > > mentioned earlier transferred 2.8 MB in the first second, then
> > > trailed off). The next driver message isn't due for another 30 minutes
> > > or so.
> > >
> > > Gareth
> >
> > Do you not read all replies?
> >
> > This is probably *NOT* an issue with your Alpha servers or
> > with OpenVMS. CHECK YOUR SWITCH SETTINGS!
> >
> > "possible duplex mismatch" is not that hard to understand.
> >
> > Your problems with (s)ftp are exactly what is expected when
> > the switch runs half-duplex. Terminal sessions run OK, ftp
> > of small files works but larger file transfers "hang".
> >
> > Jan-Erik.
>
> The switch is an auto-sensing Netgear 16-port FS116. I don't see how
> I can check what the switch is set to.

On that Netgear FS1xx family (100Mbit max), there is nothing you can
check, other than the LEDs (and they may or may not be trustworthy here).
The whole family is utterly unmanaged, with no GbE. I used to have an FS105
and an FS108 at home (little brothers of the FS116). Time to move on.
Also, FS1xx are far from new, and as Colin and others have mentioned,
some older kit doesn't work real well when asked to auto-negotiate. It's
not necessarily anybody's fault, the standards just didn't exist.
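
For what it's worth, the mismatch being discussed can be sketched in a few
lines of Python. This is a simplified model, not a description of what the
FS116 or the Alpha NICs actually do internally: the `Port` class and the
function names are made up for illustration, and it assumes both ends are
capable of full duplex. The one rule it encodes is the usual one: an
auto-negotiating port that gets no negotiation from its partner can sense
the speed but falls back to half duplex.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Port:
    """Hypothetical model of one end of an Ethernet link."""
    autoneg: bool          # True if the port auto-negotiates
    duplex: str = "full"   # forced duplex, used only when autoneg is False


def resolved_duplex(local: Port, peer: Port) -> str:
    """Duplex mode that `local` ends up using when linked to `peer`."""
    if not local.autoneg:
        return local.duplex   # a forced setting is used as-is
    if peer.autoneg:
        return "full"         # both negotiate; assumes both support full duplex
    return "half"             # partner is silent: speed is sensed, duplex defaults to half


def duplex_mismatch(a: Port, b: Port) -> bool:
    """True when the two ends of the link settle on different duplex modes."""
    return resolved_duplex(a, b) != resolved_duplex(b, a)


# The setup described in this thread: NIC forced to 100/full with
# auto-negotiation off, plugged into an auto-sensing unmanaged switch port.
alpha_nic = Port(autoneg=False, duplex="full")
switch_port = Port(autoneg=True)
print(duplex_mismatch(alpha_nic, switch_port))  # True
```

In this model the mismatch goes away if both ends auto-negotiate, or if both
ends are forced to the same setting, which an unmanaged switch cannot do.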
Get a decent managed switch. This has likely cost more in wasted time for
you and your colleagues than it would have cost to buy a decent switch
in the first place. I do realise this concept is sometimes a hard sell
in the post-Dilbert era.
If you have a local Netgear fan club, please ignore the ProSafe Plus
(GS108e etc) range. What limited management they do have is based on
a Windows-only application (not SNMP, not Web), and anyway you can't
force line speed. (I have one at home, wouldn't consider it for
serious use).
I'll leave recommendations to others. In the absence of budget, the
idea of reusing an otherwise unused 100Mbit managed switch has many
attractions, if all your Alphas are older ones.
Best of luck. Aren't networks fun when they don't quite work?