[Info-vax] Clock running very slow on an Alpha

Thu Aug 26 20:08:07 EDT 2010

"Peter Weaver" <info-vax at weaverconsulting.ca> wrote in message 
news:448c58d6-a2d9-4a41-b0fc-e1a34357d783 at 5g2000yqz.googlegroups.com...
>I have a customer with a four node cluster of Alphas running V7.2-1.
> Three of the Alphas are identical "AlphaServer 4100 5/533" machines.
> All four machines have been running Multinet V4.2 XNTP for years. The
> XNTP configuration is identical on all four nodes, and the SHOW LOG *TIME*
> command on all four nodes yields identical results. I downloaded and
> ran the TBO utility on the cluster; all three of the 4100s report:
>
> $ tbo/info
> %TBO-I-IDENT, OpenVMS Time Booster Rev 2.0
> %TBO-I-INFO,  Systemtime: 25-AUG-2010 13:00:48.89
>       Timeadjust: 0
>       Ticklength: 8333
>
> ANA/SYS returns the same values for these three locations on all three
> 4100s:
> EXE$GL_TIMEADJUST:  00000000.00000000   "........"
> EXE$GL_TICKLENGTH:  00000000.0000208D   ". ......"
> EXE$GL_SYSTICK:  00000000.0000208D   ". ......"
>
>
> One of the three 4100 servers recently started losing 15
> minutes per day. I have restarted XNTP on this node several times;
> each time, XNTPD.LOG reports that it acquired the peers (usually I
> have only one peer, but during my troubleshooting I added the other
> 3 Alphas as peers) but nothing else. The drift file on this node shows
> "0.000 0". I tried adjusting the time with
> TBO/DIRECTION=FORWARD/RANGE=7200/DELTA=4500 (move the clock ahead 75
> minutes over the next 2 hours), but even with that command the time on
> this Alpha was still drifting away from the other Alphas, just not as
> quickly.
>
> If I stop XNTP and run the ntpdate command then the clock jumps to the
> correct time.
>
> Can anyone think of anything from a VMS perspective that I missed that
> could be causing this? If it boils down to hardware then what part
> should I tell the hardware people I need replacing?
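The numbers in the question can be sanity-checked with a little arithmetic. This is a sketch resting on two assumptions that are not in the original post: that EXE$GL_TICKLENGTH is counted in 100 ns units (the VMS system time unit), and that NTP can correct a frequency error of at most about 500 ppm (the cap in the reference ntpd; MultiNet's XNTP may differ).

```python
# Back-of-the-envelope check of the numbers in the question.
# Assumptions (not stated in the original post):
#   * EXE$GL_TICKLENGTH is in 100 ns units, like VMS system time.
#   * NTP can correct a frequency error of at most ~500 ppm; that is the
#     reference ntpd's cap, and MultiNet's XNTP may differ.

NS_PER_VMS_UNIT = 100      # one VMS time unit = 100 ns
NTP_MAX_SLEW_PPM = 500     # assumed NTP frequency-correction limit
SECONDS_PER_DAY = 86_400


def tick_period_us(ticklength: int) -> float:
    """Length of one clock tick in microseconds."""
    return ticklength * NS_PER_VMS_UNIT / 1000


def drift_ppm(offset_s: float, elapsed_s: float) -> float:
    """Clock error rate in parts per million."""
    return offset_s / elapsed_s * 1e6


ticklength = int("208D", 16)                  # EXE$GL_TICKLENGTH as shown by ANA/SYS
rate = drift_ppm(15 * 60, SECONDS_PER_DAY)    # 15 minutes lost per day

print(f"TICKLENGTH = {ticklength}, tick = {tick_period_us(ticklength):.1f} us")
# prints "TICKLENGTH = 8333, tick = 833.3 us"
print(f"drift = {rate:.0f} ppm, about {rate / NTP_MAX_SLEW_PPM:.0f}x the NTP limit")
# prints "drift = 10417 ppm, about 21x the NTP limit"
```

The hex value from ANA/SYS matches TBO's decimal Ticklength of 8333, so the two tools agree. A 15-minute daily loss is roughly 10,400 ppm, far beyond what NTP can slew away, which may be why XNTP acquires its peers but never pulls the clock back into step. To measure the rate rather than estimate it, drift_ppm can be given the change in offset between two ntpdate runs and the number of seconds between them.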

First, make sure that the console variable BOOT_RESET (IIRC the name) is
enabled, so that the bus is fully reset when the system reboots.

Time loss problems on these systems tend to be associated with bus errors
and bus hogs.  The clock is updated by interrupt.  If you get into a
situation where there is a high volume of recoverable bus errors, you will
end up losing interrupts (that is, the latency in servicing a clock
interrupt will exceed a clock tick, so the next tick is missed).  Network
cards that were not fully initialized (see above) have been implicated in
this error.
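The interrupt-loss explanation can be sketched numerically under a simplified, hypothetical model (not actual VMS or AlphaServer internals): all clock ticks share one pending interrupt, and a tick that arrives while that interrupt is still pending is dropped. Losing one tick in every 96 works out to exactly 15 minutes per day, since 86,400 / 900 = 96; at about 1200 ticks per second that is 12.5 lost ticks per second. The tick length reuses the EXE$GL_TICKLENGTH value, assuming 100 ns units, and the stall pattern is invented purely for illustration.

```python
# Hypothetical model: each clock tick raises the same single pending
# interrupt. While the CPU cannot take interrupts (e.g. stalled behind
# bus retries), the first tick in a stall stays pending and is serviced
# when the stall ends; every later tick in that stall is lost.


def ticks_in_window(start: int, end: int, tick: int, n_ticks: int) -> int:
    """Count tick times i*tick (0 <= i < n_ticks) with start <= i*tick < end."""
    first = max(0, -(-start // tick))             # ceil(start / tick)
    last = min(n_ticks - 1, (end - 1) // tick)    # floor((end - 1) / tick)
    return max(0, last - first + 1)


def lost_ticks(n_ticks: int, tick: int, stalls: list[tuple[int, int]]) -> int:
    """Ticks lost, given non-overlapping (start, end) stall intervals."""
    return sum(max(0, ticks_in_window(s, e, tick, n_ticks) - 1) for s, e in stalls)


TICK = 8333          # EXE$GL_TICKLENGTH, assumed to be in 100 ns units
N = 96 * 1200        # about 96 seconds of ticks at ~1200 ticks/s

# Hypothetical load: a stall two ticks long starting at every 96th tick.
stalls = [(k * 96 * TICK, k * 96 * TICK + 2 * TICK) for k in range(N // 96)]

lost = lost_ticks(N, TICK, stalls)
print(f"lost {lost} of {N} ticks -> {lost / N * 86_400 / 60:.1f} minutes per day")
# prints "lost 1200 of 115200 ticks -> 15.0 minutes per day"
```

The point of the model is that the clock does not need to stop for long: stalls of only a couple of milliseconds, if they recur often enough, are enough to produce the kind of daily loss reported in the question.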

Bus "hogs" are PIO devices which do high-speed blind writes to a PCI device
(with no checks to see if the device is ready to accept the write).  What
happens with these devices is that the target responds with a retry NAK and
keeps responding that way until it is ready, which blocks all other bus
activity.  Some graphics cards do this, especially the TGA2-based cards.

So, look for what is different about the behavior of the problem system
compared with the others.  Errors?  High graphics use?  BOOT_RESET setting?  Etc.