[Info-vax] queue errors

Thu Oct 18 12:19:58 EDT 2012

On Oct 17, 10:44 am, Stephen Hoffman <seaoh... at hoffmanlabs.invalid>
wrote:
> On 2012-10-17 12:58:57 +0000, Tom Adams said:
>
> > The files are log files from a system that don't make it to the
> > logger.  Not printed by an end-user.
>
> You're probably not going to go for it, but syslog or syslogng or
> analogous would be more typical in recent application designs and
> updates; centralized logging.
>
> No, syslog and syslogng are not part of TCP/IP Services (and probably
> never will be), but open-source versions of the necessary clients are
> available.
>
> > But we do have it all in a disk log that loses nothing.  So the losses
> > are not real important, but they have gotten on a todo list prepared
> > by QA.
>
> > I am looking into this because the QA people tend to focus on the
> > paper log. A bit hidebound and not totally rational I know.
>
> I might wonder about the organizational focus and funding here.  That
> this "minor issue" of lost files is near the top of the to-do list, yet
> efforts such as ensuring that comparatively stringent compiler
> diagnostics (/CHECK, et al), better and centralized logging,
> diagnostics from failing processes ("because of legacy code issues and
> the fact it might crash some detached processes for unimportant
> reasons"), and the maintenance of current VMS versions and patches,
> aren't.
>
> Legacy code and down-revision versions can be and often are excellent
> sources of latent and subtle bugs.
>
> In aggregate, this can be inferred to mean the organization's QA
> efforts might benefit from a somewhat broader view; toward the options
> and alternatives and trade-offs that are available, and where the bugs
> are located in the current application code base.
>
> I'd wonder what's getting missed, too.  A lack of encryption on
> critical data is a fairly common omission in these legacy-code
> environments, for instance.
>
> > If I had the files, I could tell exactly what is missing from the
> > paper log with little effort.
>
> Try the ANALYZE /DISK /REPAIR command, and then go look in [SYSLOST]
>
> And look at the code that's generating this logging data; that code
> contains either design bugs, or run-time error-handling bugs, or
> (likely) both.  It's assuming that the printers always work, and that's
> never been the case.  (And this also suggests use of capabilities such
> as syslog / syslogng or some other form of centralized logging.)
>
> > As things stand, we go to the disk log if someone notices that
> > something seems to be missing from the paper log.  That has only
> > happened once in more than a decade.
>
> One case that you know of.
>
> DECserver and network devices and printers can all get flaky.
>
> Old computers and old code can sometimes be wonderfully pernicious, too.
>
> And FWIW, design bugs and error-handling bugs can be latent from the
> first line of code ever included into an application; for "more than a
> decade".  I've encountered day-one bugs in central parts of VMS, and
> there have been documented cases of bugs latent from twenty and thirty
> years and more.
>
> More succinctly, "runs" doesn't mean "correct", nor "bug free".
>
> --
> Pure Personal Opinion | HoffmanLabs LLC

Prompted by your post, I studied and tested the /CHECK qualifier we
use.  The only difference between it and /CHECK=ALL is that we turn
off runtime underflow checking.  How is underflow checking a software
quality issue, given that one always has to guard against divide by
zero and other places where zero is a problem in calculations?  (This
is not a rhetorical question; I really would like to know if it is a
quality issue.)
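
For illustration only, here is a minimal Python sketch of what is at
stake in the underflow question.  It assumes IEEE 754 double precision
as Python uses it; the VMS compilers and the VAX floating formats
differ in detail, and the variable names are made up for the example.
It shows two things: an unchecked underflow silently turns a nonzero
result into zero (so a later zero guard fires without pointing at the
real cause), and a result that lands just above zero can lose most of
its precision without ever tripping a zero guard.

```python
# Sketch under the assumption of IEEE 754 doubles (Python floats).
# All names here are hypothetical, chosen for illustration.

# Case 1: silent underflow to zero.
tiny = 1e-200
product = tiny * tiny   # true value 1e-400 is below the double range
                        # and silently becomes 0.0, with no exception

# A divide-by-zero guard would catch this later, but it would report
# "divisor is zero", not "an earlier multiplication underflowed".
divisor_is_zero = (product == 0.0)

# Case 2: gradual underflow loses precision without reaching zero.
a = 1e-300
b = 1e-20
roundtrip = (a * b) / b   # a * b falls in the subnormal range

# Relative error of the round trip; normal double rounding error
# would be on the order of 1e-16.
rel_err = abs(roundtrip - a) / a

print("product:", product)
print("divisor_is_zero:", divisor_is_zero)
print("relative error after round trip:", rel_err)
```

In the second case the value is never zero, so a guard against zero
does not help; only an underflow check (or a careful scaling of the
calculation) would flag it.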

We stopped paying for VMS support a long time ago, and that froze us
at 7.3-2.  I don't see any way around that.

Do you think it's a good idea to always patch VMS ASAP (as soon as the
patch is available); or only when the patch is recommended for all or
might fix a known problem?  What's the best policy for this?

I am pretty sure I need to improve my policy on patching.

The application logging is centralized in one application process and
(as I said) we log to a shadowed disk and have never lost a message.
Maybe 3 messages per year don't make it to the paper logger due to
these queue errors, and more due to paper jams (but we reduced the
jam rate recently: just turning the paper forward a bit once a day
is a good functional check of the feed, better than just eyeballing
the perforations).
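
As a rough sketch of the error handling discussed above, the code that
feeds the paper logger could retry a failed submission and keep a list
of messages that never made it, instead of assuming the printer always
works.  This is Python rather than anything VMS-specific, and both
`print_with_tracking` and the `submit` callback are hypothetical names
standing in for whatever actually queues a message to the printer.

```python
from typing import Callable, List


def print_with_tracking(
    messages: List[str],
    submit: Callable[[str], None],
    max_attempts: int = 3,
) -> List[str]:
    """Submit each message, retrying on failure.

    Returns the messages that failed on every attempt, so they can be
    reported or re-queued rather than silently dropped.
    `submit` is a hypothetical callback that raises OSError on failure.
    """
    missed: List[str] = []
    for msg in messages:
        for _ in range(max_attempts):
            try:
                submit(msg)
                break               # submitted; move to the next message
            except OSError:
                continue            # queue or printer error; try again
        else:
            missed.append(msg)      # every attempt failed
    return missed
```

With something like this, the list of missed messages says exactly
what is absent from the paper log, without waiting for someone to
notice a gap.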