[Info-vax] Are queue manager updates written to disk immediately ?

Simon Clubley clubley at remove_me.eisner.decus.org-Earth.UFP
Thu Apr 11 11:27:06 EDT 2013


When a batch job completes, does the queue manager _immediately_ write
that information to its data structures on disk, or is that information
cached in memory for a short time first?

I had a problem this morning which I have not seen before and it looks
like VMS is not _immediately_ writing queue updates away to disk, which
to put it mildly is bl**dy dangerous if true.

A routine batch job ran at 08:20 and completed; I have the log file
from this first run on disk.

The power failed a minute or so later. (This box is not UPS
protected.)

When the system restarted, the same job ran again. I also have the log
file from this second run on disk.

The system disk (and all disks) are in write-through mode:

  Volume Status:  ODS-2, subject to mount verification, protected subsystems
      enabled, file high-water marking, write-through caching enabled.

These are directly attached disks on a PCI controller and it's an Alpha
V8.3 system.

From the end of the first run:

$       submit/queue=ba0/after:"tomorrow+08:20" ownsrc:check_queues.com
Job CHECK_QUEUES (queue BA0, entry 3927) holding until 12-APR-2013 08:20
$       exit
  [deleted]       job terminated at 11-APR-2013 08:20:02.57

  Accounting information:
  Buffered I/O count:                204      Peak working set size:       5088
  Direct I/O count:                  229      Peak virtual size:         173040
  Page faults:                      1455      Mounted volumes:                0
  Charged CPU time:        0 00:00:00.52      Elapsed time:       0 00:00:02.56

As the job had completed, the entry should have disappeared from the
queue database at that point. Instead, after the system restart, the
same job ran again. Here is the end of the second run:

$       submit/queue=ba0/after:"tomorrow+08:20" ownsrc:check_queues.com
Job CHECK_QUEUES (queue BA0, entry 3923) holding until 12-APR-2013 08:20
$       exit
  [deleted]       job terminated at 11-APR-2013 08:25:11.69

  Accounting information:
  Buffered I/O count:                204      Peak working set size:       5680
  Direct I/O count:                  253      Peak virtual size:         173040
  Page faults:                      1387      Mounted volumes:                0
  Charged CPU time:        0 00:00:00.67      Elapsed time:       0 00:00:05.25

Also, there was no entry in the queue database for the resubmitted job
(entry 3927) from the first run, but the resubmitted job from the second
run (entry 3923) is in the queue database as expected.
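
To make the suspected failure mode concrete, here is a toy model in
Python. It is only a sketch of the hypothesis above and says nothing about
how the VMS queue manager is actually implemented: the class, the
flush_immediately switch and ORIGINAL_ENTRY (the post does not give the
entry number of the original job) are all made up for illustration.

```python
ORIGINAL_ENTRY = 1000  # hypothetical: the original job's entry number is not known


class ToyQueueDb:
    """Toy model only: an in-memory job table plus an 'on-disk' copy."""

    def __init__(self, flush_immediately):
        self.flush_immediately = flush_immediately
        self.memory = {}  # entry number -> job name (current state)
        self.disk = {}    # what would survive a power failure

    def flush(self):
        # Copy the current in-memory state to the "disk" copy.
        self.disk = dict(self.memory)

    def _changed(self):
        # Write-through behaviour: every update reaches disk at once.
        if self.flush_immediately:
            self.flush()

    def submit(self, entry, name):
        self.memory[entry] = name
        self._changed()

    def complete(self, entry):
        del self.memory[entry]
        self._changed()

    def power_fail_and_restart(self):
        # Only the on-disk state survives the power failure.
        self.memory = dict(self.disk)


def scenario(flush_immediately):
    db = ToyQueueDb(flush_immediately)
    db.submit(ORIGINAL_ENTRY, "CHECK_QUEUES")
    db.flush()  # the pending entry has been on disk since the previous day

    # The 08:20 run: the job resubmits itself, then completes.
    db.submit(3927, "CHECK_QUEUES")
    db.complete(ORIGINAL_ENTRY)

    db.power_fail_and_restart()
    return sorted(db.memory)  # entries present after the restart
```

With flush_immediately=True the model comes back with only entry 3927,
which is what I expected to see. With flush_immediately=False it comes
back with only the original entry, which is overdue and so would run
again; that matches what I actually saw.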

Does anyone have any ideas?

Thanks,

Simon.

PS: Before someone asks, I will fire this off to VMS support in a day or
so, but I wanted to check here first to see whether anyone has seen this
before.

-- 
Simon Clubley, clubley at remove_me.eisner.decus.org-Earth.UFP
Microsoft: Bringing you 1980s technology to a 21st century world


