[Info-vax] Are queue manager updates written to disk immediately ?

Jan-Erik Soderholm jan-erik.soderholm at telia.com
Thu Apr 11 12:09:22 EDT 2013


Simon Clubley wrote 2013-04-11 17:27:
> When a batch job completes, does the queue manager _immediately_ write
> that information to its data structures on disk or is that information
> cached in memory for a short time first ?
>
> I had a problem this morning which I have not seen before and it looks
> like VMS is not _immediately_ writing queue updates away to disk, which
> to put it mildly is bl**dy dangerous if true.
>
> A routine batch job ran at 08:20 and completed; I have the log file
> from this first run on disk.
>
> The power failed about a minute or so later. (This box is not UPS
> protected.)
>
> When the system restarted, the same job ran again. I also have the log
> file from this second run on disk.
>
> The system disk (and all disks) are in write-through mode:
>
>    Volume Status:  ODS-2, subject to mount verification, protected subsystems
>        enabled, file high-water marking, write-through caching enabled.
>
> These are directly attached disks on a PCI controller and it's an Alpha
> V8.3 system.
>
> From the end of the first run:
>
> $       submit/queue=ba0/after:"tomorrow+08:20" ownsrc:check_queues.com
> Job CHECK_QUEUES (queue BA0, entry 3927) holding until 12-APR-2013 08:20
> $       exit
>    [deleted]       job terminated at 11-APR-2013 08:20:02.57
>
>    Accounting information:
>    Buffered I/O count:                204      Peak working set size:       5088
>    Direct I/O count:                  229      Peak virtual size:         173040
>    Page faults:                      1455      Mounted volumes:                0
>    Charged CPU time:        0 00:00:00.52      Elapsed time:       0 00:00:02.56
>
> As the job had completed, the entry should have disappeared from the
> queue database at that point. Instead after the system restart, the
> same job ran again. Here is the end of the second run:
>
> $       submit/queue=ba0/after:"tomorrow+08:20" ownsrc:check_queues.com
> Job CHECK_QUEUES (queue BA0, entry 3923) holding until 12-APR-2013 08:20
> $       exit
>    [deleted]       job terminated at 11-APR-2013 08:25:11.69
>
>    Accounting information:
>    Buffered I/O count:                204      Peak working set size:       5680
>    Direct I/O count:                  253      Peak virtual size:         173040
>    Page faults:                      1387      Mounted volumes:                0
>    Charged CPU time:        0 00:00:00.67      Elapsed time:       0 00:00:05.25
>
> Also, there was no entry in the queue database for the resubmitted job
> (entry 3927) from the first run, but the resubmitted job from the second
> run (entry 3923) is in the queue database as expected.
>
> Does anyone have any ideas ?
>
> Thanks,
>
> Simon.
>
> PS: Before someone asks, I will fire this off to VMS support in a day or
> so, but I just wanted to do a quick check here to see if anyone here had
> seen this first.
>

I guess accounting (acc/queue=ba0) also shows both jobs as having run?
Is there anything odd about the start/finish timestamps in accounting?
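
A minimal sketch of that check, spelled out in full. The /SINCE time is
only an illustrative value chosen to cover both runs, and this assumes
accounting was enabled for batch processes at the time:

```
$ ! List full accounting records for jobs run from queue BA0
$ ! since 08:00 on the morning of the power failure.
$ ACCOUNTING /QUEUE=BA0 /SINCE=11-APR-2013:08:00 /FULL
```

Two records for CHECK_QUEUES, finishing around 08:20 and 08:25, would
match the two log files.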

I would look more closely at whether something about the system
clock at startup caused the holding job to be released, such
as a startup with the wrong time setting or similar.

That is, queue ba0 is started by the startup before the
clock is corrected. Or something like that.
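
One rough way to look for that, as a sketch only: compare the boot time
the system recorded with the current time and with the timestamps in the
two log files. Whether a clock that was wrong at boot would still be
visible here is an assumption, not something I have verified:

```
$ ! Boot time as recorded by the system
$ WRITE SYS$OUTPUT F$GETSYI("BOOTTIME")
$ ! Current system time, for comparison
$ SHOW TIME
```

A boot time that does not fit between the 08:20:02 and 08:25:11 job
terminations would point towards the clock.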

Jan-Erik.