[Info-vax] Are queue manager updates written to disk immediately ?
Jan-Erik Soderholm
jan-erik.soderholm at telia.com
Thu Apr 11 12:09:22 EDT 2013
Simon Clubley wrote 2013-04-11 17:27:
> When a batch job completes, does the queue manager _immediately_ write
> that information to its data structures on disk or is that information
> cached in memory for a short time first ?
>
> I had a problem this morning which I have not seen before and it looks
> like VMS is not _immediately_ writing queue updates away to disk, which
> to put it mildly is bl**dy dangerous if true.
>
> A routine batch job ran at 08:20 and completed; I have the log file
> from this first run on disk.
>
> The power failed about a minute or so later. (This box is not UPS
> protected.)
>
> When the system restarted, the same job ran again. I also have the log
> file from this second run on disk.
>
> The system disk (and all disks) are in write-through mode:
>
> Volume Status: ODS-2, subject to mount verification, protected subsystems
> enabled, file high-water marking, write-through caching enabled.
>
> These are directly attached disks on a PCI controller and it's an Alpha
> V8.3 system.
>
> From the end of the first run:
>
> $ submit/queue=ba0/after:"tomorrow+08:20" ownsrc:check_queues.com
> Job CHECK_QUEUES (queue BA0, entry 3927) holding until 12-APR-2013 08:20
> $ exit
> [deleted] job terminated at 11-APR-2013 08:20:02.57
>
> Accounting information:
> Buffered I/O count:           204   Peak working set size:          5088
> Direct I/O count:             229   Peak virtual size:            173040
> Page faults:                 1455   Mounted volumes:                   0
> Charged CPU time:   0 00:00:00.52   Elapsed time:          0 00:00:02.56
>
> As the job had completed, the entry should have disappeared from the
> queue database at that point. Instead after the system restart, the
> same job ran again. Here is the end of the second run:
>
> $ submit/queue=ba0/after:"tomorrow+08:20" ownsrc:check_queues.com
> Job CHECK_QUEUES (queue BA0, entry 3923) holding until 12-APR-2013 08:20
> $ exit
> [deleted] job terminated at 11-APR-2013 08:25:11.69
>
> Accounting information:
> Buffered I/O count:           204   Peak working set size:          5680
> Direct I/O count:             253   Peak virtual size:            173040
> Page faults:                 1387   Mounted volumes:                   0
> Charged CPU time:   0 00:00:00.67   Elapsed time:          0 00:00:05.25
>
> Also, there was no entry in the queue database for the resubmitted job
> (entry 3927) from the first run, but the resubmitted job from the second
> run (entry 3923) is in the queue database as expected.
>
> Does anyone have any ideas ?
>
> Thanks,
>
> Simon.
>
> PS: Before someone asks, I will fire this off to VMS support in a day or
> so, but I just wanted to do a quick check here to see if anyone here had
> seen this first.
>
I guess accounting (acc/queue=ba0) also shows both jobs as having run?
Nothing weird with the start/finish timestamps in accounting?
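
For reference, the accounting check could look something like the
following. This is only a sketch: it assumes the default accounting
file is in use and that batch process accounting was enabled at the
time, and the time window is an arbitrary choice around the two runs.

```dcl
$! List the batch job records for queue BA0 around the two runs.
$! Assumes the default accounting file; the time window is arbitrary.
$ ACCOUNTING /QUEUE=BA0 /TYPE=BATCH /FULL -
      /SINCE=11-APR-2013:08:00 /BEFORE=11-APR-2013:09:00
$!
$! Compare the current system time with the recorded boot time.
$ SHOW TIME
$ WRITE SYS$OUTPUT F$GETSYI("BOOTTIME")
```

If the start and finish times of the second record look odd compared
with the boot time, that would point towards the clock.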
I would look more at whether something was wrong with the system
clock at startup that caused the holding job to be released, such
as a startup with the wrong time setting or similar.
That is, queue ba0 was started by the startup procedure before the
clock was corrected. Or something like that.
Jan-Erik.