[Info-vax] Are queue manager updates written to disk immediately ?

Paul Sture nospam at sture.ch
Thu Apr 11 14:00:42 EDT 2013


In article <kk6s8q$39e$1 at dont-email.me>,
 Simon Clubley <clubley at remove_me.eisner.decus.org-Earth.UFP> wrote:

> On 2013-04-11, Jan-Erik Soderholm <jan-erik.soderholm at telia.com> wrote:
> > Simon Clubley wrote 2013-04-11 18:28:
> >> On 2013-04-11, Jan-Erik Soderholm <jan-erik.soderholm at telia.com> wrote:
> >>>
> >>> I guess accounting (acc/queue=ba0) also shows both jobs as run?
> >>
> >> There's no accounting entry for the first job even though there is
> >> a full logfile, but since accounting log updates are buffered,
> >> that's not really a major surprise.
> >>
> >>> Nothing weird with the start/finish timestamps in accounting?
> >>>
> >>> I would look more at something with the system clock at
> >>> startup that caused the holding job to be released, such
> >>> as a startup with the wrong time setting or similar.
> >>>
> >>
> >> All the timestamps on the log files and accounting are correct; the only
> >> time the system clock is corrected on this system is once a day
> >> during the night from an NTP source and that job had already run a couple
> >> of hours previously.
> >
> > But there was a reboot in between, wasn't there?
> >
> 
> Sequence:
> 
> NTP job (06:30) -> check_queues (08:20) -> power failure ->
> 	check_queues run again (08:25).
> 
> >> In addition, the queue entry number from the accounting record for the
> >> second job didn't match the entry number from the submit command in
> >> the first job.
> >
> > Higher/later or lower/earlier?
> >
> > Does it match the entry number from *any* previous job?
> > I don't know how long you keep logs, of course... :-)
> >
> 
> I also have the log file from yesterday's run (10-Apr-2013 08:20).
> 
> The entry number from yesterday's submit command matches the entry
> number in the accounting log for the second run today.
> 
> So the queue manager has indeed run the job submitted yesterday twice
> to completion today.
> 
> > Sounds like the quemgr thought that the job hadn't been
> > run (or hadn't completed) and simply restarted it.
> >
> > But then, will a restarted batch job not run using the
> > original entry number? Maybe not...
> >
> > I would expect the queue database to be updated synchronously
> > at the time of batch job "rundown". That is where SHOW QUEUE
> > looks, isn't it?
> >
> 
> That's _exactly_ what I would expect as well, and on disk as well;
> not just in memory.
> 
> Even if there was some queue manager bug caused by a power failure
> during some unusual tight timing window of a few milliseconds [*],
> that still does not explain the disappearing job from the submit
> command in the first job.
> 
> Simon.
> 
> [*] A few milliseconds maximum, because don't forget I have a _full_
> logfile from the first run of the job today.

That sounds like a small window in the batch rundown code.  If you
think about it, there must be a point between the batch job completing
and its return status being processed, after which the queue manager
either deletes or retains the job in the queue database.
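
To make that window concrete, here is a toy model in Python.  It is
only a sketch of the reasoning, not how the VMS queue manager is
actually implemented; the function names, the PowerFailure exception
and entry number 101 are all invented for illustration.

```python
# Hypothetical model, NOT VMS code: a job runs to completion, and only
# afterwards does the queue manager record that fact. A crash between
# the two steps leaves the entry pending, so it is run again on restart.

class PowerFailure(Exception):
    """Stands in for the machine losing power mid-rundown."""


def run_job(entry, logfile):
    # The job itself finishes and leaves a full log behind.
    logfile.append(f"job {entry} ran to completion")


def process_queue(pending, logfile, crash_before_update=False):
    """Run each pending entry, then remove it from the queue database."""
    for entry in list(pending):
        run_job(entry, logfile)        # step 1: job completes
        if crash_before_update:
            raise PowerFailure(entry)  # the window: status not yet processed
        pending.remove(entry)          # step 2: queue database updated


def simulate():
    pending = [101]   # stands in for the on-disk queue database
    logfile = []
    try:
        process_queue(pending, logfile, crash_before_update=True)
    except PowerFailure:
        pass          # reboot: the entry is still recorded as pending
    process_queue(pending, logfile)    # after restart the job runs again
    return pending, logfile


if __name__ == "__main__":
    print(simulate())
```

In this model the log ends up with two complete runs of the same
entry, which matches the rerun that was observed.  It says nothing
about the missing entry from the submit command in the first job.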

-- 
Paul Sture


