[Info-vax] Are queue manager updates written to disk immediately ?

Jan-Erik Soderholm jan-erik.soderholm at telia.com
Fri Apr 12 10:27:54 EDT 2013


Simon Clubley wrote 2013-04-12 16:03:
> On 2013-04-12, Stephen Hoffman <seaohveh at hoffmanlabs.invalid> wrote:
>> On 2013-04-11 15:27:06 +0000, Simon Clubley said:
>>
>> If this queue manager misbehavior is a sufficient issue for you,
>> consider getting yourself a Less-Interruptible Power Supply (LIPS, as
>> I've never met a truly uninterruptible power supply) for the system.
>
> Thanks for the feedback, Hoff.
>
> The problem with that is that it feels like a hardware workaround for a
> software bug.
>
>> And as others have mentioned, add some checks against a job that really
>> can't run twice.
>
> The problem with ad-hoc checks is just as you mention, that it _is_ ad-hoc.
>
> The normal application level production jobs (this was not one of them)
> are part of a site specific scheduler which means that when they run is
> under that scheduler's control (job specific .com files are created and
> submitted by the scheduler as required).
>
> This design also means there are no holding jobs waiting to be released
> manually by mistake when they should not be; the scheduler in use was
> designed that way on purpose to stop just this problem of the job being
> run when it should not be. What it will not currently protect against
> however is VMS itself running the same submitted job twice.
>
> In case it's not obvious by now :-), I tend to be rather paranoid when
> it comes to data integrity and security and even I did not think about
> the possibility of VMS itself doing something like this (if indeed that
> turns out to be the case).
>
>>   (I've seen a few of these cases in clusters, when the
>> cluster time was skewed among hosts.  Your "tomorrow+08:20" should have
>> avoided problems from the usual minor skews, unless the time in the
>> cluster (on the host that was running the queue manager, which is not
>> necessarily the host that was running the batch job) was very skewed.)
>>   I've ended up with a batch scheduler for these and related tasks.
>>
>
> It's a standalone system; no cluster involved.
>
> All hardware is official HP supported hardware; no unsupported third
> party equipment for either the controller or disks.
>
> Everything is configured as write-through; no deferred writes involved.
>
> BTW, it also occurred to me after my last batch of responses that if
> a window exists during job rundown when the queue manager thinks the
> job is still active even though the log file is complete, then upon
> system restart the job should have been marked with the "system failed
> during execution" status you would normally get in that situation.
>
> That makes me think nothing about the job actually starting was written
> to the queue manager database on disk even though a full logfile was
> written to those same disks. (The logfile was on a different disk, but
> that disk was attached to the same controller.)
>
> I've now logged the issue with HP and they are currently looking at it.
>
> Thanks everyone,
>
> Simon.
>
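
As an aside, the kind of "job that really can't run twice" guard suggested
earlier in the thread can be sketched outside DCL. This minimal Python
illustration (the file name and function are illustrative, not from the
thread) relies on an atomic exclusive create, so the existence check and
the marker creation cannot be split by a second instance:

```python
import os

def acquire_run_marker(path):
    """Atomically create a run-marker file; return False if it already exists."""
    try:
        # O_CREAT | O_EXCL makes creation atomic: if the same job is
        # started twice, exactly one instance wins and the other backs off.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    os.write(fd, str(os.getpid()).encode())
    os.close(fd)
    return True
```

A real job would derive the marker name from its scheduled run (e.g. a
date stamp) and decide separately when old markers may be cleaned up; the
point is only that the test and the create are one operation, closing the
window that an external "is it already running?" check would leave open.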

And it wasn't something as simple as an extra submit of the job
from the startup scripts? Do you have any log files
with the startup/console output?

Jan-Erik.
