[Info-vax] Are queue manager updates written to disk immediately ?

Fri Apr 12 10:42:11 EDT 2013

On 2013-04-12 14:03:58 +0000, Simon Clubley said:

> On 2013-04-12, Stephen Hoffman <seaohveh at hoffmanlabs.invalid> wrote:
>> On 2013-04-11 15:27:06 +0000, Simon Clubley said:
>> 
>> If this queue manager misbehavior is a sufficient issue for you,
>> consider getting yourself a Less-Interruptible Power Supply (LIPS, as
>> I've never met a truly uninterruptible power supply) for the system.
> 
> Thanks for the feedback, Hoff.
> 
> The problem with that is that it feels like a hardware workaround for a 
> software bug.

Um, so?  If throwing hardware at the problem reduces or avoids the 
problem, it's goodness.

I'm all for not testing the handling and recovery of others' software 
during hard-crashes, too.

> The normal application level production jobs (this was not one of them)
> are part of a site specific scheduler which means that when they run is
> under that scheduler's control (job specific .com files are created and
> submitted by the scheduler as required).

Time to move this job into the production scheduler?

> This design also means there are no holding jobs waiting to be released
> manually by mistake when they should not be; the scheduler in use was
> designed that way on purpose to stop just this problem of the job being
> run when it should not be. What it will not currently protect against
> however is VMS itself running the same submitted job twice.

I'm not fond of the VMS APIs here.  They're far too primitive, and far 
too limited.
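
Lacking anything better from the queue manager, a job can at least 
refuse to run twice on its own.  What follows is a minimal sketch of 
that idea in Python, not anything VMS or your scheduler provides: 
run_once, the marker directory and the ".started" suffix are all names 
I've made up for illustration.  It assumes an exclusive-create on the 
marker file is atomic on the file system in use, and it is only as 
durable as the storage underneath it.

```python
import os
import tempfile


def run_once(job_id, marker_dir, work):
    """Run work() only if no marker for job_id exists yet.

    Returns True if the work ran, False if an earlier run was detected.
    All names here are hypothetical; this is a sketch, not a VMS API.
    """
    marker = os.path.join(marker_dir, job_id + ".started")
    try:
        # O_CREAT | O_EXCL fails if the marker already exists, so the
        # existence check and the creation happen as one atomic step.
        fd = os.open(marker, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
    except FileExistsError:
        return False
    try:
        os.write(fd, b"started\n")
        # Ask the OS to push the marker out before the real work begins.
        os.fsync(fd)
    finally:
        os.close(fd)
    work()
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        runs = []
        first = run_once("nightly-42", d, lambda: runs.append(1))
        second = run_once("nightly-42", d, lambda: runs.append(1))
        print(first, second, len(runs))  # True False 1
```

The marker is written before the work starts, so this gives 
at-most-once behavior: a job that dies half-way through will not be 
re-run automatically and needs somebody to look at it.  For a site 
that's paranoid about duplicate runs, that seems the safer failure.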

The queue manager and the operator subsystem and some other core 
functions are positively ancient designs.  They weren't even 
particularly advanced when they were designed and written, either.  
TOPS was doing better in a number of areas.  IMO, an overhaul is long 
overdue.  (Which ties back to my comments in another recent thread 
around whether seeking to emulate VMS is really such a good idea.  But 
I digress.)

> In case it's not obvious by now :-), I tend to be rather paranoid when
> it comes to data integrity and security and even I did not think about
> the possibility of VMS itself doing something like this (if indeed that
> turns out to be the case).

You should see what a storage controller once did to a database I was managing.

> All hardware is official HP supported hardware; no unsupported third
> party equipment for either the controller or disks.
> 
> Everything is configured as write-through; no deferred writes involved.

Write-through, write-back and deferred writes are not related to what I 
was referencing in my reply.  Even with full-on fully-synchronous $qiow 
or $io_performw I/O calls followed by an explicit synchronize command (a 
command which VMS seldom uses, AFAIK), a controller or a disk that 
caches a synchronize-cache command could show the data-loss behavior 
mentioned.  And if you're not synchronizing the I/O at all, then a power 
failure or hard crash can lose data with any caching controller or 
caching disk, including a controller and disks that do correctly 
implement the synchronization request.
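
To make the distinction concrete, here's a rough sketch in POSIX terms 
rather than with $qiow, written in Python.  The durable_append name and 
the journal file are made up for illustration, and this says nothing 
about what the queue manager actually does.  The write completing only 
means the operating system has accepted the data; the separate fsync 
call is the request to push it out to the device.

```python
import os
import tempfile


def durable_append(path, payload):
    """Append payload to path, then request a flush to the device.

    Hypothetical helper for illustration only.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        remaining = memoryview(payload)
        while remaining:
            # os.write may accept fewer bytes than offered, so loop.
            written = os.write(fd, remaining)
            remaining = remaining[written:]
        # The explicit synchronize step.  Without it the data may still
        # be sitting in a cache when the power goes.
        os.fsync(fd)
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        journal = os.path.join(d, "journal.dat")
        durable_append(journal, b"entry-1\n")
        durable_append(journal, b"entry-2\n")
        with open(journal, "rb") as f:
            print(f.read())  # b'entry-1\nentry-2\n'
```

Even with the fsync in place, the code can't tell whether the 
controller or the disk actually honored the flush or just acknowledged 
it from cache.  That's the failure I was referring to.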

But I don't recall the queue manager I/O off-hand.  Maybe the queue 
manager isn't coded to deal with a power outage...

-- 
Pure Personal Opinion | HoffmanLabs LLC