[Info-vax] Desirable features for VMS

Tue Jan 30 19:05:28 EST 2024

On 1/30/2024 5:20 AM, Marc Van Dyck wrote:
> Dave Froble wrote on 30/01/2024 :
>> Ok, a web server handling connection requests.  Perhaps one or more 
>> connections are disrupted before finishing.  A re-start will begin to 
>> again handle connection requests.  Perhaps reasonable.
>>
>> Then, an example from one of my old customers:
>>
>> Orders were built interactively, and the data was stored in an 
>> intermediate file.  When done building, the intermediate file is then 
>> queued to a poster that processes the data and performs updates to all 
>> pertinent database files, then deletes the intermediate file.
>>
>> Ok, what happens when the system crashes during processing of an 
>> order? Things are left incomplete and a nasty mess.  Re-starting the 
>> poster will make things worse.  So, just restarting is not such a good 
>> idea.
>>
>> In the example, best not to process the order that was interrupted. 
>> Thankfully, this almost never happened.  Thank you VMS and DEC 
>> hardware and battery backup UPS.  But, it was still a possibility.
>>
>> The partial solution was to build checkpoints into the design.  At 
>> each specific point in the poster, a flag was set, and forced to disk, 
>> as each file update occurred.  The poster was set up to respect the 
>> checkpoint flags.  Worked sort of well.  There was still the 
>> possibility the checkpoint flags weren't written to disk.  I didn't 
>> have an app that reviewed the information, and automatically re-queued 
>> it telling the poster where to re-start.  That was a tedious manual task.
>>
>> Hey, with most things, there is a point of diminishing returns on 
>> efforts. Just not worth the cost.
>>
>> Please don't start ranting about a database with 2 stage commits.  
>> Didn't have one.
>>
>> But my point is, just re-starting an application isn't always a solution.
> 
> No, just restarting isn't the solution. And engineering the application
> to support random restarts isn't either. Just select a process from a
> system window, drag and drop it in another system window, and it
> continues to run on the other system as if nothing happened. That's what
> I'm after...

There are different models for HA:
A) application managed - the application stores state somewhere where
    a new instance can pick it up - this is not that hard to implement
    but the application needs to be written for it
B) system managed - the system stores state somewhere where
    a new instance can pick it up - this is hard to implement
    but the application doesn't need to be written for it
A2) same as A with a feature where the system can move the application
     from one node to another node - stop scheduling the
     processes/threads, copy memory content to other node, get various
     files/network connections opened on the other node, schedule the
     processes/threads on the new node, kill the instance on the
     old node - harder than A but easier than B
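
To make model A concrete, here is a minimal sketch in Python of the checkpoint approach Dave describes for his poster. It is an illustration only, not his actual design: the names save_checkpoint, load_checkpoint and post_order, the JSON checkpoint format and the step list are all made up for the example. It assumes the checkpoint file and its temporary file sit on the same filesystem, so the rename is atomic.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write state so that a crash leaves either the old or the new file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())  # force the data to disk before the rename
        os.replace(tmp, path)     # atomic swap on the same filesystem
    except BaseException:
        if os.path.exists(tmp):
            os.remove(tmp)
        raise


def load_checkpoint(path):
    """Return the saved state, or an empty state if nothing was saved yet."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {"done": []}


def post_order(steps, checkpoint_path):
    """Run each (name, func) step once, skipping steps already recorded."""
    state = load_checkpoint(checkpoint_path)
    for name, func in steps:
        if name in state["done"]:
            continue  # finished before the restart, do not repeat it
        func()
        state["done"].append(name)
        save_checkpoint(checkpoint_path, state)
    return state["done"]
```

A restarted instance calls post_order with the same step list and checkpoint path and resumes at the first step not yet recorded. The window Dave mentions is still there: a crash after func() returns but before save_checkpoint completes means that step runs again on restart, so each step has to be safe to repeat or that window has to be accepted.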

Arne