[Info-vax] Desirable features for VMS
Arne Vajhøj
arne at vajhoej.dk
Tue Jan 30 19:05:28 EST 2024
On 1/30/2024 5:20 AM, Marc Van Dyck wrote:
> Dave Froble wrote on 30/01/2024 :
>> Ok, a web server handling connection requests. Perhaps one or more
>> connections are disrupted before finishing. A re-start will begin to
>> again handle connection requests. Perhaps reasonable.
>>
>> Then, an example from one of my old customers:
>>
>> Orders were built interactively, and the data was stored in an
>> intermediate file. When done building, the intermediate file is then
>> queued to a poster that processes the data and performs updates to all
>> pertinent database files, then deletes the intermediate file.
>>
>> Ok, what happens when the system crashes during processing of an
>> order? Things are left incomplete and a nasty mess. Re-starting the
>> poster will make things worse. So, just restarting is not such a good
>> idea.
>>
>> In the example, best not to process the order that was interrupted.
>> Thankfully, this almost never happened. Thank you VMS and DEC
>> hardware and battery backup UPS. But, it was still a possibility.
>>
>> The partial solution was to build checkpoints into the design. At
>> each specific point in the poster, a flag was set, and forced to disk,
>> as each file update occurred. The poster was set up to respect the
>> checkpoint flags. Worked sort of well. There was still the
>> possibility the checkpoint flags weren't written to disk. I didn't
>> have an app that reviewed the information, and automatically re-queued
>> it telling the poster where to re-start. That was a tedious manual task.
>>
>> Hey, with most things, there is a point of diminishing returns on
>> efforts. Just not worth the cost.
>>
>> Please don't start ranting about a database with 2 stage commits.
>> Didn't have one.
>>
>> But my point is, just re-starting an application isn't always a solution.
>
> No, just restarting isn't the solution. And engineering the application
> to support random restarts isn't either. Just select a process from a
> system window, drag and drop it in another system window, and it
> continues to run on the other system as if nothing happened. That's what
> I'm after...
There are different models for HA:

A) application managed - the application stores state somewhere where
   a new instance can pick it up - this is not that hard to implement,
   but the application needs to be written for it

B) system managed - the system stores state somewhere where
   a new instance can pick it up - this is hard to implement,
   but the application doesn't need to be written for it

A2) same as A, with a feature where the system can move the application
    from one node to another node - stop scheduling the
    processes/threads, copy memory content to the other node, get the
    various files/network connections opened on the other node, schedule
    the processes/threads on the new node, kill the instance on the
    old node - harder than A but easier than B
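
To make model A a bit more concrete, here is a minimal sketch in Python of
an application that stores its own state so that a new instance can pick it
up. It is only an illustration: the checkpoint file, the JSON layout and the
names save_checkpoint, load_checkpoint and run_steps are all made up for
this example, and it assumes each step is safe to run twice, because a crash
between finishing a step and writing the checkpoint means that step gets
repeated on restart.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Persist state so a crash leaves either the old or the new file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory, then rename over
    # the real checkpoint, so a half-written file is never visible.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())  # force the file data to disk
    os.replace(tmp_path, path)  # atomic replacement of the old checkpoint
    # Note: for full durability on some file systems the directory entry
    # would need to be synced as well; this sketch leaves that out.


def load_checkpoint(path):
    """Return the saved state, or a fresh state if none exists yet."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {"next_step": 0}


def run_steps(steps, path):
    """Run the steps in order, resuming after the last recorded one."""
    state = load_checkpoint(path)
    for i in range(state["next_step"], len(steps)):
        steps[i]()  # assumed idempotent, see the caveat above
        save_checkpoint(path, {"next_step": i + 1})
```

A new instance, on the same node or a different one that can see the
checkpoint file, just calls run_steps with the same list of steps and the
same path, and continues where the previous instance stopped.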
Arne