[Info-vax] Desirable features for VMS

Tue Jan 30 23:25:20 EST 2024

On 1/30/2024 7:05 PM, Arne Vajhøj wrote:
> On 1/30/2024 5:20 AM, Marc Van Dyck wrote:
>> Dave Froble wrote on 30/01/2024 :
>>> Ok, a web server handling connection requests.  Perhaps one or more
>>> connections are disrupted before finishing.  A re-start will begin to again
>>> handle connection requests.  Perhaps reasonable.
>>>
>>> Then, an example from one of my old customers:
>>>
>>> Orders were built interactively, and the data was stored in an intermediate
>>> file.  When done building, the intermediate file is then queued to a poster
>>> that processes the data and performs updates to all pertinent database files,
>>> then deletes the intermediate file.
>>>
>>> Ok, what happens when the system crashes during processing of an order?
>>> Things are left incomplete and a nasty mess.  Re-starting the poster will
>>> make things worse.  So, just restarting is not such a good idea.
>>>
>>> In the example, best not to process the order that was interrupted.
>>> Thankfully, this almost never happened.  Thank you VMS and DEC hardware and
>>> battery backup UPS.  But, it was still a possibility.
>>>
>>> The partial solution was to build checkpoints into the design.  At each
>>> specific point in the poster, a flag was set, and forced to disk, as each
>>> file update occurred.  The poster was set up to respect the checkpoint flags.
>>> Worked sort of well.  There was still the possibility the checkpoint flags
>>> weren't written to disk.  I didn't have an app that reviewed the information,
>>> and automatically re-queued it telling the poster where to re-start.  That
>>> was a tedious manual task.
>>>
>>> Hey, with most things, there is a point of diminishing returns on efforts.
>>> Just not worth the cost.
>>>
>>> Please don't start ranting about a database with 2 stage commits.  Didn't
>>> have one.
>>>
>>> But my point is, just re-starting an application isn't always a solution.
>>
>> No, just restarting isn't the solution. And engineering the application
>> to support random restarts isn't either. Just select a process from a
>> system window, drag and drop it in another system window, and it
>> continues to run on the other system as if nothing happened. That's what
>> I'm after...
>
> There are different models for HA:
> A) application managed - the application stores state somewhere where
>    a new instance can pick it up - this is not that hard to implement
>    but the application needs to be written for it
> B) system managed - the system stores state somewhere where
>    a new instance can pick it up - this is hard to implement
>    but the application doesn't need to be written for it
> A2) same as A with a feature where the system can move the application
>     from one node to another node - don't schedule the
>     processes/threads, copy memory content to other node, get various
>     files/network connections opened on the other node, schedule the
>     processes/threads on the new node, kill the instance on the
>     old node - harder than A but easier than B

"B" is what I had in mind in my earlier post.

But, what will the customers pay for?

In the example I posted earlier, I didn't get paid for the checkpointing work. 
Customer didn't care about little glitches.  It offended my sensibilities 
concerning "right and wrong".  After I thought about the solution, I just 
implemented it, for my own satisfaction.  I felt much better.

:-)
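
For anyone who wants to see the shape of the checkpointing idea from the quoted example (a flag forced to disk after each file update, and a poster that respects it on restart), here is a minimal sketch. It is written in Python purely for illustration and is not the code the customer ran. The names save_checkpoint, load_checkpoint, post_order and the JSON flag file are all hypothetical. It assumes each update is a callable and that re-running the one update that was in flight at crash time is tolerable.

```python
import json
import os


def save_checkpoint(path, next_step):
    """Durably record the index of the next update to run (hypothetical format)."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"next_step": next_step}, f)
        f.flush()
        os.fsync(f.fileno())  # force the flag to disk before moving on
    os.replace(tmp, path)     # atomic rename: readers see the old or the new flag


def load_checkpoint(path):
    """Return the step to resume from, or 0 if no checkpoint exists."""
    try:
        with open(path) as f:
            return json.load(f)["next_step"]
    except FileNotFoundError:
        return 0


def post_order(order, updates, ckpt_path):
    """Apply each update in turn, checkpointing after every one."""
    start = load_checkpoint(ckpt_path)
    for i in range(start, len(updates)):
        updates[i](order)                  # one "file update"
        save_checkpoint(ckpt_path, i + 1)  # steps 0..i are now done
    os.remove(ckpt_path)                   # order fully posted
```

Re-queuing an interrupted order is then just calling post_order again with the same checkpoint path; it skips the updates already recorded as done. The window described above is still there: a crash after an update but before its flag reaches the disk means that one update is run a second time on restart, so it has to be safe to repeat or be checked by hand.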

-- 
David Froble                       Tel: 724-529-0450
Dave Froble Enterprises, Inc.      E-Mail: davef at tsoft-inc.com
DFE Ultralights, Inc.
170 Grimplin Road
Vanderbilt, PA  15486