[Info-vax] Ksplice equivalent for VMS ?

Wed Feb 19 21:58:04 EST 2025

On 2/19/2025 9:26 PM, Dan Cross wrote:
> In article <67b66f14$0$712$14726298 at news.sunsite.dk>,
> Arne Vajhøj  <arne at vajhoej.dk> wrote:
>> On 2/19/2025 5:26 PM, Dan Cross wrote:
>>> In article <vp5dig$2dnk8$1 at dont-email.me>,
>>> Arne Vajhøj  <arne at vajhoej.dk> wrote:
>>>> I am thinking about a scenario like:
>>>> * cluster with node A and B
>>>> * critical process P that for whatever reason does not work
>>>>     running concurrently on multiple nodes, so it runs only on A
>>>> * node A needs to be taken down for some reason
>>>> * so VMS on node A and B does some magic and migrates P from A to B
>>>>     transparently to users (obviously requires a cluster IP address or
>>>>     load balancer)
>>>
>>> While this may be an acceptable method to "hotpatch" a host with
>>> minimal disruption to whatever workload it's running, it is
>>> completely unlike what ksplice does.  For one, it requires that
>>> sufficient resources exist wherever you'd migrate the process
>>> to for the duration of the update.
>>
>> That is a requirement.
>>
>> :-)
>>
>>>                                     Moreover, it requires that
>>> all aspects of state that are required to resume execution of
>>> the process are accessible and replicable on other, similar
>>> hardware.
>>
>> Yes. Which becomes a little easier when restricted to a
>> cluster instead of any systems.
> 
> I don't know what you mean when you say, "restricted to a
> cluster instead of any systems."

A and B being in a cluster instead of being two
standalone nodes.

>                               If you mean that this somehow
> makes managing state during process migration easier, then no,
> not really; all of the same caveats apply.  For instance,
> if a program is using (say) a GPU for computation, part of
> migrating it will be extracting whatever state it has in the
> GPU out of the GPU, and replicating it on the destination
> system.
> 
> At one point, the internal numbering of cores in the GPU was
> visible to code running on the GPU, creating an $n \choose k$
> fingerprinting problem for migration.

A VMS server process will not be using a GPU.

I guess as part of the migration the process would need to
be non-CUR and release the CPU (and the GPU if VMS adds support
for CUDA or similar in the future).

Main memory will need to be migrated. And a cluster will
not help with that.

But a cluster with shared storage will help with disk files.

And a cluster with a shared SYSUAF will help with identity.

And a cluster with a shared queue database will help with jobs.
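
The breakdown above can be put as a rough sketch. All names here are
made up for illustration; nothing in it is a VMS API, and the
categories are only the ones mentioned in this thread.

```python
# Hypothetical categories of process state (illustrative only, not a VMS API).
# State that a cluster with shared resources already makes visible on node B.
SHARED_IN_CLUSTER = {
    "disk_files": "shared storage",
    "identity": "shared SYSUAF",
    "jobs": "shared queue database",
}
# State the cluster does not help with: it has to be moved explicitly.
MUST_BE_COPIED = {"main_memory"}
# Resources the process must give up on node A (i.e. it must be non-CUR).
MUST_BE_RELEASED = {"cpu"}


def migration_plan(state_kinds):
    """Map each kind of state to the action a migration from A to B would need."""
    plan = {}
    for kind in state_kinds:
        if kind in SHARED_IN_CLUSTER:
            plan[kind] = "already visible on B via " + SHARED_IN_CLUSTER[kind]
        elif kind in MUST_BE_COPIED:
            plan[kind] = "copy from A to B"
        elif kind in MUST_BE_RELEASED:
            plan[kind] = "release on A before the move"
        else:
            # Anything not listed (e.g. GPU state) is treated as a blocker.
            plan[kind] = "unknown: blocks migration"
    return plan


if __name__ == "__main__":
    for kind, action in migration_plan(
        ["cpu", "main_memory", "disk_files", "identity", "jobs", "gpu"]
    ).items():
        print(f"{kind:12s} {action}")
```

Treating anything unlisted as a blocker is just the cautious default for
the sketch; a real mechanism would need an explicit answer for every
kind of state.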

> Besides, clusters can contain heterogeneous systems.

Yes.

The nodes would need to be compatible.

A mixed-architecture cluster is definitely out
of the question.

:-)

Arne