[Info-vax] Ksplice equivalent for VMS ?

Dan Cross cross at spitfire.i.gajendra.net
Wed Feb 19 21:26:41 EST 2025


In article <67b66f14$0$712$14726298 at news.sunsite.dk>,
Arne Vajhøj  <arne at vajhoej.dk> wrote:
>On 2/19/2025 5:26 PM, Dan Cross wrote:
>> In article <vp5dig$2dnk8$1 at dont-email.me>,
>> Arne Vajhøj  <arne at vajhoej.dk> wrote:
>>> I am thinking about a scenario like:
>>> * cluster with node A and B
>>> * critical process P that for whatever reason does not work
>>>    running concurrent on multiple nodes runs on A
>>> * node A needs to be taken down for some reason
>>> * so VMS on node A and B does some magic and migrate P from A to B
>>>    transparent to users (obviously require a cluster IP address or
>>>    load balancer)
>> 
>> While this may be an acceptable method to "hotpatch" a host with
>> minimal disruption to whatever workload it's running, it is
>> completely unlike what ksplice does.  For one, it requires that
>> sufficient resources exist in wherever you'd migrate the process
>> to for the duration of the update.
>
>That is a requirement.
>
>:-)
>
>>                                    Moreover, it requires that
>> all aspects of state that are required to resume execution of
>> the process are accessible and replicable on other, similar
>> hardware.
>
>Yes. Which becomes a little easier when restricted to a
>cluster instead of any systems.

I don't know what you mean when you say, "restricted to a
cluster instead of any systems."  If you mean that this somehow
makes managing state during process migration easier, then no,
not really; all of the same caveats apply.  For instance,
if a program is using (say) a GPU for computation, part of
migrating it will be extracting whatever state it has in the
GPU out of the GPU, and replicating it on the destination
system.

At one point, the internal numbering of cores in the GPU was
visible to code running on the GPU, creating an $n \choose k$
fingerprinting problem for migration.
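
To give a rough sense of scale, here is a minimal sketch in Python.
It assumes a part that ships with k of its n physical cores enabled,
and that the set of enabled core IDs is what code on the GPU can
observe.  The function names and the core counts are made up for
illustration and do not describe any particular GPU.

```python
from math import comb


def visible_core_layouts(n_physical: int, k_enabled: int) -> int:
    """Count the distinct sets of k enabled core IDs out of n physical cores.

    Assumption: each chip exposes exactly k of its n cores, and the IDs of
    those k cores are visible to code running on the device.
    """
    return comb(n_physical, k_enabled)


def layouts_match(source_ids: frozenset, dest_ids: frozenset) -> bool:
    """Return True only if the destination exposes the same core IDs.

    Hypothetical check: code that has observed its core IDs can resume
    unchanged only on a device presenting an identical set.
    """
    return source_ids == dest_ids


if __name__ == "__main__":
    # Hypothetical part: 8 physical cores, 6 enabled.
    print(visible_core_layouts(8, 6))
    # Two chips of the same model with different cores fused off.
    print(layouts_match(frozenset({0, 1, 2, 3, 4, 5}),
                        frozenset({0, 1, 2, 4, 6, 7})))
```

Even with these toy numbers there are 28 possible layouts, so two
nominally identical devices need not look the same to the code
being migrated.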

Besides, clusters can contain heterogeneous systems.

>> Many hyperscaler cloud providers do something similar for
>> updates, but there are serious limitations and downsides; for
>> example, direct passthru to hardware devices (storage, compute
>> accelerators, etc) can make it impossible to move a VM.
>
>Moving VM's is common. Robert started by mentioning vMotion.

I don't see how that's relevant to the points I raised.

	- Dan C.


