[Info-vax] VMware

Wed Dec 11 09:02:18 EST 2019

On 11/12/2019 13:32, Bob Gezelter wrote:
> On Tuesday, December 10, 2019 at 10:00:27 PM UTC-5, Grant Taylor wrote:
>> On 12/10/19 5:27 PM, Bob Gezelter wrote:
>>> With all due respect, I want to see the fine-grain details on
>>> that implementation.
>>
>> Please see my reply from ~ 7:35.  (Adjust hour accordingly for your time
>> zone.)
>>
>> I think that's about as granular as I can get without going and looking
>> things up.
>>
>> I have no problem with you wanting to see the fine-grain details.  I
>> asked very similar questions 10+ years ago.  Hence why I have the
>> understanding that I do.  Also why it's now only a high level detail.
>>
>>> Particularly the part about "packets do not drop".
>>
>> I've routinely moved VMs between hosts without dropping packets.  I do
>> see latency at the epoch of the transition increase momentarily (usually
>> just one packet).  But the packet does make it through and is not dropped.
>>
>> Frequently latency is something like this:
>>
>> 1–3 ms
>> 1–3 ms
>> 1–3 ms
>> 9–12 ms
>> 1–3 ms
>> 1–3 ms
>> 1–3 ms
>>
>> No packet drop.
>>
>> TCP sessions continue without retransmissions.
>>
>>> Ensuring granularity of file update is also quite a challenge.
>>
>> Why?  (Please see my other message about what happens.)
>>
>>> There is a large difference between "rarely are packets lost" and
>>> "packets are never lost". Pre-loading other virtual instances and
>>> keeping their memory state updated is one thing, ensuring mass
>>> storage state is something else.
>>
>> All hosts in the cluster have access to the same storage.  So anything
>> written on one host is readable by other hosts.  Part of the migration
>> ensures that cached data is synced to disk and / or copied as part of
>> the memory for the system.
>>
>> So there's no "mass storage state" to keep in sync because it is the
>> same back end storage.
>>
>>> I will not even get into questions like the state of attached
>>> non-storage peripherals, e.g. RNGs.
>>
>> Those would be the types of things that would prevent migration between
>> hosts.
>>
>> Though, I think that VMware has an option to allow USB peripherals to be
>> used across the network.
>>
>> If not VMware, there are other OS level solutions to allow some
>> peripherals to be used across the network.
>>
>> I've personally used remote (TCP based) serial ports for fax servers.
>> The modem is physically connected to a network attached DigiBoard (or
>> the likes) and the VM is free to move from host to host to host because
>> its TCP connection to the serial port is still intact.
>>
>> Given that faxing is time sensitive serial audio / data (depending on
>> the modem) there may be an issue with the momentary increased latency.
>> I don't know if that would ride through a migration or if it would rely
>> on error detection and correction in the modem / fax level.
>>
>>> My general advice is to deeply verify the precise nature of the
>>> implementation and its limitations before relying on it.
>>
>> I think that's a wonderful idea.
>>
>>> A while back, I was at a user group event where there was a
>>> presentation on VM migration. The speaker made a statement that
>>> failover migration would handle all cases. Being from New York City,
>>> I inquired about a scenario we had experienced a few years earlier.
>>
>> ~chuckle~
>>
>> Absolutes are usually a problem in one way or another.  ;-)
>>
>>> A Boeing 767 doing between 150 and 200 knots comes through your machine
>>> room window. How long does it take to traverse the 24 inches between
>>> front of the cabinet and the back of the cabinet? Even that scenario
>>> does not include the fact that the infrastructure connecting one
>>> VM host to another has likely been severed before the VM host frame
>>> is hit.
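
The arithmetic behind that question can be sketched as below. It is a rough estimate that assumes the aircraft holds a constant speed across the cabinet; the function and constant names are illustrative only.

```python
# Rough estimate: time for an object at a given speed to cross a cabinet.
# Assumes constant speed over the whole distance (an idealisation).

KNOT_M_PER_S = 1852.0 / 3600.0  # one knot is 1852 metres per hour
INCH_M = 0.0254                 # one inch is 0.0254 metres


def traverse_time_ms(speed_knots, distance_inches=24.0):
    """Return the crossing time in milliseconds."""
    speed_m_per_s = speed_knots * KNOT_M_PER_S
    distance_m = distance_inches * INCH_M
    return distance_m / speed_m_per_s * 1000.0


for knots in (150, 200):
    print(f"{knots} knots: {traverse_time_ms(knots):.1f} ms")
# 150 knots: 7.9 ms
# 200 knots: 5.9 ms
```

So the cabinet is crossed in roughly 6 to 8 ms, which is shorter than the 9-12 ms latency blip quoted earlier for a controlled migration.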
>>
>> I think that's a valid question.  I think it's an EXTREMELY ATYPICAL
>> failure scenario.  But it is decidedly within the "all cases" absolute
>> the speaker set themselves up for.
>>
>> I think that would be very difficult to protect against.
>>
>> I would question, what about a data center in an adjacent building that
>> you can extend the LAN / SAN / etc. into.  Though it could also
>> experience a similar problem (fate sharing).
>>
>> When you start talking about failures that can take out multiple
>> buildings in close proximity to each other, you REALLY need an EXTREMELY
>> robust solution.
>>
>> I do think that VMware has some solutions that can work over extended
>> distances.
>>
>>
>>
>> -- 
>> Grant. . . .
>> unix || die
> 
> Grant,
> 
> Your post proves my point.
> 
> I do not disagree that within the context of "controlled" VM migration between hosts, it is possible to accomplish the migration without loss of packets or I/O inconsistency.
> 
> It is the uncontrolled case to which I referred.
> 
> Of course, in the controlled case, the connection to the switch can be blocked/queued AND acknowledged to prevent packet(s) from being caught during the transition. Alternatively, the MAC address can be changed and the packets queued at the new host. A similar argument applies to I/O. In a controlled case, active I/O can be completed before the transfer.
> 
> Otherwise, one needs facilities not present in x86 (e.g., lock-step execution as was implemented on some fault tolerant architectures in the past). As an example, modern hardware RNGs make precise execution profiles on modern systems unlikely.
> 
> - Bob Gezelter, http://www.rlgsc.com
> 

Bob,

Other than the hypervisor, nothing under VMware runs as a real CPU 
process. It all runs using the virtual assists. The OS drivers are not 
talking to real hardware; they are talking via emulated hardware.

Typically the OS sees a SCSI or ATA adaptor but the real hardware will 
be fibre SAN.

So whilst x86 does not have lock step, VMware does have lock step...

https://searchvmware.techtarget.com/definition/VMware-vLockstep
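
As a toy illustration of the general record-and-replay idea behind software lock step (not VMware's actual implementation, which I have not looked at in detail), the sketch below has a primary log every non-deterministic input, such as an RNG read, and a secondary consume the same log so both end up in the same state. ToyVM, run_primary and run_secondary are hypothetical names made up for this example.

```python
import random
from collections import deque


class ToyVM:
    """Hypothetical toy machine whose whole state is one integer."""

    def __init__(self):
        self.state = 0

    def step(self, nondeterministic_input):
        # The update is deterministic once the input value is fixed.
        self.state = (self.state * 31 + nondeterministic_input) % 1_000_003


def run_primary(vm, steps, log):
    """Run the primary, recording every non-deterministic input."""
    rng = random.SystemRandom()  # stands in for a hardware RNG
    for _ in range(steps):
        value = rng.randrange(256)
        log.append(value)  # record the value, then consume it
        vm.step(value)


def run_secondary(vm, log):
    """Replay the recorded inputs instead of reading a local RNG."""
    while log:
        vm.step(log.popleft())


primary, secondary = ToyVM(), ToyVM()
log = deque()
run_primary(primary, 1000, log)
run_secondary(secondary, log)
print(primary.state == secondary.state)  # True
```

The point of the sketch is that the RNG does not have to be predictable; its output only has to be captured at the emulated-hardware boundary and delivered to the secondary.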

Dave