[Info-vax] VMWARE/ESXi Linux

Dan Cross cross at spitfire.i.gajendra.net
Tue Dec 3 09:39:45 EST 2024


In article <vilsop$2qc5u$1 at paganini.bofh.team>,
Waldek Hebisch <antispam at fricas.org> wrote:
>Matthew R. Wilson <mwilson at mattwilson.org> wrote:
>> On 2024-11-28, Lawrence D'Oliveiro <ldo at nz.invalid> wrote:
>>> On Wed, 27 Nov 2024 22:24 +0000 (GMT Standard Time), John Dallman wrote:
>>>
>>>> In article <vi84pm$6ct6$4 at dont-email.me>, ldo at nz.invalid (Lawrence
>>>> D'Oliveiro) wrote:
>>>>>
>>>>> On Wed, 27 Nov 2024 16:33:56 -0500, David Turner wrote:
>>>>>
>>>>>> I keep being told that VMWARE is not an OS in itself.
>>>>>> But it is... based on Ubuntu Kernel....  stripped down but still
>>>>>> Linux
>>>>> 
>>>>> And not even using the native KVM virtualization architecture that is
>>>>> built into Linux.
>>>> 
>>>> History: VMware ESXi was released in 2001 and KVM was merged into the
>>>> Linux kernel in 2007.
>>>
>>> In other words, VMware has long been obsoleted by better solutions.
>> 
>> Please explain how ESXi is obsolete, and how KVM is a better solution.
>> 
>> Both KVM and ESXi use the processor's VT-d (or AMD's equivalent, AMD-Vi)
>> extensions on x86 to efficiently handle instructions that require
>> hypervisor intervention. I'm not sure how you'd judge which one is a
>> better solution in that regard. So the only thing that matters, really,
>> is the virtualization of everything other than the processor itself.
>
>Little nitpick: virtualization need to handle _some_ system instructions.
>But with VT-d and particularly with nested page tables this should
>be easy.

Sadly, not really.  Virtualization needs to handle many
instructions, of multiple types, and be able to do so gracefully
and performantly.  This includes, of course, the underlying
hardware's supervisor instruction set and any privileged
operations, but also those instructions that can leak data about
the underlying hardware that the hypervisor would rather keep
hidden.  Hence, `CPUID` forces an unconditional VM exit on x86.
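
To make that concrete, here is a minimal sketch in C of what a
hypervisor might do on that exit: run the real CPUID on the
guest's behalf, then mask out feature bits the platform
definition doesn't expose (say, so the guest can later land on a
host that lacks them).  The vcpu_state type and the feature mask
are illustrative, not any real hypervisor's API:

  /* Hypothetical sketch: emulating CPUID after an unconditional VM exit.
   * The vcpu_state type and the feature mask are illustrative only. */
  #include <stdint.h>
  #include <cpuid.h>          /* GCC/Clang __cpuid_count() macro */

  struct vcpu_state {         /* illustrative guest register file */
      uint64_t rax, rbx, rcx, rdx, rip;
  };

  /* A feature the hypervisor chooses to hide from the guest. */
  #define HIDE_ECX_LEAF1  (1u << 28)   /* AVX: CPUID.1:ECX bit 28 */

  static void emulate_cpuid(struct vcpu_state *v)
  {
      uint32_t a, b, c, d;

      /* Execute the real CPUID with the guest's leaf/subleaf... */
      __cpuid_count((uint32_t)v->rax, (uint32_t)v->rcx, a, b, c, d);

      /* ...then strip whatever the platform definition withholds. */
      if ((uint32_t)v->rax == 1)
          c &= ~HIDE_ECX_LEAF1;

      v->rax = a; v->rbx = b; v->rcx = c; v->rdx = d;
      v->rip += 2;            /* CPUID is a two-byte instruction */
  }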

Moreover, there is the issue of unimplemented userspace
instructions.  Most virtualization systems provide a base
"platform" that the guest may rely on, which will include some
userspace instructions that may, or may not, be available on
the underlying hardware.  If a guest executes an instruction
that is not implemented on the underlying hardware, even a
non-privileged instruction, then the hypervisor must catch the
resulting trap and emulate that instruction, and all of its
side-effects.  And in modern systems, this problem is
exacerbated by VMs that can be migrated between different
host systems over time.  Migration, like suspension and
resumption, also leads to all sorts of interesting edge cases
that must be handled: how does one deal with TSC skew between
systems, for
example?  What does a guest do when no time has elapsed from
_its_ perspective, but it suddenly finds that real time has
advanced by seconds, minutes, hours, or days?
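
One common answer, on the host side at least, is to keep the
guest's TSC monotonic by recomputing the offset that the
hardware adds to RDTSC on the guest's behalf.  A rough sketch of
the arithmetic; the structure and names here are illustrative,
not any particular hypervisor's code, and real implementations
also have to worry about differing TSC frequencies, which this
ignores:

  #include <stdint.h>

  struct saved_vcpu {
      /* host TSC plus the old offset, sampled when the VM was saved */
      uint64_t guest_tsc_at_save;
  };

  static uint64_t rdtsc(void)
  {
      uint32_t lo, hi;
      __asm__ volatile("rdtsc" : "=a"(lo), "=d"(hi));
      return ((uint64_t)hi << 32) | lo;
  }

  /* On resume, choose an offset so that (host TSC + offset) continues
   * from where the guest left off; the guest sees no backwards jump,
   * though wall-clock time may still have leapt forward, which it has
   * to reconcile some other way (NTP, a paravirtual clock, ...). */
  static uint64_t tsc_offset_for_resume(const struct saved_vcpu *s)
  {
      return s->guest_tsc_at_save - rdtsc();
  }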

And with x86, even emulating simple instructions, like
programmed IO, can be challenging.  This is in part because
VT-x does not bank the instruction bytes in the VMCS/VMCB on
exit, so the hypervisor must look at the RIP from the exit, and
then go and fetch the instruction bytes from the guest itself.
But to do that the hypervisor must examine the state of the VCPU
closely and emulate what the CPU would do in the fetching
process exactly; for example, if the CPU is using paging, the
hypervisor must be careful to set the A bit on the PTEs for
the pages it thinks the instruction is coming from, and
similarly if the instruction spans a page boundary.  And even then
it cannot guarantee that it will do a perfect job: the VCPU may
have been fetching from a page for which the TLB entry was stale
and thus the instruction bytes the hypervisor reads following
the guest's page tables may not be the actual bytes that the
guest was reading.
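
A sketch of that walk, to give a flavor of what's involved; the
gpa_to_hva() helper is hypothetical, and large pages, canonical
checks, permission bits, and the stale-TLB problem just
described are all glossed over:

  #include <stdint.h>

  #define PTE_P  (1ull << 0)   /* present */
  #define PTE_A  (1ull << 5)   /* accessed */

  /* Hypothetical helper: map a guest-physical address into the
   * hypervisor's own address space so the entry can be read/written. */
  extern volatile uint64_t *gpa_to_hva(uint64_t gpa);

  /* Translate a guest-virtual address through the guest's 4-level
   * page tables, setting the A bit at each level the way the real
   * hardware walker would during the fetch. */
  static int guest_va_to_gpa(uint64_t guest_cr3, uint64_t va, uint64_t *gpa)
  {
      static const int shift[4] = { 39, 30, 21, 12 };   /* PML4 .. PT */
      uint64_t table = guest_cr3 & ~0xfffull;

      for (int level = 0; level < 4; level++) {
          volatile uint64_t *entry =
              gpa_to_hva(table + 8 * ((va >> shift[level]) & 0x1ff));

          if (!(*entry & PTE_P))
              return -1;               /* a real hypervisor injects #PF */
          *entry |= PTE_A;             /* mimic the hardware walker */
          table = *entry & 0x000ffffffffff000ull;
      }
      *gpa = table | (va & 0xfff);
      return 0;
  }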

And this doesn't even begin to account for nested
virtualization, which is easily an order of magnitude more work
than simple virtualization.

Also, see below.

>> KVM is largely dependent on qemu to provide the rest of the actual
>> virtual system. qemu's a great project and I run a ton of desktop VMs
>> with qemu+KVM, but it just doesn't have the level of maturity or
>> edge-case support that ESXi does. Pretty much any x86 operating system,
>> historical or current, _just works_ in ESXi.  With qemu+KVM, you're
>> going to have good success with the "big name" OSes...Windows, Linux,
>> the major BSDs, etc., but you're going to be fighting with quirks and
>> problems if you're trying, say, old OS/2 releases. That's not relevant
>> for most people looking for virtualization solutions, and the problems
>> aren't always insurmountable, but you're claiming that KVM is a "better"
>> solution, whereas in my experience, in reality, ESXi is the better
>> technology.
>
>What you wrote is now very atypical use: faithfully implementing
>all quirks of real devices.  More typical case is guest which
>knows that it is running on a hypervisor and uses virtual
>interface with no real counterpart.  For this quality of
>virtual interfaces matters.  I do not know how ESXi compares
>to KVM, but I know that "equivalent" but different virtual
>interfaces in qemu+KVM may have markedly different performance.

While enlightenments are a thing, and paravirtualization can
dramatically increase performance, handling unmodified guests is
still a very important use case for pretty much every serious
virtualization system.  And that does mean handling all the
quirks of not just the CPU, but also the device models that the
hypervisor presents to the guest.  That's a big job.
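
For what it's worth, the way a guest typically learns that
enlightenments might even be on offer is itself a CPUID
convention: the "hypervisor present" bit, CPUID.1:ECX[31], plus
a vendor signature at leaf 0x40000000.  A small illustrative
probe, sketch only; a real guest would go on to interrogate the
specific interface it finds:

  #include <stdio.h>
  #include <string.h>
  #include <cpuid.h>

  int main(void)
  {
      unsigned int a, b, c, d;
      char vendor[13] = { 0 };

      __cpuid(1, a, b, c, d);
      if (!(c & (1u << 31))) {
          puts("no hypervisor advertised; bare metal (or one hiding)");
          return 0;
      }

      /* Leaf 0x40000000: EBX/ECX/EDX carry the vendor signature,
       * e.g. "KVMKVMKVM", "VMwareVMware", "Microsoft Hv". */
      __cpuid(0x40000000, a, b, c, d);
      memcpy(vendor + 0, &b, 4);
      memcpy(vendor + 4, &c, 4);
      memcpy(vendor + 8, &d, 4);
      printf("hypervisor vendor signature: %s\n", vendor);
      return 0;
  }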

>> (As an aside, VMWare's _desktop_ [not server] virtualization product,
>> VMWare Workstation, looks like it's making moves to use KVM under the
>> hood, but they have said they will continue using their own proprietary
>> virtual devices and drivers, which is really what sets VMWare apart from
>> qemu. This is a move they've already made on both the Windows and Mac OS
>> version of VMWare Workstation if I understand correctly [utilizing
>> Hyper-V and Apple's Virtualization framework]. This makes sense... as I
>> said, the underlying virtualization of the processor is being handled by
>> the VT-x capabilities of the processor whether you're using VMWare,
>> VirtualBox, KVM, etc., so when running a desktop product under Linux,
>> you may as well use KVM but you still need other software to build the
>> rest of the virtual system and its virtual devices, so that's where
>> VMWare and qemu will still differentiate themselves. None of this is
>> relevant for ESXi, though, because as has been pointed out earlier in
>> the thread, it is not running on Linux at all, so VMKernel is providing
>> its own implementation of, essentially, what KVM provides in the Linux
>> kernel.)
>
>From what you wrote seem that ESXi is more similar to Xen than to
>KVM+qemu, that is ESXi and Xen discourage running unvirtualized programs
>while in KVM+qemu some (frequently most) programs is running unvirtualized
>and only rest is virtualized.  I do not know if this sets limits on quality
>of virtualization, but that could be valid reason for ESXi to provide its
>own kernel.

That's correct; ESXi and Xen are architecturally similar.  KVM
and VMWare Player are more similar to each other.

	- Dan C.


