[Info-vax] clock problems with OpenVMS x86 on VirtualBox
Dan Cross
cross at spitfire.i.gajendra.net
Mon May 15 14:02:03 EDT 2023
In article <u3tr77$6e9$1 at news.misty.com>,
Johnny Billquist <bqt at softjar.se> wrote:
>On 2023-05-15 15:35, Arne Vajhøj wrote:
>[snip]
>>>> What host? A type 1 hypervisor does not run a host OS.
>>>
>>> Um? What is the hypervisor then?
>>
>> The hypervisor software like ESXi.
>>
>> It is not running on top of a host OS like a type 2 (VirtualBox etc.) does.
>
>I never assumed you had a full OS. However, the hypervisor is
>intercepting access to hardware, and is responsible for the actual
>interaction with the hardware, adding a layer between the guest OS and
>the hardware. Which means that the hypervisor is in the end doing
>interrupt handling, for example.
Yes and no. For many things, the guest necessarily traps back
into the host environment for handling (indeed, on x86, the
`CPUID` instruction causes an unconditional VM exit). However, there
are techniques whereby one can bypass the hypervisor entirely,
including for interrupt injection/delivery. For example, again
on x86, the "posted interrupt" functionality exposed by both VMX
on Intel and SVM on AMD can be used in conjunction with SR-IOV
and the IOMMU on both platforms to allow virtual functions on
high-speed devices to deliver interrupts directly into a guest,
provided one is using PCI pass-through. Similarly, IPI delivery
on x86 can use this same mechanism (though IPI generation, which
involves a write to ICR on the LAPIC, often requires hypervisor
intervention, even when using LAPIC virtualization: AMD has a
technique whereby this can _mostly_ be bypassed, but it is
buggy and not recommended for use).
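To make that concrete, here is a rough sketch in C of the VMX
posted-interrupt descriptor and of posting a vector into it. The
layout follows the Intel SDM, but the field and function names are
mine, and this is illustrative rather than lifted from any real
hypervisor:

    #include <stdint.h>

    /* 64-byte, 64-byte-aligned descriptor that the CPU scans when
     * a posted-interrupt notification arrives. */
    struct posted_int_desc {
            uint64_t pir[4];  /* one request bit per vector, 0-255 */
            uint64_t control; /* bit 0: Outstanding Notification (ON) */
            uint64_t sw[3];   /* available to software; pads to 64 bytes */
    } __attribute__((aligned(64)));

    /* Post vector `vec` for a guest vCPU.  An SR-IOV virtual
     * function achieves the moral equivalent in hardware, via the
     * IOMMU's interrupt-remapping tables. */
    void post_vector(struct posted_int_desc *pd, uint8_t vec)
    {
            __atomic_fetch_or(&pd->pir[vec / 64], 1ULL << (vec % 64),
                __ATOMIC_SEQ_CST);
            __atomic_fetch_or(&pd->control, 1, __ATOMIC_SEQ_CST);
            /* ...then send the notification vector to the target
             * CPU; if the vCPU is running, the CPU folds the PIR
             * bits into the virtual APIC without any VM exit. */
    }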
And of course, the hypervisor must own the PCI configuration
space, and virtualize access to it, even if it provides
pass-through to some functions.
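For example (a minimal sketch; `vdev`, `shadow_bars`, and
`pci_hw_read` are invented names, not from any particular
hypervisor), a trapped config-space read might be handled roughly
like this, with the BARs synthesized and everything else forwarded:

    #include <stdint.h>

    #define PCI_BAR0 0x10 /* offset of the first BAR register */
    #define PCI_BAR5 0x24 /* offset of the last BAR register */

    struct vdev {
            uint16_t phys_bdf;       /* real device's bus/dev/func */
            uint32_t shadow_bars[6]; /* guest-visible BAR values */
    };

    uint32_t pci_hw_read(uint16_t bdf, uint16_t off); /* real hardware */

    uint32_t vpci_cfg_read(struct vdev *vd, uint16_t off)
    {
            /* BARs must be synthesized: the guest's view of the
             * address space is not the host's. */
            if (off >= PCI_BAR0 && off <= PCI_BAR5)
                    return vd->shadow_bars[(off - PCI_BAR0) / 4];
            return pci_hw_read(vd->phys_bdf, off);
    }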
Sadly, there's little hardware support for virtualizing timers
right now, which bears on another part of this discussion: the
local APIC timer (for example) is usually multiplexed onto a host
timer, and a VM exit is required to acknowledge the interrupt in
the host and inject it into the guest.
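In outline (all names here are invented for illustration), that
multiplexing looks something like this: a trapped write to the
virtual LAPIC's timer arms a real host timer, and expiry is routed
back through the host:

    #include <stdint.h>

    struct host_timer;
    struct vcpu {
            struct host_timer *timer;
            uint8_t vlapic_timer_vector;
    };

    /* Host services assumed to exist in some form. */
    void host_timer_start(struct host_timer *t, uint64_t ns,
        void (*fn)(void *), void *arg);
    void set_irq_pending(struct vcpu *v, uint8_t vector);
    void kick_vcpu(struct vcpu *v);

    void vlapic_timer_expired(void *arg);

    /* Guest wrote the virtual LAPIC initial-count register (a
     * trapped access): arm a host timer for the deadline. */
    void vlapic_timer_arm(struct vcpu *v, uint64_t ns)
    {
            host_timer_start(v->timer, ns, vlapic_timer_expired, v);
    }

    /* Host timer fired: mark the guest interrupt pending and kick
     * the vCPU; injection happens on the next VM entry, and the
     * guest's EOI traps back out so the host can rearm. */
    void vlapic_timer_expired(void *arg)
    {
            struct vcpu *v = arg;
            set_irq_pending(v, v->vlapic_timer_vector);
            kick_vcpu(v);
    }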
>Which means that the CPU might be doing
>things that suspend the execution for the hypervisor outside the
>control of the hypervisor itself, and furthermore make things like
>interrupts to the guest OS happen at some random later time, including
>clock interrupts, which are what affect things like real-time response
>times.
The list of things that a type-1 hypervisor must own is large,
but in general I agree with you: it is de facto the kernel
running on the bare hardware, even if it does not provide many
of the services usually associated with a traditional operating
system (e.g., a filesystem or a user-visible process model).
>>>> And why would the CPU not be available, when there is no
>>>> over-allocation of resources? It seems pretty silly of
>>>> the hypervisor to not want to give the CPU if no other
>>>> VM can get it.
>>>
>>> Any kind of hypervisor means we're talking about virtualization. That
>>> means you have real hardware which exists outside of this
>>> virtualization. And that hardware can generate interrupts and demand
>>> service, which then forces the hypervisor to wait before it can
>>> actually get any cycles. Outside the control of the hypervisor...
>>
>> Maybe. But what would that be?
>
>Primarily I would be worried about interrupts, which introduce an
>unknown amount of delay.
I would generalize that to virtualized IO, not just interrupts.
Again, in a type-1 hypervisor, handling of IO requests is often
delegated to a root VM (Dom0 in Xen terms), which will introduce
its own latency issues. "Disks" as seen by a VM may even be
synthesized by software and moved off-node.
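As a sketch of where that latency comes from (the structures here
are toys; the real Xen blkif protocol is more involved): the
frontend driver in the guest queues a request on a shared ring and
rings a doorbell, and the backend in the root VM dequeues it,
performs the real IO (possibly across a network), and responds the
same way:

    #include <stdint.h>

    struct blk_req {
            uint64_t sector;
            uint32_t nsect;
            uint32_t write; /* 0 = read, 1 = write */
    };

    #define RING_SIZE 32
    struct ring {
            volatile uint32_t prod, cons; /* producer/consumer indices */
            struct blk_req req[RING_SIZE];
    };

    void notify_backend(void); /* event channel / doorbell stand-in */

    /* Frontend (guest) side: publish the request, then notify.
     * Every request costs at least a guest-to-host transition, a
     * root-VM scheduling delay, and possibly a network round trip. */
    void submit(struct ring *r, struct blk_req rq)
    {
            r->req[r->prod % RING_SIZE] = rq;
            __atomic_thread_fence(__ATOMIC_RELEASE);
            r->prod++;
            notify_backend();
    }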
>But yes, for things like memory, if it is statically allocated to VM
>instances, and always available, then at least one potential source
>of delays is not there.
Again, see my earlier post: you still have issues with
management of the nested page tables, including cache and
TLB pressure, and a TLB miss is substantially more expensive:
conceptually, one must walk the second-level page table for
_every_ guest-physical access. For a bog-standard 4-level page
walk, that can mean 16 additional memory accesses.
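To spell out the arithmetic: each of the 4 guest page-table
entries lives at a guest-physical address, so reading it takes
its own 4-level nested walk, or 4 x 4 = 16 nested accesses;
counting the 4 guest-entry reads themselves and the final nested
walk for the target address, the commonly cited worst case is 24
memory accesses, versus 4 on bare hardware.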
>Depending on what the hypervisor does, you might still also have issues
>like page table caches, and normal memory caches, which will have rather
>different behaviors compared to plain hardware. Caches are a significant
>factor in performance these days.
Yes. See my earlier post in this thread.
- Dan C.