Re: [PATCH 2/3] KVM: x86: Add support for VMware guest specific hypercalls

Paolo Bonzini <pbonzini@xxxxxxxxxx> · Sat, 9 Nov 2024 19:20:46 +0100

On 11/8/24 06:03, Zack Rusin wrote:
There's no spec but we have open headers listing the hypercalls.
There are about 100 of them (a few were deprecated); the full
list starts here:
https://github.com/vmware/open-vm-tools/blob/739c5a2f4bfd4cdda491e6a6f6869d88c0bd6972/open-vm-tools/lib/include/backdoor_def.h#L97
They're not well documented, but the names are pretty self-explanatory.

At a quick glance, this one needs to be handled in KVM:

   BDOOR_CMD_VCPU_MMIO_HONORS_PAT

and these probably should be in KVM:

   BDOOR_CMD_GETTIME
   BDOOR_CMD_SIDT
   BDOOR_CMD_SGDT
   BDOOR_CMD_SLDT_STR
   BDOOR_CMD_GETTIMEFULL
   BDOOR_CMD_VCPU_LEGACY_X2APIC_OK
   BDOOR_CMD_STEALCLOCK

I'm not sure if there's any value in implementing a few of them.

The value is that some of these depend on what the hypervisor does, not 
on what userspace does.  For Hypervisor.framework you have a lot of 
leeway, for KVM and Hyper-V less so.

Please understand that adding support for a closed spec is already a bit 
of a tall ask.  We can meet in the middle and make up for the 
closedness, but the way to do it is not technical; it's essentially 
trust.  You are the guys that know the spec and the userspace code best, 
so we trust you to make choices that make technical sense for both KVM 
and VMware.  But without a spec we even have to trust you on what makes 
sense or not to have in the kernel, so we ask you to be... honest about 
that.

One important point is that from the KVM maintainers' point of view, the 
feature you're adding might be used by others and not just VMware 
Workstation.  Microsoft and Apple might see things differently (Apple in 
particular has a much thinner wrapper around the processor's 
virtualization capabilities).

iirc there's 101 of them (as I mentioned a lot have been deprecated but
that's for userspace, on the host we still have to do something for
old guests using them) and, if out of those 101 we implement 100 in
the kernel then, as far as this patch is concerned, it's no different
than if we had 0 out of 101 because we're still going to have to exit
to userspace to handle that 1 remaining.

Unless you're saying that those would be useful to you. In which case
I'd be glad to implement them for you, but I'd put them behind some
kind of a cap or a kernel config because we wouldn't be using them -

Actually we'd ask you to _not_ put them behind a cap, and live with the 
kernel implementation.  Obviously that's not a requirement for all the 
100+ hypercalls, only for those where it makes sense.

besides what Doug mentioned - we already maintain the shared code for
them that's used on Windows, macOS, ESX and Linux so even if we had
them in the Linux kernel it would still make more sense to use the
code that's shared with the other OSes to lessen the maintenance
burden (so that changing anything within that code consistently
changes across all the OSes).

If some of them can have shared code across all OSes, then that's a good 
sign that they do not belong in the kernel.  On the other hand, if the 
code is specific to Windows/macOS/ESX/Linux, and maybe it even calls 
into low-level Hypervisor.framework APIs on macOS, then it's possible or 
even likely that the best implementation for Linux is "just assume that 
KVM will do it" and assert(0).

In yet other cases (maybe those SGDT/SLDT/STR/SIDT ones??), if the code 
that you have for Linux is "just do this KVM ioctl to do it", it may 
provide better performance if you save the roundtrip to userspace and 
back.  If KVM is the best performing hypervisor for VMware Workstation, 
then we're happy, :) and if you have some performance issue we want to 
help you too.
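
As a rough illustration of the split being discussed (not code from the 
patch series), the following C sketch routes a backdoor command either to 
an in-kernel handler or to a userspace exit.  All demo_* names and the 
command numbers are hypothetical placeholders; the real values are in 
backdoor_def.h, and which commands belong in the kernel is exactly what 
is still open.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical command numbers, for illustration only.  The real
 * BDOOR_CMD_* values are defined in open-vm-tools' backdoor_def.h.
 */
enum demo_cmd {
    DEMO_CMD_GETTIME        = 1,
    DEMO_CMD_STEALCLOCK     = 2,
    DEMO_CMD_USERSPACE_ONLY = 3,
};

enum demo_result {
    DEMO_HANDLED_IN_KERNEL,   /* no roundtrip to userspace needed */
    DEMO_EXIT_TO_USERSPACE,   /* the VMM has to handle the command */
};

/* Assumed subset whose answer depends on hypervisor state. */
static bool demo_kernel_handles(uint32_t cmd)
{
    switch (cmd) {
    case DEMO_CMD_GETTIME:
    case DEMO_CMD_STEALCLOCK:
        return true;
    default:
        return false;
    }
}

/*
 * Anything outside the in-kernel subset, including unknown or
 * deprecated commands, still falls through to userspace.
 */
enum demo_result demo_dispatch(uint32_t cmd)
{
    if (demo_kernel_handles(cmd))
        return DEMO_HANDLED_IN_KERNEL;
    return DEMO_EXIT_TO_USERSPACE;
}
```

The fallthrough path stays no matter how large the in-kernel subset 
becomes, so the only thing the subset changes is which commands avoid 
the roundtrip.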

A related topic is that a good implementation, equivalent to what the 
proprietary hypervisor implemented, might require adding an ioctl to 
query something that KVM currently does not provide (maybe the current 
steal clock? IIRC it's only available via a Xen ioctl, not a generic 
one).  In that case you'd need to contribute that extra API.  Doing that 
now is easier for both you guys and the KVM maintainers, so that's 
another reason to go through the list and share your findings.

Anyway, one question apart from this: is the API the same for the I/O 
port and hypercall backdoors?

I don't think it addresses Paolo's concern (if I understood Paolo's concern
correctly), but it would help from the perspective of allowing KVM to support
VMware hypercalls and Xen/Hyper-V/KVM hypercalls in the same VM.

Yea, I just don't think there's any realistic way we could handle all
of those hypercalls in the kernel so I'm trying to offer some ideas on
how to lessen the scope to make it as painless as possible. Unless you
think we could somehow parlay my piercing blue eyes into getting those
patches in as is, in which case let's do that ;)

Unlikely :) but it's not in bad shape at all!  The main remaining 
discussion point is the subset of hypercalls that need support in the 
kernel (either as a kernel implementation, or as a new ioctl). 
Hopefully the above guidelines will help you.

I also think we should add CONFIG_KVM_VMWARE from the get-go, and if we're feeling
lucky, maybe even retroactively bury KVM_CAP_X86_VMWARE_BACKDOOR behind that
Kconfig.  That would allow limiting the exposure to VMware specific code, e.g. if
KVM does end up handling hypercalls in-kernel.  And it might deter abuse to some
extent.

I thought about that too. I was worried that even if we make it on by
default it will require quite a bit of handholding to make sure all
the distros include it, or otherwise on desktops Workstation still
wouldn't work with KVM by default. I also felt a little silly trying
to add a kernel config for those few lines that would be on pretty
much everywhere and since we didn't implement the vmware backdoor
functionality I didn't want to presume and try to shield a feature
that might be in production by others with a new kernel config.

We don't have a huge number of such knobs but based on experience I 
expect that it will be turned off only by cloud providers or appliance 
manufacturers that want to reduce the attack surface.  If it's enabled 
by default, distros will generally leave it on.  You can also add "If 
unsure, say Y" to the help message as we already do in several cases.(*)

In fact, if someone wants to turn it off, they will send the patch 
themselves to add CONFIG_KVM_VMWARE and it will be accepted.  So we 
might as well ask for it from the start. :)
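
For illustration only, here is a minimal C sketch of what gating on the 
Kconfig symbol could look like.  demo_check_extension() and the 
DEMO_CAP_* values are hypothetical stand-ins, not KVM's actual 
capability code, and the #define merely emulates what a =y Kconfig 
choice would provide.

```c
/*
 * Stand-in for the Kconfig-generated macro.  In a real kernel build
 * this comes from the build system and is not defined by hand.
 */
#ifndef CONFIG_KVM_VMWARE
#define CONFIG_KVM_VMWARE 1
#endif

/* Hypothetical capability identifiers, for illustration only. */
enum demo_cap {
    DEMO_CAP_VMWARE_BACKDOOR,
    DEMO_CAP_OTHER,
};

/* Returns 1 if the capability is advertised, 0 otherwise. */
int demo_check_extension(enum demo_cap cap)
{
    switch (cap) {
    case DEMO_CAP_VMWARE_BACKDOOR:
#if CONFIG_KVM_VMWARE
        return 1;   /* VMware support compiled in */
#else
        return 0;   /* compiled out: nothing is advertised */
#endif
    case DEMO_CAP_OTHER:
        return 1;   /* unrelated capabilities are unaffected */
    }
    return 0;
}
```

Building with -DCONFIG_KVM_VMWARE=0 makes the VMware capability 
disappear while everything else is unchanged, which is the reduced 
attack surface a cloud provider or appliance vendor would be after.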

Thanks,

Paolo

(*) In fact, I am wondering if we should flip the default for Xen; in the 
beginning it was just an Amazon thing but since then David has 
contributed support in QEMU and CI.  To be clear, I am *not* asking 
VMware for anything but selftests to make CONFIG_KVM_VMWARE default to 
enabled.