Re: [PATCH 2/3] KVM: x86: Add support for VMware guest specific hypercalls

Doug Covelli <doug.covelli@xxxxxxxxxxxx> · Sat, 9 Nov 2024 16:11:36 -0500

On Sat, Nov 9, 2024 at 1:20 PM Paolo Bonzini <pbonzini@xxxxxxxxxx> wrote:
>
> On 11/8/24 06:03, Zack Rusin wrote:
> >>> There's no spec but we have open headers listing the hypercalls.
> >>> There's about a 100 of them (a few were deprecated), the full
> >>> list starts here:
> >>> https://github.com/vmware/open-vm-tools/blob/739c5a2f4bfd4cdda491e6a6f6869d88c0bd6972/open-vm-tools/lib/include/backdoor_def.h#L97
> >>> They're not well documented, but the names are pretty self-explenatory.
> >>
> >> At a quick glance, this one needs to be handled in KVM:
> >>
> >>    BDOOR_CMD_VCPU_MMIO_HONORS_PAT
> >>
> >> and these probably should be in KVM:
> >>
> >>    BDOOR_CMD_GETTIME
> >>    BDOOR_CMD_SIDT
> >>    BDOOR_CMD_SGDT
> >>    BDOOR_CMD_SLDT_STR
> >>    BDOOR_CMD_GETTIMEFULL
> >>    BDOOR_CMD_VCPU_LEGACY_X2APIC_OK
> >>    BDOOR_CMD_STEALCLOCK
> >
> > I'm not sure if there's any value in implementing a few of them.
>
> The value is that some of these depend on what the hypervisor does, not
> on what userspace does.  For Hypervisor.framework you have a lot of
> leeway, for KVM and Hyper-V less so.
>
> Please understand that adding support for a closed spec is already a bit
> of a tall ask.  We can meet in the middle and make up for the
> closedness, but the way to do it is not technical; it's essentially
> trust.  You are the guys that know the spec and the userspace code best,
> so we trust you to make choices that make technical sense for both KVM
> and VMware.  But without a spec we even have to trust you on what makes
> sense or not to have in the kernel, so we ask you to be... honest about
> that.
>
> One important point is that from the KVM maintainers' point of view, the
> feature you're adding might be used by others and not just VMware
> Workstation.  Microsoft and Apple might see things differently (Apple in
> particular has a much thinner wrapper around the processor's
> virtualization capbilities).
>
> > iirc
> > there's 101 of them (as I mentioned a lot have been deprecated but
> > that's for userspace, on the host we still have to do something for
> > old guests using them) and, if out of those 101 we implement 100 in
> > the kernel then, as far as this patch is concerned, it's no different
> > than if we had 0 out of 101 because we're still going to have to exit
> > to userspace to handle that 1 remaining.
> >
> > Unless you're saying that those would be useful to you. In which case
> > I'd be glad to implement them for you, but I'd put them behind some
> > kind of a cap or a kernel config because we wouldn't be using them -
>
> Actually we'd ask you to _not_ put them behind a cap, and live with the
> kernel implementation.  Obviously that's not a requirement for all the
> 100+ hypercalls, only for those where it makes sense.
>
> > besides what Doug mentioned - we already maintain the shared code for
> > them that's used on Windows, MacOS, ESX and Linux so even if we had
> > them in the Linux kernel it would still make more sense to use the
> > code that's shared with the other OSes to lessen the maintenance
> > burden (so that changing anything within that code consistently
> > changes across all the OSes).
>
> If some of them can have shared code across all OSes, then that's a good
> sign that they do not belong in the kernel.  On the other hand, if the
> code is specific to Windows/macOS/ESX/Linux, and maybe it even calls
> into low-level Hypervisor.framework APIs on macOS, then it's possible or
> even likely that the best implementation for Linux is "just assume that
> KVM will do it" and assert(0).
>
> In yet other cases (maybe those SGDT/SLDT/STR/SIDT ones??), if the code
> that you have for Linux is "just do this KVM ioctl to do it", it may
> provide better performance if you save the roundtrip to userspace and
> back.  If KVM is the best performing hypervisor for VMware Workstation,
> then we're happy, :) and if you have some performance issue we want to
> help you too.

Appreciate the concern about performance however I don't think it is
something we should worry about.  Even with our existing VMM, which
runs at CPL0, all of these backdoor calls are handled by userspace
which means they are very slow (~28K cycles overhead on my Zen2) and
are not used in any perf critical code (if they were we would have
handled them at CPL0). Running on KVM the overhead is significantly
less.

As for the SGDT/SLDT/STR/SIDT backdoor calls these were added > 20
years ago for SW that used these instructions from CPL3 which did not
work well before VT/SVM were introduced.  These are really of no use
on modern CPUs and will be blocked if the guest OS has enabled UMIP.
Adding support for these to the KVM code would be a bit of a waste
IMHO  I have no objection to adding support for handling some backdoor
calls in the kernel if we find ones where it would be advantageous to
do so I'm just not aware of any where this would be the caase..

> A related topic is that a good implementation, equivalent to what the
> proprietary hypervisor implemented, might require adding a ioctl to
> query something that KVM currently does not provide (maybe the current
> steal clock? IIRC it's only available via a Xen ioctl, not a generic
> one).  In that case you'd need to contribute that extra API.  Doing that
> now is easier for both you guys and the KVM maintainers, so that's
> another reason to go through the list and share your findings.

For stolen time the backdoor call is used to enable the functionality
not to get/set the stolen time.  I agree that we would probably want
to do something KVM specific for this one however this is currently
really only supported by ESX (and only currently used by Photon OS) so
I don't think adding that support to KVM is critical.

> Anyway, one question apart from this: is the API the same for the I/O
> port and hypercall backdoors?

Yeah the calls and arguments are the same.  The hypercall based
interface is an attempt to modernize the backdoor since as you pointed
out the I/O based interface is kind of hacky as it bypasses the normal
checks for an I/O port access at CPL3.  It would be nice to get rid of
it but unfortunately I don't think that will happen in the foreseeable
future as there are a lot of existing VMs out there with older SW that
still uses this interface.

> >> I don't think it addresses Paolo's concern (if I understood Paolo's concern
> >> correctly), but it would help from the perspective of allowing KVM to support
> >> VMware hypercalls and Xen/Hyper-V/KVM hypercalls in the same VM.
> >
> > Yea, I just don't think there's any realistic way we could handle all
> > of those hypercalls in the kernel so I'm trying to offer some ideas on
> > how to lessen the scope to make it as painless as possible. Unless you
> > think we could somehow parlay my piercing blue eyes into getting those
> > patches in as is, in which case let's do that ;)
>
> Unlikely :) but it's not in bad shape at all!  The main remaining
> discussion point is the subset of hypercalls that need support in the
> kernel (either as a kernel implementation, or as a new ioctl).
> Hopefully the above guidelines will help you.
>
> >> I also think we should add CONFIG_KVM_VMWARE from the get-go, and if we're feeling
> >> lucky, maybe even retroactively bury KVM_CAP_X86_VMWARE_BACKDOOR behind that
> >> Kconfig.  That would allow limiting the exposure to VMware specific code, e.g. if
> >> KVM does end up handling hypercalls in-kernel.  And it might deter abuse to some
> >> extent.
> >
> > I thought about that too. I was worried that even if we make it on by
> > default it will require quite a bit of handholding to make sure all
> > the distros include it, or otherwise on desktops Workstation still
> > wouldn't work with KVM by default, I also felt a little silly trying
> > to add a kernel config for those few lines that would be on pretty
> > much everywhere and since we didn't implement the vmware backdoor
> > functionality I didn't want to presume and try to shield a feature
> > that might be in production by others with a new kernel config.
> We don't have a huge number of such knobs but based on experience I
> expect that it will be turned off only by cloud providers or appliance
> manufacturers that want to reduce the attack surface.  If it's enabled
> by default, distros will generally leave it on.  You can also add "If
> unsure, say Y" to the help message as we already do in several cases.(*)
>
> In fact, if someone wants to turn it off, they will send the patch
> themselves to add CONFIG_KVM_VMWARE and it will be accepted.  So we
> might as well ask for it from the start. :)
>
> Thanks,
>
> Paolo
>
> (*) In fact I am wondering if we should flip the default for Xen, in the
> beginning it was just an Amazon thing but since then David has
> contributed support in QEMU and CI.  To be clear, I am *not* asking
> VMware for anything but selftests to make CONFIG_KVM_VMWARE default to
> enabled.
>

-- 
This electronic communication and the information and any files transmitted 
with it, or attached to it, are confidential and are intended solely for 
the use of the individual or entity to whom it is addressed and may contain 
information that is confidential, legally privileged, protected by privacy 
laws, or otherwise restricted from disclosure to anyone else. If you are 
not the intended recipient or the person responsible for delivering the 
e-mail to the intended recipient, you are hereby notified that any use, 
copying, distributing, dissemination, forwarding, printing, or copying of 
this e-mail is strictly prohibited. If you received this e-mail in error, 
please return the e-mail to the sender, delete it from your computer, and 
destroy any printed copy of it.