Re: [RFC PATCH 0/8] KVM: Prepopulate guest memory API

On Tue, Mar 19, 2024, Isaku Yamahata wrote:
> On Wed, Mar 06, 2024 at 06:09:54PM -0800,
> Isaku Yamahata <isaku.yamahata@xxxxxxxxxxxxxxx> wrote:
> 
> > On Wed, Mar 06, 2024 at 04:53:41PM -0800,
> > David Matlack <dmatlack@xxxxxxxxxx> wrote:
> > 
> > > On 2024-03-01 09:28 AM, isaku.yamahata@xxxxxxxxx wrote:
> > > > From: Isaku Yamahata <isaku.yamahata@xxxxxxxxx>
> > > > 
> > > > Implementation:
> > > > - x86 KVM MMU
> > > >   In x86 KVM MMU, I chose to use kvm_mmu_do_page_fault().  It's not confined to
> > > >   KVM TDP MMU.  We can restrict it to KVM TDP MMU and introduce an optimized
> > > >   version.
> > > 
> > > Restricting to TDP MMU seems like a good idea. But I'm not quite sure
> > > how to reliably do that from a vCPU context. Checking for TDP being
> > > enabled is easy, but what if the vCPU is in guest-mode?
> > 
> > As you pointed out in other mail, legacy KVM MMU support or guest-mode will be
> > troublesome.

Why is shadow paging troublesome?  I don't see any obvious issues with effectively
prefetching into a shadow MMU with read fault semantics.  It might be pointless
and wasteful, as the guest PTEs need to be in place, but that's userspace's problem.
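
As an illustration only (this is not code from the series, and the
kvm_mmu_do_page_fault() signature varies across kernel versions), prefetching a
range with read-fault semantics could look roughly like:

/*
 * Sketch only: walk the GPA range and feed each page through the normal
 * fault path with read semantics (error code 0, prefetch = true), which in
 * principle works for both the TDP MMU and the shadow MMU.  The
 * kvm_mmu_do_page_fault() arguments below are approximate; a real version
 * must also run under vcpu->mutex / SRCU and handle RET_PF_RETRY and
 * friends instead of only bailing on negative returns.
 */
static int prepopulate_gpa_range(struct kvm_vcpu *vcpu, gpa_t gpa, u64 size)
{
        gpa_t end = gpa + size;
        int r;

        for ( ; gpa < end; gpa += PAGE_SIZE) {
                r = kvm_mmu_do_page_fault(vcpu, gpa, /*err=*/0,
                                          /*prefetch=*/true, NULL);
                if (r < 0)
                        return r;
                cond_resched();
        }
        return 0;
}

Nothing in that flow depends on which MMU sits behind the fault handler.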

Testing is the biggest gap I see, as using the ioctl() for shadow paging will
essentially require a live guest, but that doesn't seem like it'd be too hard to
validate.  And unless we lock down the ioctl() to only be allowed on vCPUs that
have never done KVM_RUN, we need that test coverage anyways.

And I don't think it makes sense to try and lock down the ioctl(), because for
the enforcement to have any meaning, KVM would need to reject the ioctl() if *any*
vCPU has run, and adding that code would likely add more complexity than it solves.
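
To illustrate the cost (hypothetical helpers, not existing KVM code), the
enforcement would need something along these lines:

/*
 * Hypothetical, purely to show the extra complexity: KVM has no per-VM
 * "a vCPU has entered the guest" flag, so enforcement means scanning every
 * vCPU for some proxy of "has run" (vcpu_has_run() below does not exist)
 * and dealing with races against concurrent vCPU creation and KVM_RUN,
 * e.g. by holding kvm->lock across the check and the mapping.
 */
static bool kvm_any_vcpu_has_run(struct kvm *kvm)
{
        struct kvm_vcpu *vcpu;
        unsigned long i;

        lockdep_assert_held(&kvm->lock);

        kvm_for_each_vcpu(i, vcpu, kvm) {
                if (vcpu_has_run(vcpu))         /* hypothetical helper */
                        return true;
        }
        return false;
}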

> > The use case I assumed is pre-population before the guest runs, so guest-mode
> > wouldn't matter.  I didn't add an explicit check for it, though.

KVM shouldn't have an explicit is_guest_mode() check; the support should be a
property of the underlying MMU, and KVM can use the TDP MMU for L2 (if L1 is
using legacy shadow paging, not TDP).
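
I.e. gate the ioctl() on the vCPU's active MMU, something conceptually like the
below (the helper name and exact predicate are made up for this sketch, not
taken from KVM or the series):

/*
 * Illustrative only: derive support from the MMU that will service the
 * fault, not from is_guest_mode().  A TDP-MMU-only restriction roughly
 * reduces to "the active MMU maps GPAs directly", which also covers L2
 * when KVM is using the TDP MMU for it.
 */
static bool kvm_mmu_can_prepopulate(struct kvm_vcpu *vcpu)
{
        return vcpu->arch.mmu->root_role.direct;
}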

> > Any use case while vCPUs are running?
> > 
> > 
> > > Perhaps we can just return an error out to userspace if the vCPU is in
> > > guest-mode or TDP is disabled, and make it userspace's problem to do
> > > memory mapping before loading any vCPU state.
> > 
> > If the use case for a default VM or sw-protected VM is to avoid excessive KVM page
> > faults at guest boot, erroring out on guest-mode or disabled TDP wouldn't matter.
> 
> Any input?  If there's no further input, I'll assume the primary use case is
> pre-population before the guest runs.

Pre-populating is the primary use case, but that could happen if L2 is active,
e.g. after live migration.
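
As a rough picture of that primary use case (every name below is a placeholder;
the actual struct, ioctl name, and number are whatever the series defines):

#include <err.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Placeholder layout and ioctl number, NOT the real uAPI from the series. */
struct kvm_prepopulate_range {
        __u64 gpa;
        __u64 size;
};
#define KVM_PREPOPULATE_MEM _IOW(KVMIO, 0xff, struct kvm_prepopulate_range)

/* Map a GPA range into the guest from a vCPU fd before the first KVM_RUN. */
static void prepopulate_before_run(int vcpu_fd, __u64 gpa, __u64 size)
{
        struct kvm_prepopulate_range range = {
                .gpa  = gpa,
                .size = size,
        };

        if (ioctl(vcpu_fd, KVM_PREPOPULATE_MEM, &range))
                err(1, "prepopulate");

        /* Only after pre-population does the VMM issue KVM_RUN. */
}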

I'm not necessarily opposed to initially adding support only for the TDP MMU, but
if the delta to also support the shadow MMU is relatively small, my preference
would be to add the support right away.  E.g. to give us confidence that the uAPI
can work for multiple MMUs, and so that we don't have to write documentation for
x86 to explain exactly when it's legal to use the ioctl().



