Re: [RFC PATCH 0/8] KVM: Prepopulate guest memory API

Isaku Yamahata <isaku.yamahata@xxxxxxxxx> · Wed, 3 Apr 2024 15:00:23 -0700

On Wed, Apr 03, 2024 at 11:30:21AM -0700,
Sean Christopherson <seanjc@xxxxxxxxxx> wrote:

> On Tue, Mar 19, 2024, Isaku Yamahata wrote:
> > On Wed, Mar 06, 2024 at 06:09:54PM -0800,
> > Isaku Yamahata <isaku.yamahata@xxxxxxxxxxxxxxx> wrote:
> > 
> > > On Wed, Mar 06, 2024 at 04:53:41PM -0800,
> > > David Matlack <dmatlack@xxxxxxxxxx> wrote:
> > > 
> > > > On 2024-03-01 09:28 AM, isaku.yamahata@xxxxxxxxx wrote:
> > > > > From: Isaku Yamahata <isaku.yamahata@xxxxxxxxx>
> > > > > 
> > > > > Implementation:
> > > > > - x86 KVM MMU
> > > > >   In x86 KVM MMU, I chose to use kvm_mmu_do_page_fault().  It's not confined to
> > > > >   KVM TDP MMU.  We can restrict it to KVM TDP MMU and introduce an optimized
> > > > >   version.
> > > > 
> > > > Restricting to TDP MMU seems like a good idea. But I'm not quite sure
> > > > how to reliably do that from a vCPU context. Checking for TDP being
> > > > enabled is easy, but what if the vCPU is in guest-mode?
> > > 
> > > As you pointed out in other mail, legacy KVM MMU support or guest-mode will be
> > > troublesome.
> 
> Why is shadow paging troublesome?  I don't see any obvious issues with effectively
> prefetching into a shadow MMU with read fault semantics.  It might be pointless
> and wasteful, as the guest PTEs need to be in place, but that's userspace's problem.

With shadow paging, the address being populated is a GVA, not a GPA.  I'm not
sure that's what userspace wants.  If that's userspace's problem, I'm fine with it.

> Testing is the biggest gap I see, as using the ioctl() for shadow paging will
> essentially require a live guest, but that doesn't seem like it'd be too hard to
> validate.  And unless we lock down the ioctl() to only be allowed on vCPUs that
> have never done KVM_RUN, we need that test coverage anyways.

So far I've tried only the TDP MMU case.  I can try the other MMU types.

> And I don't think it makes sense to try and lock down the ioctl(), because for
> the enforcement to have any meaning, KVM would need to reject the ioctl() if *any*
> vCPU has run, and adding that code would likely add more complexity than it solves.
> 
> > > The use case I supposed is pre-population before guest runs, the guest-mode
> > > wouldn't matter. I didn't add explicit check for it, though.
> 
> KVM shouldn't have an explicit is_guest_mode() check, the support should be a
> property of the underlying MMU, and KVM can use the TDP MMU for L2 (if L1 is
> using legacy shadow paging, not TDP).

I see.  So the type of the address being populated can vary depending on the
vCPU mode.  Which address (GVA, L1 GPA, L2 GPA) is used is userspace's problem.
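
To spell out my understanding, here is a rough, simplified sketch of how the
address type follows from the vCPU state.  This is not KVM code: the enum and
both helpers are made-up names for illustration, and the model assumes only
two inputs (whether TDP is enabled and whether the vCPU is in guest mode).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for illustration only; none of these exist in KVM. */
enum addr_kind { ADDR_GVA, ADDR_L1_GPA, ADDR_L2_GPA };

/*
 * Simplified model of which address space the populated address is in:
 *   - TDP disabled (shadow paging): the address is a GVA.
 *   - TDP enabled, vCPU not in guest mode: the address is an L1 GPA.
 *   - TDP enabled, vCPU in guest mode: the address is an L2 GPA.
 */
static enum addr_kind populated_addr_kind(bool tdp_enabled, bool guest_mode)
{
	if (!tdp_enabled)
		return ADDR_GVA;
	return guest_mode ? ADDR_L2_GPA : ADDR_L1_GPA;
}

/* Human-readable label for an address kind. */
static const char *addr_kind_name(enum addr_kind kind)
{
	switch (kind) {
	case ADDR_GVA:
		return "GVA";
	case ADDR_L1_GPA:
		return "L1 GPA";
	case ADDR_L2_GPA:
		return "L2 GPA";
	}
	assert(!"unknown addr_kind");
	return "?";
}
```

In this model, userspace would have to know the vCPU state at the time of the
ioctl() to know which kind of address it is passing.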

> > > Any use case while vcpus running?
> > > 
> > > 
> > > > Perhaps we can just return an error out to userspace if the vCPU is in
> > > > guest-mode or TDP is disabled, and make it userspace's problem to do
> > > > memory mapping before loading any vCPU state.
> > > 
> > > If the use case for default VM or sw-protected VM is to avoid excessive kvm page
> > > fault at guest boot, error on guest-mode or disabled TDP wouldn't matter.
> > 
> > Any input?  If no further input, I assume the primary use case is pre-population
> > before guest running.
> 
> Pre-populating is the primary use case, but that could happen if L2 is active,
> e.g. after live migration.
> 
> I'm not necessarily opposed to initially adding support only for the TDP MMU, but
> if the delta to also support the shadow MMU is relatively small, my preference
> would be to add the support right away.  E.g. to give us confidence that the uAPI
> can work for multiple MMUs, and so that we don't have to write documentation for
> x86 to explain exactly when it's legal to use the ioctl().

If we call kvm_mmu.page_fault() without caring about which kind of address is
populated, I don't see a big difference.
-- 
Isaku Yamahata <isaku.yamahata@xxxxxxxxx>