On Wed, Apr 03, 2024, Isaku Yamahata wrote:
> On Wed, Apr 03, 2024 at 11:30:21AM -0700,
> Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
>
> > On Tue, Mar 19, 2024, Isaku Yamahata wrote:
> > > On Wed, Mar 06, 2024 at 06:09:54PM -0800,
> > > Isaku Yamahata <isaku.yamahata@xxxxxxxxxxxxxxx> wrote:
> > >
> > > > On Wed, Mar 06, 2024 at 04:53:41PM -0800,
> > > > David Matlack <dmatlack@xxxxxxxxxx> wrote:
> > > >
> > > > > On 2024-03-01 09:28 AM, isaku.yamahata@xxxxxxxxx wrote:
> > > > > > From: Isaku Yamahata <isaku.yamahata@xxxxxxxxx>
> > > > > >
> > > > > > Implementation:
> > > > > > - x86 KVM MMU
> > > > > >   In x86 KVM MMU, I chose to use kvm_mmu_do_page_fault(). It's not confined to
> > > > > >   KVM TDP MMU. We can restrict it to KVM TDP MMU and introduce an optimized
> > > > > >   version.
> > > > >
> > > > > Restricting to TDP MMU seems like a good idea. But I'm not quite sure
> > > > > how to reliably do that from a vCPU context. Checking for TDP being
> > > > > enabled is easy, but what if the vCPU is in guest-mode?
> > > >
> > > > As you pointed out in the other mail, legacy KVM MMU support or guest-mode will
> > > > be troublesome.
> >
> > Why is shadow paging troublesome? I don't see any obvious issues with effectively
> > prefetching into a shadow MMU with read fault semantics. It might be pointless
> > and wasteful, as the guest PTEs need to be in place, but that's userspace's problem.
>
> The address to populate for shadow paging is a GVA, not a GPA. I'm not sure
> that's what user space wants. If it's a user-space problem, I'm fine.

/facepalm

> > Pre-populating is the primary use case, but that could happen if L2 is active,
> > e.g. after live migration.
> >
> > I'm not necessarily opposed to initially adding support only for the TDP MMU, but
> > if the delta to also support the shadow MMU is relatively small, my preference
> > would be to add the support right away. E.g. to give us confidence that the uAPI
> > can work for multiple MMUs, and so that we don't have to write documentation for
> > x86 to explain exactly when it's legal to use the ioctl().
>
> If we call kvm_mmu.page_fault() without caring what address will be
> populated, I don't see a big difference.

Ignore me, I completely spaced that shadow MMUs don't operate on an L1 GPA.

I 100% agree that restricting this to TDP, at least for the initial merge, is
the way to go. A uAPI where the type of address varies based on the vCPU mode
and MMU type would be super ugly, and probably hard to use.

At that point, I don't have a strong preference as to whether or not direct
legacy/shadow MMUs are supported. That said, I think it can (probably should?)
be done in a way where it more or less Just Works, e.g. by having a function
hook in "struct kvm_mmu".
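
Something like the below (completely untested, and the "populate_gpa" hook and
the kvm_vcpu_populate_gpa() helper are made up purely for illustration) is
roughly what I have in mind, i.e. an optional callback alongside the existing
page_fault() hook:

	struct kvm_mmu {
		/* ... existing fields, including the page_fault() callback ... */

		/*
		 * Hypothetical hook: populate the mapping for @gpa on behalf of
		 * userspace.  MMUs that don't operate on L1 GPAs (shadow, nested)
		 * would simply leave this NULL.
		 */
		int (*populate_gpa)(struct kvm_vcpu *vcpu, gpa_t gpa,
				    u64 error_code);
	};

	static int kvm_vcpu_populate_gpa(struct kvm_vcpu *vcpu, gpa_t gpa)
	{
		/* Fail cleanly if the current MMU doesn't support the ioctl(). */
		if (!vcpu->arch.mmu->populate_gpa)
			return -EOPNOTSUPP;

		/* The error code is illustrative, e.g. treat it as a write. */
		return vcpu->arch.mmu->populate_gpa(vcpu, gpa, PFERR_WRITE_MASK);
	}

That way, whether the ioctl() is legal for the current vCPU/MMU falls out of
whether the hook is wired up, rather than having to be spelled out in
documentation.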