Re: [PATCH v2 07/10] KVM: x86: Always populate L1 GPA for KVM_MAP_MEMORY

Sean Christopherson <seanjc@xxxxxxxxxx> · Tue, 16 Apr 2024 07:22:21 -0700

On Mon, Apr 15, 2024, Isaku Yamahata wrote:
> On Mon, Apr 15, 2024 at 02:17:02PM -0700,
> Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> 
> > > > - Return error on guest mode or SMM mode:  Without this patch.
> > > >   Pros: No additional patch.
> > > >   Cons: Difficult to use.
> > > 
> > > Hmm... For the non-TDX use cases this is just an optimization, right? For TDX
> > > there shouldn't be an issue. If so, maybe this last one is not so horrible.
> > 
> > And the fact there are so variables to control (MAXPHADDR, SMM, and guest_mode)
> > basically invalidates the argument that returning an error makes the ioctl() hard
> > to use.  I can imagine it might be hard to squeeze this ioctl() into QEMU's
> > existing code, but I don't buy that the ioctl() itself is hard to use.
> > 
> > Literally the only thing userspace needs to do is set CPUID to implicitly select
> > between 4-level and 5-level paging.  If userspace wants to pre-map memory during
> > live migration, or when jump-starting the guest with pre-defined state, simply
> > pre-map memory before stuffing guest state.  In and of itself, that doesn't seem
> > difficult, e.g. at a quick glance, QEMU could add a hook somewhere in
> > kvm_vcpu_thread_fn() without too much trouble (though that comes with a huge
> > disclaimer that I only know enough about how QEMU manages vCPUs to be dangerous).
> > 
> > I would describe the overall cons for this patch versus returning an error
> > differently.  Switching MMU state puts the complexity in the kernel.  Returning
> > an error punts any complexity to userspace.  Specifically, anything that KVM can
> > do regarding vCPU state to get the right MMU, userspace can do too.
> >  
> > Add on that silently doing things that effectively ignore guest state usually
> > ends badly, and I don't see a good argument for this patch (or any variant
> > thereof).
> 
> Ok, here is a experimental patch on top of the 7/10 to return error.  Is this
> a direction? or do we want to invoke KVM page fault handler without any check?
> 
> I can see the following options.
> 
> - Error if vCPU is in SMM mode or guest mode: This patch
>   Defer the decision until the use cases come up.  We can utilize
>   KVM_CAP_MAP_MEMORY and struct kvm_map_memory.flags for future
>   enhancement.
>   Pro: Keep room for future enhancement for unclear use cases to defer
>        the decision.
>   Con: The use space VMM has to check/switch the vCPU mode.
> 
> - No check of vCPU mode and go on
>   Pro: It works.
>   Con: Unclear how the uAPI should be without concrete use cases.
> 
> - Always populate with L1 GPA:
>   This is a bad idea.

Not always.  If L1 is using shadow paging, then L1 and L2 GPAs are in the same
domain from KVM's perspective.

As I said in v1 (I think it was v1), whether or not mapping an L1 GPA is supported
should be a property of the mmu, not an explicit check.  As far as the TDP MMU is
concerend, there's nothing special about SMM nor is there anything special about
guest_mode, except for the fact that they use different roots than "normal" mode.
But that part Just Works.

And if L1 is using TDP, i.e. KVM is shadowing L1's TDP page tables, and L2 is
active, then vcpu->arch.mmu will point at a variation of the shadow MMU, e.g.
the PTTYPE_EPT MMU on Intel CPUs.  The shadow MMU won't support pre-mapping GPAs
because it's non-sensical (legacy shadow paging needs a GVA, nested TDP needs an
L2 GPA), and so the ioctl() fails because mmu->map_gpa or whatever is NULL.

In other words, unless I'm forgetting something, explicit checks for "unsupported"
modes shoud be unnecessary, because