Re: [PATCH 11/11] KVM: x86: Add gmem hook for determining max NPT mapping level

Isaku Yamahata <isaku.yamahata@xxxxxxxxx> · Fri, 19 Apr 2024 11:26:15 -0700

On Tue, Apr 09, 2024 at 06:46:32PM -0500,
Michael Roth <michael.roth@xxxxxxx> wrote:

> On Thu, Apr 04, 2024 at 02:50:33PM -0400, Paolo Bonzini wrote:
> > From: Michael Roth <michael.roth@xxxxxxx>
> > 
> > In the case of SEV-SNP, whether or not a 2MB page can be mapped via a
> > 2MB mapping in the guest's nested page table depends on whether or not
> > any subpages within the range have already been initialized as private
> > in the RMP table. The existing mixed-attribute tracking in KVM is
> > insufficient here, for instance:
> > 
> >   - gmem allocates 2MB page
> >   - guest issues PVALIDATE on 2MB page
> >   - guest later converts a subpage to shared
> >   - SNP host code issues PSMASH to split 2MB RMP mapping to 4K
> >   - KVM MMU splits NPT mapping to 4K
> 
> Binbin caught that I'd neglected to document the last step in the
> theoretical sequence here. It should state something to the effect
> of:
> 
>   - guest later converts that shared page back to private
> 
> -Mike
> 
> > 
> > At this point there are no mixed attributes, and KVM would normally
> > allow for 2MB NPT mappings again, but this is actually not allowed
> > because the RMP table mappings are 4K and cannot be promoted on the
> > hypervisor side, so the NPT mappings must still be limited to 4K to
> > match this.
> > 
> > Add a hook to determine the max NPT mapping size in situations like
> > this.
> > 
> > Signed-off-by: Michael Roth <michael.roth@xxxxxxx>
> > Message-Id: <20231230172351.574091-31-michael.roth@xxxxxxx>
> > Signed-off-by: Paolo Bonzini <pbonzini@xxxxxxxxxx>
> > ---
> >  arch/x86/include/asm/kvm-x86-ops.h | 1 +
> >  arch/x86/include/asm/kvm_host.h    | 2 ++
> >  arch/x86/kvm/mmu/mmu.c             | 8 ++++++++
> >  3 files changed, 11 insertions(+)
> > 
> > diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
> > index c81990937ab4..2db87a6fd52a 100644
> > --- a/arch/x86/include/asm/kvm-x86-ops.h
> > +++ b/arch/x86/include/asm/kvm-x86-ops.h
> > @@ -140,6 +140,7 @@ KVM_X86_OP_OPTIONAL_RET0(vcpu_get_apicv_inhibit_reasons);
> >  KVM_X86_OP_OPTIONAL(get_untagged_addr)
> >  KVM_X86_OP_OPTIONAL(alloc_apic_backing_page)
> >  KVM_X86_OP_OPTIONAL_RET0(gmem_prepare)
> > +KVM_X86_OP_OPTIONAL_RET0(gmem_validate_fault)
> >  KVM_X86_OP_OPTIONAL(gmem_invalidate)
> >  
> >  #undef KVM_X86_OP
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index 59c7b95034fc..67dc108dd366 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -1811,6 +1811,8 @@ struct kvm_x86_ops {
> >  	void *(*alloc_apic_backing_page)(struct kvm_vcpu *vcpu);
> >  	int (*gmem_prepare)(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order);
> >  	void (*gmem_invalidate)(kvm_pfn_t start, kvm_pfn_t end);
> > +	int (*gmem_validate_fault)(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, bool is_private,
> > +				   u8 *max_level);
> >  };

I think you added is_private because of the TDX patches.  As Yan pointed out at
https://lore.kernel.org/kvm/ZiHGoUUcGlZObQvx@xxxxxxxxxxxxxxxxxxxxxxxxx/
is_private is guaranteed to be true here, because the caller already checks it:

  if (fault->is_private)
    kvm_faultin_pfn_private()

So we can drop the is_private parameter.
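
To illustrate the point, here is a minimal userspace sketch, not the actual
KVM code.  It assumes the hook is only reached through the private fault
path; the demo_* functions, the cut-down struct kvm_page_fault and the
PG_LEVEL_* values are hypothetical stand-ins for illustration.  With the
caller gating on fault->is_private, the hook can take just kvm, pfn, gfn
and max_level:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel types, for illustration only. */
typedef uint64_t gfn_t;
typedef uint64_t kvm_pfn_t;
struct kvm;				/* opaque in this sketch */

#define PG_LEVEL_4K 1
#define PG_LEVEL_2M 2

struct kvm_page_fault {
	gfn_t gfn;
	kvm_pfn_t pfn;
	bool is_private;
	uint8_t max_level;
};

static int hook_calls;			/* counts hook invocations */

/* Hook without is_private: it is only ever called for private faults. */
static int demo_gmem_validate_fault(struct kvm *kvm, kvm_pfn_t pfn,
				    gfn_t gfn, uint8_t *max_level)
{
	(void)kvm; (void)pfn; (void)gfn;
	hook_calls++;
	/* Assumption: pretend the backing RMP entries are 4K. */
	*max_level = PG_LEVEL_4K;
	return 0;
}

/* Private path: the only place in this sketch that calls the hook. */
static int demo_faultin_pfn_private(struct kvm *kvm,
				    struct kvm_page_fault *fault)
{
	return demo_gmem_validate_fault(kvm, fault->pfn, fault->gfn,
					&fault->max_level);
}

/* Caller checks is_private, so the hook never sees a shared fault. */
static int demo_faultin_pfn(struct kvm *kvm, struct kvm_page_fault *fault)
{
	if (fault->is_private)
		return demo_faultin_pfn_private(kvm, fault);
	return 0;
}
```

In this sketch a shared fault never reaches the hook and keeps its original
max_level, so an is_private argument would carry no information.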
-- 
Isaku Yamahata <isaku.yamahata@xxxxxxxxx>