Re: [PATCH 58/61] KVM: x86/mmu: Configure max page level during hardware setup

Sean Christopherson <sean.j.christopherson@xxxxxxxxx> · Wed, 26 Feb 2020 07:56:41 -0800

On Wed, Feb 26, 2020 at 03:55:55PM +0100, Vitaly Kuznetsov wrote:
> Sean Christopherson <sean.j.christopherson@xxxxxxxxx> writes:
> 
> > On Tue, Feb 25, 2020 at 03:43:36PM +0100, Vitaly Kuznetsov wrote:
> >> Sean Christopherson <sean.j.christopherson@xxxxxxxxx> writes:
> >> 
> >> > Configure the max page level during hardware setup to avoid a retpoline
> >> > in the page fault handler.  Drop ->get_lpage_level() as the page fault
> >> > handler was the last user.
> >> > @@ -6064,11 +6064,6 @@ static void svm_set_supported_cpuid(struct kvm_cpuid_entry2 *entry)
> >> >  	}
> >> >  }
> >> >  
> >> > -static int svm_get_lpage_level(void)
> >> > -{
> >> > -	return PT_PDPE_LEVEL;
> >> > -}
> >> 
> >> I've probably missed something but before the change, get_lpage_level()
> >> on AMD was always returning PT_PDPE_LEVEL, but after the change and when
> >> NPT is disabled, we set max_page_level to either PT_PDPE_LEVEL (when
> >> boot_cpu_has(X86_FEATURE_GBPAGES)) or PT_DIRECTORY_LEVEL
> >> (otherwise). This sounds like a change) unless we think that
> >> boot_cpu_has(X86_FEATURE_GBPAGES) is always true on AMD.
> >
> > It looks like a functional change, but isn't.  kvm_mmu_hugepage_adjust()
> > caps the page size used by KVM's MMU at the minimum of ->get_lpage_level()
> > and the host's mapping level.  Barring an egregious bug in the kernel MMU,
> > the host page tables will max out at PT_DIRECTORY_LEVEL (2mb) unless
> > boot_cpu_has(X86_FEATURE_GBPAGES) is true.
> >
> > In other words, this is effectively a "documentation" change.  I'll figure
> > out a way to explain this in the changelog...
> >
> >         max_level = min(max_level, kvm_x86_ops->get_lpage_level());
> >         for ( ; max_level > PT_PAGE_TABLE_LEVEL; max_level--) {
> >                 linfo = lpage_info_slot(gfn, slot, max_level);
> >                 if (!linfo->disallow_lpage)
> >                         break;
> >         }
> >
> >         if (max_level == PT_PAGE_TABLE_LEVEL)
> >                 return PT_PAGE_TABLE_LEVEL;
> >
> >         level = host_pfn_mapping_level(vcpu, gfn, pfn, slot);
> >         if (level == PT_PAGE_TABLE_LEVEL)
> >                 return level;
> >
> >         level = min(level, max_level); <---------
> 
> Ok, I see (I believe):
> 
> Reviewed-by: Vitaly Kuznetsov <vkuznets@xxxxxxxxxx>
> 
> It would've helped me a bit if kvm_configure_mmu() was written the
> following way:
> 
> void kvm_configure_mmu(bool enable_tdp, int tdp_page_level)
> {
>         tdp_enabled = enable_tdp;
> 
> 	if (boot_cpu_has(X86_FEATURE_GBPAGES))
>                 max_page_level = PT_PDPE_LEVEL;
>         else
>                 max_page_level = PT_DIRECTORY_LEVEL;
> 
>         if (tdp_enabled)
> 		max_page_level = min(tdp_page_level, max_page_level);
> }
> 
> (we can't have cpu_has_vmx_ept_1g_page() and not
> boot_cpu_has(X86_FEATURE_GBPAGES), right?)

Wrong, because VMX.  It could even occur on a real system if the user
disables the feature via kernel param, e.g. "clearcpuid=58".  In the end it
won't actually change anything because KVM caps its page size at the kernel
page size (as above).  Well, unless someone is running a custom kernel that
does funky things.

> But this is certainly just a personal preference, feel free to ignore)

I'm on the fence.  Part of me likes having max_page_level reflect what KVM
is capable of, irrespective of the kernel.