Re: [PATCH v4 1/5] KVM: x86: Move memcache allocation to GFP_PGTABLE_USER

Sean Christopherson <sean.j.christopherson@xxxxxxxxx> · Wed, 27 Nov 2019 10:07:31 -0800

On Tue, Nov 05, 2019 at 12:03:53PM +0100, Christoffer Dall wrote:
> Recent commit 50f11a8a4620eee6b6831e69ab5d42456546d7d8 moved both KVM and
> normal user page table allocations to GFP_PGTABLE_USER in order to get
> __GFP_ACCOUNT for the page tables.
> 
> However, while KVM on other architectures such as arm64 was included in
> this change, KVM on x86 curiously was not.
> 
> Currently, KVM on x86 uses kmem_cache_zalloc(GFP_KERNEL_ACCOUNT) for
> kmem_cache-based allocations, which expands in the following way:
>   kmem_cache_zalloc(..., GFP_KERNEL_ACCOUNT) =>
>   kmem_cache_alloc(..., GFP_KERNEL_ACCOUNT | __GFP_ZERO) =>
>   kmem_cache_alloc(..., GFP_KERNEL | __GFP_ACCOUNT | __GFP_ZERO)
> 
> It so happens that GFP_PGTABLE_USER expands as:
>   GFP_PGTABLE_USER =>
>   (GFP_PGTABLE_KERNEL | __GFP_ACCOUNT) =>
>   ((GFP_KERNEL | __GFP_ZERO) | __GFP_ACCOUNT) =>
>   (GFP_KERNEL | __GFP_ACCOUNT | __GFP_ZERO)
> 
> This means that the current KVM x86 call can be replaced as follows:
> -  obj = kmem_cache_zalloc(base_cache, GFP_KERNEL_ACCOUNT);
> +  obj = kmem_cache_alloc(base_cache, GFP_PGTABLE_USER);
> 
> For the single-page cache topup allocation, KVM on x86 currently uses
> __get_free_page(GFP_KERNEL_ACCOUNT).  It seems to me that this is
> equivalent to the above, except that the allocated page is not guaranteed
> to be zeroed (unless I missed the place where __get_free_page() without
> __GFP_ZERO still guarantees a zeroed page).  It seems natural (and in fact
> desirable) for both topup functions to provide the same guarantees to the
> caller, and we therefore move to GFP_PGTABLE_USER here as well.
> 
> This will make it easier to unify the memcache implementation between
> architectures.

Functionally, this looks correct (I haven't actually tested it).  However,
it means that x86's shadow pages will be zeroed out twice, which could lead
to performance regressions.  The cache is also used for the gfns array,
and I'm pretty sure the gfns array is never zeroed out in the current code,
so zeroing gfns would also introduce new overhead.

The redundant zeroing of shadow pages could likely be addressed by removing
the call to clear_page() in kvm_mmu_get_page(), but I'd prefer not to go
that route because it doesn't address the gfns issue, means KVM pays the
cost of zeroing up front (as opposed to when a page is actually used), and
I have a future use case where the shadow page needs to be initialized to
a non-zero value.

What about having the common mmu_topup_memory_cache{_page}() take a GFP
param?  That would allow consolidating the bulk of the code while allowing
x86 to optimize its specific scenarios.

> Signed-off-by: Christoffer Dall <christoffer.dall@xxxxxxx>
> ---
>  arch/x86/kvm/mmu.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 24c23c66b226..540190cee3cb 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -40,6 +40,7 @@
>  
>  #include <asm/page.h>
>  #include <asm/pat.h>
> +#include <asm/pgalloc.h>
>  #include <asm/cmpxchg.h>
>  #include <asm/e820/api.h>
>  #include <asm/io.h>
> @@ -1025,7 +1026,7 @@ static int mmu_topup_memory_cache(struct kvm_mmu_memory_cache *cache,
>  	if (cache->nobjs >= min)
>  		return 0;
>  	while (cache->nobjs < ARRAY_SIZE(cache->objects)) {
> -		obj = kmem_cache_zalloc(base_cache, GFP_KERNEL_ACCOUNT);
> +		obj = kmem_cache_alloc(base_cache, GFP_PGTABLE_USER);
>  		if (!obj)
>  			return cache->nobjs >= min ? 0 : -ENOMEM;
>  		cache->objects[cache->nobjs++] = obj;
> @@ -1053,7 +1054,7 @@ static int mmu_topup_memory_cache_page(struct kvm_mmu_memory_cache *cache,
>  	if (cache->nobjs >= min)
>  		return 0;
>  	while (cache->nobjs < ARRAY_SIZE(cache->objects)) {
> -		page = (void *)__get_free_page(GFP_KERNEL_ACCOUNT);
> +		page = (void *)__get_free_page(GFP_PGTABLE_USER);
>  		if (!page)
>  			return cache->nobjs >= min ? 0 : -ENOMEM;
>  		cache->objects[cache->nobjs++] = page;
> -- 
> 2.18.0
>