On Mon, Apr 22, 2024 at 6:29 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
>
> +Tom, Mike, and Peter
>
> On Thu, Apr 18, 2024, Li RongQing wrote:
> > save_area of per-CPU svm_data are dominantly accessed from their
> > own local CPUs, so allocate them node-local for performance reasons
> >
> > Signed-off-by: Li RongQing <lirongqing@xxxxxxxxx>
> > ---
> >  arch/x86/kvm/svm/sev.c | 6 +++---
> >  arch/x86/kvm/svm/svm.c | 2 +-
> >  arch/x86/kvm/svm/svm.h | 6 +++++-
> >  3 files changed, 9 insertions(+), 5 deletions(-)
> >
> > diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> > index 61a7531..cce8ec7 100644
> > --- a/arch/x86/kvm/svm/sev.c
> > +++ b/arch/x86/kvm/svm/sev.c
> > @@ -3179,13 +3179,13 @@ void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
> >  		ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, 1);
> >  }
> >
> > -struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu)
> > +struct page *snp_safe_alloc_page_node(struct kvm_vcpu *vcpu, int node)
> >  {
> >  	unsigned long pfn;
> >  	struct page *p;
> >
> >  	if (!cc_platform_has(CC_ATTR_HOST_SEV_SNP))
> > -		return alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
> > +		return alloc_pages_node(node, GFP_KERNEL_ACCOUNT | __GFP_ZERO, 0);
> >
> >  	/*
> >  	 * Allocate an SNP-safe page to workaround the SNP erratum where
> > @@ -3196,7 +3196,7 @@ struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu)
> >  	 * Allocate one extra page, choose a page which is not
> >  	 * 2MB-aligned, and free the other.
> >  	 */
> > -	p = alloc_pages(GFP_KERNEL_ACCOUNT | __GFP_ZERO, 1);
> > +	p = alloc_pages_node(node, GFP_KERNEL_ACCOUNT | __GFP_ZERO, 1);
>
> This made me realize the existing code is buggy. The allocation for the per-CPU
> save area shouldn't be accounted.
>
> Also, what's the point of taking @vcpu? It's a nice enough flag to say "this
> should be accounted", but it's decidedly odd.
>
> How about we fix both in a single series, and end up with this over 3-4 patches?
> I.e. pass -1 where vcpu is non-NULL, and the current CPU for the save area.

Looks good to me. Internally we already use GFP_KERNEL for these
allocations, but we ran into an issue with split_page() and memcg
accounting in our kernel. Yosry submitted the following internally:

+	if (memcg_kmem_charge(p, GFP_KERNEL_ACCOUNT, 0)) {
+		__free_page(p);
+		return NULL;
+	}

I'm not sure whether that is an issue specific to our kernel or whether
we should be using split_page_memcg() here; split_page_memcg() was
suggested internally, but we didn't want to backport it.

> struct page *snp_safe_alloc_page(int cpu)
> {
> 	unsigned long pfn;
> 	struct page *p;
> 	gfp_t gfp;
> 	int node;
>
> 	if (cpu >= 0) {
> 		node = cpu_to_node(cpu);
> 		gfp = GFP_KERNEL;
> 	} else {
> 		node = NUMA_NO_NODE;
> 		gfp = GFP_KERNEL_ACCOUNT;
> 	}
> 	gfp |= __GFP_ZERO;
>
> 	if (!cc_platform_has(CC_ATTR_HOST_SEV_SNP))
> 		return alloc_pages_node(node, gfp, 0);
>
> 	/*
> 	 * Allocate an SNP-safe page to workaround the SNP erratum where
> 	 * the CPU will incorrectly signal an RMP violation #PF if a
> 	 * hugepage (2MB or 1GB) collides with the RMP entry of a
> 	 * 2MB-aligned VMCB, VMSA, or AVIC backing page.
> 	 *
> 	 * Allocate one extra page, choose a page which is not
> 	 * 2MB-aligned, and free the other.
> 	 */
> 	p = alloc_pages_node(node, gfp, 1);
> 	if (!p)
> 		return NULL;
>
> 	split_page(p, 1);
>
> 	pfn = page_to_pfn(p);
> 	if (IS_ALIGNED(pfn, PTRS_PER_PMD))
> 		__free_page(p++);
> 	else
> 		__free_page(p + 1);
>
> 	return p;
> }
>
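
For my own clarity, here's roughly how the two kinds of call sites would
look with that signature. This is just a sketch based on my reading of the
current code; the exact variable names (sd->save_area, vmcb01_page) may not
match what the series ends up touching:

	/* Per-CPU save area in svm_cpu_init(): node-local, not memcg-accounted. */
	sd->save_area = snp_safe_alloc_page(cpu);

	/* vCPU-owned pages (VMCB/VMSA/etc.): keep memcg accounting, no NUMA hint. */
	vmcb01_page = snp_safe_alloc_page(-1);

i.e. exactly the "pass -1 where vcpu is non-NULL, and the current CPU for
the save area" split you describe above.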