On Tue, Apr 23, 2024 at 6:30 AM Peter Gonda <pgonda@xxxxxxxxxx> wrote:
>
> On Mon, Apr 22, 2024 at 6:29 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> >
> > +Tom, Mike, and Peter
> >
> > On Thu, Apr 18, 2024, Li RongQing wrote:
> > > save_area of per-CPU svm_data are dominantly accessed from their
> > > own local CPUs, so allocate them node-local for performance reason
> > >
> > > Signed-off-by: Li RongQing <lirongqing@xxxxxxxxx>
> > > ---
> > >  arch/x86/kvm/svm/sev.c | 6 +++---
> > >  arch/x86/kvm/svm/svm.c | 2 +-
> > >  arch/x86/kvm/svm/svm.h | 6 +++++-
> > >  3 files changed, 9 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> > > index 61a7531..cce8ec7 100644
> > > --- a/arch/x86/kvm/svm/sev.c
> > > +++ b/arch/x86/kvm/svm/sev.c
> > > @@ -3179,13 +3179,13 @@ void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
> > >  		ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, 1);
> > >  }
> > >
> > > -struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu)
> > > +struct page *snp_safe_alloc_page_node(struct kvm_vcpu *vcpu, int node)
> > >  {
> > >  	unsigned long pfn;
> > >  	struct page *p;
> > >
> > >  	if (!cc_platform_has(CC_ATTR_HOST_SEV_SNP))
> > > -		return alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
> > > +		return alloc_pages_node(node, GFP_KERNEL_ACCOUNT | __GFP_ZERO, 0);
> > >
> > >  	/*
> > >  	 * Allocate an SNP-safe page to workaround the SNP erratum where
> > > @@ -3196,7 +3196,7 @@ struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu)
> > >  	 * Allocate one extra page, choose a page which is not
> > >  	 * 2MB-aligned, and free the other.
> > >  	 */
> > > -	p = alloc_pages(GFP_KERNEL_ACCOUNT | __GFP_ZERO, 1);
> > > +	p = alloc_pages_node(node, GFP_KERNEL_ACCOUNT | __GFP_ZERO, 1);
> >
> > This made me realize the existing code is buggy.  The allocation for the per-CPU
> > save area shouldn't be accounted.
> >
> > Also, what's the point of taking @vcpu?
> > It's a nice enough flag to say "this
> > should be accounted", but it's decidedly odd.
> >
> > How about we fix both in a single series, and end up with this over 3-4 patches?
> > I.e. pass -1 where vcpu is non-NULL, and the current CPU for the save area.
>
> Looks good to me. Internally we already use GFP_KERNEL for these
> allocations. But we had an issue with split_page() and memcg
> accounting internally. Yosry submitted the following:
>
> +	if (memcg_kmem_charge(p, GFP_KERNEL_ACCOUNT, 0)) {
> +		__free_page(p);
> +		return NULL;
> +	}
>
> Not sure if this is an issue with our kernel or if we should use
> split_page_memcg() here? It was suggested internally but we didn't
> want to backport it.

The referenced internal code is in an older kernel where split_page() was
buggy and did not handle memcg charging correctly (i.e. it did not call
split_page_memcg()), so we needed to make the allocation with GFP_KERNEL,
then manually account the used page after splitting and freeing the
unneeded page. AFAICT, this is not needed upstream and the current code is
fine.

> >
> > struct page *snp_safe_alloc_page(int cpu)
> > {
> > 	unsigned long pfn;
> > 	struct page *p;
> > 	gfp_t gfp;
> > 	int node;
> >
> > 	if (cpu >= 0) {
> > 		node = cpu_to_node(cpu);
> > 		gfp = GFP_KERNEL;
> > 	} else {
> > 		node = NUMA_NO_NODE;
> > 		gfp = GFP_KERNEL_ACCOUNT;
> > 	}

FWIW, from the pov of someone who has zero familiarity with this code,
passing @cpu only to make inferences about the GFP flags and NUMA node is
confusing tbh. Would it be clearer to pass in the gfp flags and node id
directly? I think it would make it clearer why we choose to account the
allocation and/or specify a node in some cases but not others.
> >
> > 	gfp |= __GFP_ZERO;
> >
> > 	if (!cc_platform_has(CC_ATTR_HOST_SEV_SNP))
> > 		return alloc_pages_node(node, gfp, 0);
> >
> > 	/*
> > 	 * Allocate an SNP-safe page to workaround the SNP erratum where
> > 	 * the CPU will incorrectly signal an RMP violation #PF if a
> > 	 * hugepage (2MB or 1GB) collides with the RMP entry of a
> > 	 * 2MB-aligned VMCB, VMSA, or AVIC backing page.
> > 	 *
> > 	 * Allocate one extra page, choose a page which is not
> > 	 * 2MB-aligned, and free the other.
> > 	 */
> > 	p = alloc_pages_node(node, gfp, 1);
> > 	if (!p)
> > 		return NULL;
> >
> > 	split_page(p, 1);
> >
> > 	pfn = page_to_pfn(p);
> > 	if (IS_ALIGNED(pfn, PTRS_PER_PMD))
> > 		__free_page(p++);
> > 	else
> > 		__free_page(p + 1);
> >
> > 	return p;
> > }
> >