Re: [PATCH 5.4 1/1] KVM: SEV: add cache flush to solve SEV cache incoherency issues

Hello All,

Here is a summary of all the discussions and options we have covered
off-list:

For SNP guests we don't need to invoke the MMU invalidation notifiers; the cache flush should be done at the point of RMP ownership change rather than from the mmu_notifier, i.e., when the unregister_enc_region ioctl is called. However, since we don't trust userspace (which can simply bypass this ioctl), we continue to use the MMU invalidation notifiers. With UPM support added, we will avoid the RMP #PF split code path that splits the host page table to keep it in sync with the RMP table entries, and therefore the mmu_notifier invoked from __split_huge_pmd() won't be a concern.

For the MMU invalidation notifiers, we are currently going to make two changes:

1). Use clflush/clflushopt instead of wbinvd_on_all_cpus() for ranges <= 2MB.

But this is not entirely straightforward: on SME_COHERENT platforms (Milan and beyond), clflush/clflushopt will flush guest-tagged cache entries, but on pre-Milan (!SME_COHERENT) parts we still need either the VM_PAGE_FLUSH MSR or wbinvd to flush guest-tagged cache entries. So for non-SME_COHERENT platforms there is no change and effectively no optimization. (A rough sketch of this change is included after item 2 below.)

2). We also add the filtering from Sean's patch, which invokes the MMU invalidation notifiers depending on the flag (event) passed to the notifier. This helps reduce the overhead from NUMA balancing and, in particular, eliminates the mmu_notifier invocations for the change_protection case. (Also sketched below.)
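
For change 1), here is a minimal sketch of the kind of helper we have in
mind; the function name and placement are illustrative only, not the
actual patch:

#include <linux/sizes.h>	/* SZ_2M */
#include <asm/cacheflush.h>	/* clflush_cache_range() */
#include <asm/cpufeature.h>	/* boot_cpu_has() */
#include <asm/smp.h>		/* wbinvd_on_all_cpus() */

/* Illustrative only: flush a guest HVA range from the notifier path. */
static void sev_flush_hva_range(void *va, unsigned long size)
{
	/*
	 * Pre-Milan (!SME_COHERENT) parts cannot flush guest-tagged (C-bit)
	 * cache lines with CLFLUSH/CLFLUSHOPT, so the big hammer stays.
	 */
	if (!boot_cpu_has(X86_FEATURE_SME_COHERENT) || size > SZ_2M) {
		wbinvd_on_all_cpus();
		return;
	}

	/* clflush_cache_range() uses CLFLUSHOPT when the CPU supports it. */
	clflush_cache_range(va, size);
}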
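
And for change 2), a rough sketch of the event-based filtering; Sean's
patch is the actual reference, the helper below just shows the intent:

#include <linux/mmu_notifier.h>

/*
 * Illustrative only: protection-only events (NUMA balancing,
 * change_protection) do not free the backing pages, so no guest cache
 * flush is needed for them.
 */
static bool sev_mmu_event_needs_flush(enum mmu_notifier_event event)
{
	switch (event) {
	case MMU_NOTIFY_PROTECTION_VMA:
	case MMU_NOTIFY_PROTECTION_PAGE:
	case MMU_NOTIFY_SOFT_DIRTY:
		return false;
	default:
		/* Unmap/clear/migrate may free pages: flush is required. */
		return true;
	}
}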

Thanks,
Ashish


On 9/26/2022 7:37 PM, Sean Christopherson wrote:
On Tue, Sep 27, 2022, Ashish Kalra wrote:
With this patch applied, we are observing soft lockup and RCU stall issues on
SNP guests with 128 vCPUs assigned and >=10GB guest memory allocations.

...

From the call stack dumps, it looks like migrate_pages() is being invoked. The
invocation of migrate_pages() in the following code path does not seem right:
do_huge_pmd_numa_page
  migrate_misplaced_page
    migrate_pages
As all the guest memory for SEV/SNP VMs will be pinned/locked, why is the
page migration code path getting invoked at all?

LOL, I feel your pain.  It's the wonderful NUMA autobalancing code.  It's been a
while since I looked at the code, but IIRC, it "works" by zapping PTEs for pages that
aren't allocated on the "right" node without checking if page migration is actually
possible.

The actual migration is done on the subsequent page fault.  In this case, the
balancer detects that the page can't be migrated and reinstalls the original PTE.

I don't know if using FOLL_LONGTERM would help?  Again, been a while.  The workaround
I've used in the past is to simply disable the balancer, e.g.

   CONFIG_NUMA_BALANCING=n

or
numa_balancing=disable

on the kernel command line.



