With this patch applied, we are observing soft lockup and RCU stall issues on SNP guests with 128 vCPUs assigned and >=10GB guest memory allocations.

From the call stack dumps, it looks like migrate_pages() gets invoked for hugepages and triggers the invalidate_range_start() MMU notifiers; kvm_mmu_notifier_invalidate_range_start() in turn invokes sev_guest_memory_reclaimed(), which internally does wbinvd_on_all_cpus(). This can cause long delays, especially on systems with a large physical CPU count (the one we are testing on has 500 CPUs), which delays guest re-entry and causes soft lockups and RCU stalls in the guest.

Here are the kstack dumps of the vCPU thread(s) invoking the invalidate_range_start() MMU notifiers:

#1:
[ 913.377780] CPU: 79 PID: 6538 Comm: qemu-system-x86 Not tainted 5.19.0-rc5-next-20220706-sev-es-snp+ #380
[ 913.377783] Hardware name: AMD Corporation QUARTZ/QUARTZ, BIOS RQZ1002C 09/15/2022
[ 913.377785] Call Trace:
[ 913.377788]  <TASK>
[ 913.304300]  sev_guest_memory_reclaimed.cold+0x18/0x22
[ 913.304303]  kvm_arch_guest_memory_reclaimed+0x12/0x20
[ 913.304309]  kvm_mmu_notifier_invalidate_range_start+0x2af/0x2e0
[ 913.304312]  ? kvm_mmu_notifier_invalidate_range_end+0x101/0x1c0
[ 913.304314]  __mmu_notifier_invalidate_range_start+0x83/0x190
[ 913.304320]  try_to_migrate_one+0xba9/0xd80
[ 913.304326]  rmap_walk_anon+0x166/0x360
[ 913.304329]  rmap_walk+0x28/0x40
[ 913.304331]  try_to_migrate+0x92/0xd0
[ 913.304334]  ? try_to_unmap_one+0xe60/0xe60
[ 913.304336]  ? anon_vma_ctor+0x50/0x50
[ 913.304339]  ? page_get_anon_vma+0x80/0x80
[ 913.304341]  ? invalid_mkclean_vma+0x20/0x20
[ 913.304343]  migrate_pages+0x1276/0x1720
[ 913.304346]  ? do_pages_stat+0x310/0x310
[ 913.304348]  migrate_misplaced_page+0x5d0/0x820
[ 913.304351]  do_huge_pmd_numa_page+0x1f7/0x4b0
[ 913.304354]  __handle_mm_fault+0x66a/0x1040
[ 913.304358]  handle_mm_fault+0xe4/0x2d0
[ 913.304361]  __get_user_pages+0x1ea/0x710
[ 913.304363]  get_user_pages_unlocked+0xd0/0x340
[ 913.304365]  hva_to_pfn+0xf7/0x440
[ 913.304367]  __gfn_to_pfn_memslot+0x7f/0xc0
[ 913.304369]  kvm_faultin_pfn+0x95/0x280
[ 913.304373]  direct_page_fault+0x201/0x800
[ 913.304375]  kvm_tdp_page_fault+0x72/0x80
[ 913.304377]  kvm_mmu_page_fault+0x136/0x710
[ 913.304379]  ? kvm_complete_insn_gp+0x37/0x40
[ 913.304382]  ? svm_complete_emulated_msr+0x52/0x60
[ 913.304384]  ? kvm_emulate_wrmsr+0x6c/0x160
[ 913.304387]  ? sev_handle_vmgexit+0x115a/0x1600
[ 913.304390]  npf_interception+0x50/0xd0
[ 913.304391]  svm_invoke_exit_handler+0xf5/0x130
[ 913.304394]  svm_handle_exit+0x11c/0x230
[ 913.304396]  vcpu_enter_guest+0x832/0x12e0
[ 913.304396]  ? kvm_apic_local_deliver+0x6a/0x70
[ 913.304401]  ? kvm_inject_apic_timer_irqs+0x2c/0x70
[ 913.304403]  kvm_arch_vcpu_ioctl_run+0x105/0x680

#2:
[ 913.378680] CPU: 79 PID: 6538 Comm: qemu-system-x86 Not tainted 5.19.0-rc5-next-20220706-sev-es-snp+ #380
[ 913.378683] Hardware name: AMD Corporation QUARTZ/QUARTZ, BIOS RQZ1002C 09/15/2022
[ 913.378685] Call Trace:
[ 913.378687]  <TASK>
[ 913.378699]  sev_guest_memory_reclaimed.cold+0x18/0x22
[ 913.378702]  kvm_arch_guest_memory_reclaimed+0x12/0x20
[ 913.378707]  kvm_mmu_notifier_invalidate_range_start+0x2af/0x2e0
[ 913.378711]  __mmu_notifier_invalidate_range_start+0x83/0x190
[ 913.378715]  change_protection+0x11ec/0x1420
[ 913.378720]  ? kvm_release_pfn_clean+0x2f/0x40
[ 913.378722]  change_prot_numa+0x66/0xb0
[ 913.378724]  task_numa_work+0x22c/0x3b0
[ 913.378729]  task_work_run+0x72/0xb0
[ 913.378732]  xfer_to_guest_mode_handle_work+0xfc/0x100
[ 913.378738]  kvm_arch_vcpu_ioctl_run+0x422/0x680
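For reference, the reclaim hook at the top of both dumps is essentially just a broadcast WBINVD for SEV guests. Roughly (from my reading of arch/x86/kvm/svm/sev.c, so treat the sketch as approximate):

void sev_guest_memory_reclaimed(struct kvm *kvm)
{
	/* Nothing to do for non-SEV guests. */
	if (!sev_guest(kvm))
		return;

	/*
	 * Flush caches on every physical CPU. With hundreds of CPUs this
	 * takes a while, and it runs from
	 * kvm_mmu_notifier_invalidate_range_start(), i.e. while the
	 * faulting/protection-change path above still holds mm->mmap_lock.
	 */
	wbinvd_on_all_cpus();
}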
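Also, the task_numa_work() frame in dump #2 suggests this is being driven by automatic NUMA balancing. As far as I can tell, that scan selects ranges purely by VMA-level attributes and has no notion of per-page pins, so the pinned guest pages still get the PROT_NONE hinting protection and the subsequent hinting faults go down the do_huge_pmd_numa_page() -> migrate_misplaced_page() path seen in dump #1. A simplified sketch of my understanding (not the literal kernel code):

#include <linux/mm.h>
#include <linux/hugetlb.h>
#include <linux/mempolicy.h>

/*
 * Simplified sketch of the NUMA-hinting scan (task_numa_work() ->
 * change_prot_numa()) as I understand it; not the literal kernel code.
 */
static void numa_hinting_scan(struct mm_struct *mm)
{
	struct vm_area_struct *vma;

	for (vma = mm->mmap; vma; vma = vma->vm_next) {
		/*
		 * Only VMA-level attributes are checked here; nothing in
		 * this path knows that the pages backing a SEV/SNP guest
		 * are pinned.
		 */
		if (!vma_migratable(vma) || is_vm_hugetlb_page(vma) ||
		    (vma->vm_flags & VM_MIXEDMAP))
			continue;

		/*
		 * Installs PROT_NONE hinting protections (dump #2). The
		 * later hinting fault then tries migrate_misplaced_page()
		 * (dump #1), which fires invalidate_range_start() (and
		 * therefore the wbinvd) even though the migration itself
		 * presumably fails later because of the extra references
		 * held on the pinned pages.
		 */
		change_prot_numa(vma, vma->vm_start, vma->vm_end);
	}
}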
Additionally, this causes other vCPU threads handling #NPF to block, as the above code path(s) are holding mm->mmap_lock. Following are the kstack dumps of the blocked vCPU threads:

[ 316.969254] task:qemu-system-x86 state:D stack: 0 pid: 6939 ppid: 6908 flags:0x00000000
[ 316.969256] Call Trace:
[ 316.969257]  <TASK>
[ 316.969258]  __schedule+0x350/0x900
[ 316.969262]  schedule+0x52/0xb0
[ 316.969265]  rwsem_down_read_slowpath+0x271/0x4b0
[ 316.969267]  down_read+0x47/0xa0
[ 316.969269]  get_user_pages_unlocked+0x6b/0x340
[ 316.969273]  hva_to_pfn+0xf7/0x440
[ 316.969277]  __gfn_to_pfn_memslot+0x7f/0xc0
[ 316.969279]  kvm_faultin_pfn+0x95/0x280
[ 316.969283]  ? kvm_apic_send_ipi+0x9c/0x100
[ 316.969287]  direct_page_fault+0x201/0x800
[ 316.969290]  kvm_tdp_page_fault+0x72/0x80
[ 316.969293]  kvm_mmu_page_fault+0x136/0x710
[ 316.969296]  ? xas_load+0x35/0x40
[ 316.969299]  ? xas_find+0x187/0x1c0
[ 316.969301]  ? xa_find_after+0xf1/0x110
[ 316.969304]  ? kvm_pmu_trigger_event+0x5e/0x1e0
[ 316.969307]  ? sysvec_call_function+0x52/0x90
[ 316.969310]  npf_interception+0x50/0xd0

The invocation of migrate_pages() via the following code path does not seem right:

do_huge_pmd_numa_page
  migrate_misplaced_page
    migrate_pages

as all the guest memory for SEV/SNP VMs will be pinned/locked, so why is the page migration code path getting invoked at all?

Thanks,
Ashish