On 5/11/22 16:51, Sean Christopherson wrote:
When zapping obsolete pages, update the running count of zapped pages
regardless of whether or not the list has become unstable due to zapping
a shadow page with its own child shadow pages. If the VM is backed by
mostly 4KiB pages, KVM can zap an absurd number of SPTEs without bumping
the batch count and thus without yielding. In the worst case scenario,
this can cause a soft lockup.
watchdog: BUG: soft lockup - CPU#12 stuck for 22s! [dirty_log_perf_:13020]
RIP: 0010:workingset_activation+0x19/0x130
mark_page_accessed+0x266/0x2e0
kvm_set_pfn_accessed+0x31/0x40
mmu_spte_clear_track_bits+0x136/0x1c0
drop_spte+0x1a/0xc0
mmu_page_zap_pte+0xef/0x120
__kvm_mmu_prepare_zap_page+0x205/0x5e0
kvm_mmu_zap_all_fast+0xd7/0x190
kvm_mmu_invalidate_zap_pages_in_memslot+0xe/0x10
kvm_page_track_flush_slot+0x5c/0x80
kvm_arch_flush_shadow_memslot+0xe/0x10
kvm_set_memslot+0x1a8/0x5d0
__kvm_set_memory_region+0x337/0x590
kvm_vm_ioctl+0xb08/0x1040
Fixes: fbb158cb88b6 ("KVM: x86/mmu: Revert "Revert "KVM: MMU: zap pages in batch""")
Reported-by: David Matlack <dmatlack@xxxxxxxxxx>
Reviewed-by: Ben Gardon <bgardon@xxxxxxxxxx>
Reviewed-by: David Matlack <dmatlack@xxxxxxxxxx>
Cc: stable@xxxxxxxxxxxxxxx
Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx>
---
v3:
- Collect David's review.
- "Rebase". The v2 patch still applies cleanly, but Paolo apparently has
a filter configured to ignore all emails related to the v2 submission.
v2:
- https://lore.kernel.org/all/20211129235233.1277558-1-seanjc@xxxxxxxxxx
- Rebase to kvm/master, commit 30d7c5d60a88 ("KVM: SEV: expose...")
- Collect Ben's review, modulo bad splat.
- Copy+paste the correct splat and symptom. [David].
arch/x86/kvm/mmu/mmu.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 909372762363..7429ae1784af 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5665,6 +5665,7 @@ static void kvm_zap_obsolete_pages(struct kvm *kvm)
{
struct kvm_mmu_page *sp, *node;
int nr_zapped, batch = 0;
+ bool unstable;
restart:
list_for_each_entry_safe_reverse(sp, node,
@@ -5696,11 +5697,12 @@ static void kvm_zap_obsolete_pages(struct kvm *kvm)
goto restart;
}
- if (__kvm_mmu_prepare_zap_page(kvm, sp,
- &kvm->arch.zapped_obsolete_pages, &nr_zapped)) {
- batch += nr_zapped;
+ unstable = __kvm_mmu_prepare_zap_page(kvm, sp,
+ &kvm->arch.zapped_obsolete_pages, &nr_zapped);
+ batch += nr_zapped;
+
+ if (unstable)
goto restart;
- }
}
/*
base-commit: 2764011106d0436cb44702cfb0981339d68c3509
Queued, thanks.
Paolo
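
For readers who want to see the effect of the accounting change in isolation, here is a minimal userspace sketch. It is not the kernel code: struct toy_page, struct toy_stats, toy_zap() and TOY_BATCH_LIMIT are hypothetical names invented for illustration, and the yield is modeled as a simple counter rather than a real reschedule. It assumes each list entry reports how many pages it zapped and whether zapping it made the list unstable, and compares the old rule (count only when unstable) with the patched rule (always count).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical batch threshold; stands in for the kernel's batch limit. */
#define TOY_BATCH_LIMIT 10

/* Hypothetical stand-in for one obsolete shadow page on the list. */
struct toy_page {
	int  cost;          /* pages zapped when this entry is processed */
	bool has_children;  /* true if zapping it makes the list unstable */
};

struct toy_stats {
	int yields;   /* how many times the loop would have yielded */
	int max_run;  /* most pages zapped between two yields */
};

/*
 * Walk the list and "zap" every entry.  When count_always is false the
 * batch count is bumped only for unstable zaps (the pre-patch behavior);
 * when it is true the count is bumped for every zap (the patched behavior).
 */
struct toy_stats toy_zap(const struct toy_page *pages, size_t n,
			 bool count_always)
{
	struct toy_stats st = { 0, 0 };
	int batch = 0;
	int run = 0;

	assert(TOY_BATCH_LIMIT > 0);

	for (size_t i = 0; i < n; i++) {
		/* Yield point: checked before zapping, as in the patched loop. */
		if (batch >= TOY_BATCH_LIMIT) {
			st.yields++;
			batch = 0;
			run = 0;
		}

		int nr_zapped = pages[i].cost;
		bool unstable = pages[i].has_children;

		run += nr_zapped;
		if (run > st.max_run)
			st.max_run = run;

		if (count_always || unstable)
			batch += nr_zapped;
	}

	return st;
}
```

With a list made up entirely of leaf entries (has_children false, cost 1), the old rule never reaches the threshold and so never yields, while the patched rule yields after every TOY_BATCH_LIMIT zaps. That is the scenario the changelog describes for a VM backed mostly by small pages.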