https://bugzilla.kernel.org/show_bug.cgi?id=217562

--- Comment #2 from Arnaud Lefebvre (arnaud.lefebvre@xxxxxxxxxxxxxxxx) ---
Thanks a lot for that very detailed reply!

> TL;DR: I'm 99% certain you're hitting a race that results in KVM doing a
> list_del() before a list_add(). I am planning on sending a patch for v5.15
> to disable the TDP MMU by default, which will "fix" this bug, but I have an
> extra long weekend and won't get to that before next Thursday or so.

> In the meantime, you can effect the same fix by disabling the TDP MMU via
> module param, i.e. add kvm.tdp_mmu=false to your kernel/KVM command line.

Alright, thanks for the tip. We'll probably just upgrade to the 6.1 LTS. That
was already planned, but we weren't sure whether the bug was there too.

> If you're feeling particularly masochistic, I bet you could reproduce this
> more easily by introducing a delay between setting the SPTE and linking the
> page, e.g.

> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> index 6c2bb60ccd88..1fb10d4156aa 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.c
> +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> @@ -1071,6 +1071,7 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
>                                                      !shadow_accessed_mask);
>
>                         if (tdp_mmu_set_spte_atomic_no_dirty_log(vcpu->kvm, &iter, new_spte)) {
> +                               udelay(100);
>                                 tdp_mmu_link_page(vcpu->kvm, sp,
>                                                   huge_page_disallowed &&
>                                                   req_level >= iter.level);

We might try that if we can find some time in the upcoming weeks, just to be
sure that we can actually reproduce the bug and put this behind us.

Regarding this bug report, how do we proceed from here? Should we close it?
Keep it open for a few weeks until we can confirm that we no longer have this
issue on 6.1? Or let you handle it once you disable the TDP MMU by default on
the v5.15 LTS?

Thanks for your advice.