On Tue, May 30, 2023, Sean Christopherson wrote: > On Tue, May 30, 2023, Sean Christopherson wrote: > > On Tue, May 30, 2023, Fabio Coatti wrote: > > > Il giorno dom 28 mag 2023 alle ore 14:44 Bagas Sanjaya > > > <bagasdotme@xxxxxxxxx> ha scritto: > > > > #regzbot ^introduced: v6.3.1..v6.3.2 > > > > #regzbot title: WARNING trace at kvm_nx_huge_page_recovery_worker when opening a new tab in Chrome > > > > > > Out of curiosity, I recompiled 6.3.4 after reverting the following > > > commit mentioned in 6.3.2 changelog: > > > > > > commit 2ec1fe292d6edb3bd112f900692d9ef292b1fa8b > > > Author: Sean Christopherson <seanjc@xxxxxxxxxx> > > > Date: Wed Apr 26 15:03:23 2023 -0700 > > > KVM: x86: Preserve TDP MMU roots until they are explicitly invalidated > > > commit edbdb43fc96b11b3bfa531be306a1993d9fe89ec upstream. > > > > > > And the WARN message no longer appears on my host kernel logs, at > > > least so far :) > > > > Hmm, more than likely an NX shadow page is outliving a memslot update. I'll take > > another look at those flows to see if I can spot a race or leak. > > I didn't spot anything, and I couldn't reproduce the WARN even when dropping the > dirty logging requirement and hacking KVM to periodically delete memslots. Aha! Apparently my brain was just waiting until I sat down for dinner to have its lightbulb moment. The memslot lookup isn't factoring in whether the shadow page is for non-SMM versus SMM. QEMU configures SMM to have memslots that do not exist in the non-SMM world, so if kvm_recover_nx_huge_pages() encounters an SMM shadow page, the memslot lookup can fail to find a memslot because it looks only in the set of non-SMM memslots. Before commit 2ec1fe292d6e ("KVM: x86: Preserve TDP MMU roots until they are explicitly invalidated"), KVM would zap all SMM TDP MMU roots and thus all SMM TDP MMU shadow pages once all vCPUs exited SMM. That made the window where this bug could be encountered quite tiny, as the NX recovery thread would have to kick in while at least one vCPU was in SMM. QEMU VMs typically only use SMM during boot, and so the "bad" shadow pages were gone by the time the NX recovery thread ran. Now that KVM preserves TDP MMU roots until they are explicity invalidated (by a memslot deletion), the window to encounter the bug is effectively never closed because QEMU doesn't delete memslots after boot (except for a handful of special scenarios. Assuming I'm correct, this should fix the issue: --- arch/x86/kvm/mmu/mmu.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index d3812de54b02..d5c03f14cdc7 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -7011,7 +7011,10 @@ static void kvm_recover_nx_huge_pages(struct kvm *kvm) */ slot = NULL; if (atomic_read(&kvm->nr_memslots_dirty_logging)) { - slot = gfn_to_memslot(kvm, sp->gfn); + struct kvm_memslots *slots; + + slots = kvm_memslots_for_spte_role(kvm, sp->role); + slot = __gfn_to_memslot(slots, sp->gfn); WARN_ON_ONCE(!slot); } base-commit: 17f2d782f18c9a49943ea723d7628da1837c9204 --