On 2024-08-16 16:38:05, Sean Christopherson wrote:
> On Mon, Aug 12, 2024, Vipin Sharma wrote:
> > @@ -1831,12 +1845,17 @@ ulong kvm_tdp_mmu_recover_nx_huge_pages(struct kvm *kvm, ulong to_zap)
> >  		 * recovered, along with all the other huge pages in the slot,
> >  		 * when dirty logging is disabled.
> >  		 */
> > -		if (kvm_mmu_sp_dirty_logging_enabled(kvm, sp))
> > +		if (kvm_mmu_sp_dirty_logging_enabled(kvm, sp)) {
> > +			spin_lock(&kvm->arch.tdp_mmu_pages_lock);
> >  			unaccount_nx_huge_page(kvm, sp);
> > -		else
> > -			flush |= kvm_tdp_mmu_zap_sp(kvm, sp);
> > -
> > -		WARN_ON_ONCE(sp->nx_huge_page_disallowed);
> > +			spin_unlock(&kvm->arch.tdp_mmu_pages_lock);
> > +			to_zap--;
> > +			WARN_ON_ONCE(sp->nx_huge_page_disallowed);
> > +		} else if (tdp_mmu_zap_sp(kvm, sp)) {
> > +			flush = true;
> > +			to_zap--;
>
> This is actively dangerous. In the old code, tdp_mmu_zap_sp() could fail only
> in a WARN-able scenario, i.e. in practice was guaranteed to succeed. And, the
> for-loop *always* decremented to_zap, i.e. couldn't get stuck in an infinite
> loop.
>
> Neither of those protections exists in this version. Obviously it shouldn't happen,
> but it's possible this could flail on the same SP over and over, since nothing
> guarantees forward progress. The cond_resched() would save KVM from true pain,
> but it's still a wart in the implementation.
>
> Rather than loop on to_zap, just do
>
>	list_for_each_entry(...) {
>
>		if (!to_zap)
>			break;
>	}
>
> And if we don't use separate lists, that'll be an improvement too, as KVM
> will only have to skip "wrong" shadow pages once, whereas this approach means
> every iteration of the loop has to walk past the "wrong" shadow pages.

The TDP MMU accesses possible_nx_huge_pages under the tdp_mmu_pages_lock spin
lock, which we cannot hold for the duration of recovery. In this patch,
tdp_mmu_zap_sp() has been modified to retry failures, similar to the other
retry mechanisms in the TDP MMU. Won't the same issue exist for those other
TDP MMU retry flows?

> But I'd still prefer to use separate lists.
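
Regarding the list_for_each_entry() suggestion above, here is a rough sketch
of the structure I understand is being proposed (illustrative only, not a
patch; it glosses over the tdp_mmu_pages_lock question raised earlier, and it
assumes the walk is over kvm->arch.possible_nx_huge_pages via the
possible_nx_huge_page_link member):

	struct kvm_mmu_page *sp, *tmp;
	bool flush = false;

	/* _safe variant because unaccount_nx_huge_page() unlinks the entry. */
	list_for_each_entry_safe(sp, tmp, &kvm->arch.possible_nx_huge_pages,
				 possible_nx_huge_page_link) {
		if (!to_zap)
			break;

		/*
		 * The walk advances to the next list entry regardless of
		 * whether the zap succeeds, so a page that repeatedly fails
		 * to zap cannot stall the loop; termination is bounded by
		 * the list length and by to_zap.
		 */
		if (kvm_mmu_sp_dirty_logging_enabled(kvm, sp)) {
			unaccount_nx_huge_page(kvm, sp);
			to_zap--;
		} else if (tdp_mmu_zap_sp(kvm, sp)) {
			flush = true;
			to_zap--;
		}
	}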