On 2024-08-16 16:38:05, Sean Christopherson wrote:
> On Mon, Aug 12, 2024, Vipin Sharma wrote:
> > @@ -1831,12 +1845,17 @@ ulong kvm_tdp_mmu_recover_nx_huge_pages(struct kvm *kvm, ulong to_zap)
> >  		 * recovered, along with all the other huge pages in the slot,
> >  		 * when dirty logging is disabled.
> >  		 */
> > -		if (kvm_mmu_sp_dirty_logging_enabled(kvm, sp))
> > +		if (kvm_mmu_sp_dirty_logging_enabled(kvm, sp)) {
> > +			spin_lock(&kvm->arch.tdp_mmu_pages_lock);
> >  			unaccount_nx_huge_page(kvm, sp);
> > -		else
> > -			flush |= kvm_tdp_mmu_zap_sp(kvm, sp);
> > -
> > -		WARN_ON_ONCE(sp->nx_huge_page_disallowed);
> > +			spin_unlock(&kvm->arch.tdp_mmu_pages_lock);
> > +			to_zap--;
> > +			WARN_ON_ONCE(sp->nx_huge_page_disallowed);
> > +		} else if (tdp_mmu_zap_sp(kvm, sp)) {
> > +			flush = true;
> > +			to_zap--;
>
> This is actively dangerous. In the old code, tdp_mmu_zap_sp() could fail only
> in a WARN-able scenario, i.e. in practice was guaranteed to succeed. And, the
> for-loop *always* decremented to_zap, i.e. couldn't get stuck in an infinite
> loop.
>
> Neither of those protections exists in this version. Obviously it shouldn't happen,
> but it's possible this could flail on the same SP over and over, since nothing
> guarantees forward progress. The cond_resched() would save KVM from true pain,
> but it's still a wart in the implementation.
>
> Rather than loop on to_zap, just do
>
>	list_for_each_entry(...) {
>
>		if (!to_zap)
>			break;
>	}
>
> And if we don't use separate lists, that'll be an improvement too, as KVM
> will only have to skip "wrong" shadow pages once, whereas this approach means
> every iteration of the loop has to walk past the "wrong" shadow pages.

The TDP MMU accesses possible_nx_huge_pages under the tdp_mmu_pages_lock spin
lock, which we cannot hold for the duration of recovery. In this patch,
tdp_mmu_zap_sp() has been modified to retry failures, similar to the other
retry mechanisms in the TDP MMU. Won't the same issue exist for those other
TDP MMU retry flows?

> But I'd still prefer to use separate lists.
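
Regarding the list_for_each_entry() suggestion above, here is a rough sketch
of the structure I understand is being proposed (illustrative only, not a
patch; it glosses over the tdp_mmu_pages_lock question raised earlier, and it
assumes the walk is over kvm->arch.possible_nx_huge_pages via the
possible_nx_huge_page_link member):

	struct kvm_mmu_page *sp, *tmp;
	bool flush = false;

	/* _safe variant because unaccount_nx_huge_page() unlinks the entry. */
	list_for_each_entry_safe(sp, tmp, &kvm->arch.possible_nx_huge_pages,
				 possible_nx_huge_page_link) {
		if (!to_zap)
			break;

		/*
		 * The walk advances to the next list entry regardless of
		 * whether the zap succeeds, so a page that repeatedly fails
		 * to zap cannot stall the loop; termination is bounded by
		 * the list length and by to_zap.
		 */
		if (kvm_mmu_sp_dirty_logging_enabled(kvm, sp)) {
			unaccount_nx_huge_page(kvm, sp);
			to_zap--;
		} else if (tdp_mmu_zap_sp(kvm, sp)) {
			flush = true;
			to_zap--;
		}
	}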