On Fri, Jan 12, 2024, Yuan Yao wrote:
> > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > index 3c844e428684..92f51540c4a7 100644
> > --- a/arch/x86/kvm/mmu/mmu.c
> > +++ b/arch/x86/kvm/mmu/mmu.c
> > @@ -4415,6 +4415,22 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
> >          if (unlikely(!fault->slot))
> >                  return kvm_handle_noslot_fault(vcpu, fault, access);
> >
> > +        /*
> > +         * Pre-check for a relevant mmu_notifier invalidation event prior to
> > +         * acquiring mmu_lock.  If there is an in-progress invalidation and the
> > +         * kernel allows preemption, the invalidation task may drop mmu_lock
> > +         * and yield in response to mmu_lock being contended, which is *very*
> > +         * counter-productive as this vCPU can't actually make forward progress
> > +         * until the invalidation completes.  This "unsafe" check can get false
> > +         * negatives, i.e. KVM needs to re-check after acquiring mmu_lock.  Do
> > +         * the pre-check even for non-preemptible kernels, i.e. even if KVM will
> > +         * never yield mmu_lock in response to contention, as this vCPU is
> > +         * *guaranteed* to need to retry, i.e. waiting until mmu_lock is held
> > +         * to detect retry guarantees the worst case latency for the vCPU.
> > +         */
> > +        if (mmu_invalidate_retry_gfn_unsafe(vcpu->kvm, fault->mmu_seq, fault->gfn))
> > +                return RET_PF_RETRY;
>
> This breaks the contract of kvm_faultin_pfn(), i.e. the pfn's refcount
> increased after resolved from gfn, but its caller won't decrease it.

Oof, good catch.

> How about call kvm_release_pfn_clean() just before return RET_PF_RETRY here,
> so we don't need to duplicate it in 3 different places.

Hrm, yeah, that does seem to be the best option.  Thanks!

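For illustration only, the refcount imbalance discussed above and the suggested fix can be modeled in user space.  This is a minimal sketch, not KVM code: struct toy_page, toy_get_page(), toy_release_page(), toy_faultin() and toy_handle_fault() are hypothetical stand-ins, and the assumption that callers drop the reference only after a RET_PF_CONTINUE return is taken from the review comment rather than from the callers themselves.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the page fault return codes. */
enum { RET_PF_CONTINUE = 0, RET_PF_RETRY = 1 };

/* Toy page whose refcount models the reference held on the pfn. */
struct toy_page {
	int refcount;
};

/* Models resolving gfn -> pfn, which takes a reference on the page. */
static void toy_get_page(struct toy_page *page)
{
	page->refcount++;
}

/* Models kvm_release_pfn_clean(): drop the reference taken above. */
static void toy_release_page(struct toy_page *page)
{
	assert(page->refcount > 0);
	page->refcount--;
}

/*
 * Models the tail of kvm_faultin_pfn().  The reference is already held when
 * the retry check runs, so an early RET_PF_RETRY must drop it, because the
 * caller will not.  release_on_retry selects the fixed behavior.
 */
static int toy_faultin(struct toy_page *page, bool invalidation_pending,
		       bool release_on_retry)
{
	toy_get_page(page);

	if (invalidation_pending) {
		if (release_on_retry)
			toy_release_page(page);
		return RET_PF_RETRY;
	}

	return RET_PF_CONTINUE;
}

/* Models a caller: it releases the page only after a successful fault-in. */
static int toy_handle_fault(struct toy_page *page, bool invalidation_pending,
			    bool release_on_retry)
{
	int r = toy_faultin(page, invalidation_pending, release_on_retry);

	if (r != RET_PF_CONTINUE)
		return r;

	/* The real code would install the mapping here. */
	toy_release_page(page);
	return r;
}
```

With release_on_retry false, a retried fault leaves the toy refcount elevated by one, which is the leak pointed out above; with it true, the count is balanced on both the retry and the continue paths, and the release lives in one place instead of in every caller.
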
> > +
> >          return RET_PF_CONTINUE;
> >  }
> >
> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > index 7e7fd25b09b3..179df96b20f8 100644
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -2031,6 +2031,32 @@ static inline int mmu_invalidate_retry_gfn(struct kvm *kvm,
> >                  return 1;
> >          return 0;
> >  }
> > +
> > +/*
> > + * This lockless version of the range-based retry check *must* be paired with a
>
> s/lockess/lockless

Heh, unless mine eyes deceive me, that's what I wrote :-)