On Thu, Jan 18, 2024 at 09:22:37AM -0800, Sean Christopherson wrote:
> On Fri, Jan 19, 2024, Xu Yilun wrote:
> > On Tue, Jan 09, 2024 at 05:20:45PM -0800, Sean Christopherson wrote:
> > > Retry page faults without acquiring mmu_lock if the resolved gfn is covered
> > > by an active invalidation.  Contending for mmu_lock is especially
> > > problematic on preemptible kernels as the mmu_notifier invalidation task
> > > will yield mmu_lock (see rwlock_needbreak()), delay the in-progress
> >
> > Is it possible fault-in task avoids contending mmu_lock by using _trylock()?
> > Like:
> >
> > 	while (!read_trylock(&vcpu->kvm->mmu_lock))
> > 		cpu_relax();
> >
> > 	if (is_page_fault_stale(vcpu, fault))
> > 		goto out_unlock;
> >
> > 	r = kvm_tdp_mmu_map(vcpu, fault);
> >
> > out_unlock:
> > 	read_unlock(&vcpu->kvm->mmu_lock);
>
> It's definitely possible, but the downsides far outweigh any potential benefits.
>
> Doing trylock means the CPU is NOT put into the queue for acquiring the lock,
> which means that forward progress isn't guaranteed.  E.g. in a pathological
> scenario (and by "pathological", I mean if NUMA balancing or KSM is active ;-)),
> it's entirely possible for a near-endless stream of mmu_lock writers to be in
> the queue, thus preventing the vCPU from acquiring mmu_lock in a timely manner.

Ah yes, I forgot that the main purpose of yielding is to let the vCPU make
forward progress when the faulting page is not covered by the invalidation.

Thanks,
Yilun

>
> And hacking the page fault path to bypass KVM's lock contention detection would
> be a very willful, deliberate violation of the intent of the MMU's yielding logic
> for preemptible kernels.
>
> That said, I would love to revisit KVM's use of rwlock_needbreak(), at least in
> the TDP MMU.  As evidenced by this rash of issues, it's not at all obvious that
> yielding on mmu_lock contention is *ever* a net positive for KVM, or even for the
> kernel.  The shadow MMU is probably a different story since it can't parallelize
> page faults with other operations, e.g. yielding in kvm_zap_obsolete_pages() to
> allow vCPUs to make forward progress is probably a net positive.
>
> But AFAIK, no one has done any testing to prove that yielding on contention in
> the TDP MMU is actually a good thing.  I'm 99% certain the only reason the TDP
> MMU yields on contention is because the shadow MMU yields on contention, i.e.
> I'm confident that no one ever did performance testing to show that there is
> any benefit whatsoever to yielding mmu_lock in the TDP MMU.