On Wed, Mar 10, 2021 at 01:31:17PM -0800, Sean Christopherson wrote: > Invoke the MMU notifier's .invalidate_range_end() callbacks even if one > of the .invalidate_range_start() callbacks failed. If there are multiple > notifiers, the notifier that did not fail may have performed actions in > its ...start() that it expects to unwind via ...end(). Per the > mmu_notifier_ops documentation, ...start() and ...end() must be paired. No this is not OK, if invalidate_start returns EBUSY invalidate_end should *not* be called. As you observed: > The only in-kernel usage that is fatally broken is the SGI UV GRU driver, > which effectively blocks and sleeps fault handlers during ...start(), and > unblocks/wakes the handlers during ...end(). But, the only users that > can fail ...start() are the i915 and Nouveau drivers, which are unlikely > to collide with the SGI driver. It used to be worse but I've since moved most of the other problematic users to the itree notifier which doesn't have the problem. > KVM is the only other user of ...end(), and while KVM also blocks fault > handlers in ...start(), the fault handlers do not sleep and originate in KVM will have its mmu_notifier_count become imbalanced: static int kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn, const struct mmu_notifier_range *range) { kvm->mmu_notifier_count++; static void kvm_mmu_notifier_invalidate_range_end(struct mmu_notifier *mn, const struct mmu_notifier_range *range) { kvm->mmu_notifier_count--; Which I believe is fatal to kvm? These notifiers certainly do not only happen at process exit. So, both of the remaining _end users become corrupted with this patch! I've tried to fix this before, the only thing that seems like it will work is to sort the hlist and only call ends that have succeeded their starts by comparing pointers with <. This is because the hlist can have items removed concurrently under SRCU so there is no easy way to compute the subset that succeeded in calling start. I had a prior effort to just ban more than 1 hlist notifier with end, but it turns out kvm on ARM uses two all the time (IIRC) > Found by inspection. Verified by adding a second notifier in KVM > that AFAIK it is a non-problem in real life because kvm is not mixed with notifier_start's that fail (and GRU is dead?). Everything else was fixed by moving to itree. Jason