On Tue 24-03-20 16:41:37, Jason Gunthorpe wrote:
> On Fri, Feb 28, 2020 at 09:50:06AM -0400, Jason Gunthorpe wrote:
> > On Tue, Feb 11, 2020 at 04:52:52PM -0400, Jason Gunthorpe wrote:
> > > Many users of the mmu_notifier invalidate_range callbacks maintain
> > > locking/counters/etc on a paired basis and have long expected that
> > > invalidate_range_start/end() are always paired.
> > >
> > > For instance kvm_mmu_notifier_invalidate_range_end() undoes
> > > kvm->mmu_notifier_count which was incremented during start().
> > >
> > > The recent change to add non-blocking notifiers breaks this assumption
> > > when multiple notifiers are present in the list. When EAGAIN is returned
> > > from an invalidate_range_start() then no invalidate_range_ends() are
> > > called, even if the subscription's start had previously been called.
> > >
> > > Unfortunately, due to the RCU list traversal we can't reliably generate a
> > > subset of the linked list representing the notifiers already called to
> > > generate an invalidate_range_end() pairing.
> > >
> > > One case works correctly: if only one subscription requires
> > > invalidate_range_end() and it is the last entry in the hlist, then
> > > when invalidate_range_start() returns -EAGAIN there is nothing to
> > > unwind.
> > >
> > > Keep the notifier hlist sorted so that notifiers that require
> > > invalidate_range_end() are always last, and if two are added then disable
> > > non-blocking invalidation for the mm.
> > >
> > > A warning is printed for this case. If in future we determine this never
> > > happens, then we can simply fail during registration when there are
> > > unsupported combinations of notifiers.
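To make the ordering argument concrete, here is a small user-space model in C. It is a sketch under stated assumptions, not the mm/mmu_notifier.c code: `struct sub`, `struct mm_model`, `sub_register()` and `invalidate_nonblock()` are hypothetical names, a fixed array stands in for the RCU-protected hlist, and a `pending` counter stands in for paired state such as `kvm->mmu_notifier_count`. It shows why, with at most one end-requiring subscription kept at the tail, an -EAGAIN from any start() leaves nothing to unwind.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a subscription; NOT struct mmu_notifier. */
struct sub {
	bool has_end;     /* subscriber implements invalidate_range_end() */
	bool start_fails; /* its start() returns -EAGAIN when non-blocking */
	int pending;      /* starts not yet undone by end(); tracked for has_end only */
};

#define MAX_SUBS 8

/* Hypothetical per-mm state; an array replaces the RCU-protected hlist. */
struct mm_model {
	struct sub *list[MAX_SUBS];
	size_t n;
	bool nonblock_ok; /* cleared once two has_end subscriptions exist */
};

/* Register s, keeping every has_end subscription at the tail. */
static void sub_register(struct mm_model *mm, struct sub *s)
{
	size_t ends = 0;

	assert(mm->n < MAX_SUBS);
	if (s->has_end) {
		mm->list[mm->n++] = s;          /* tail insertion */
	} else {
		for (size_t i = mm->n; i > 0; i--)
			mm->list[i] = mm->list[i - 1];
		mm->list[0] = s;                /* head insertion */
		mm->n++;
	}
	for (size_t i = 0; i < mm->n; i++)
		if (mm->list[i]->has_end)
			ends++;
	if (ends > 1)
		mm->nonblock_ok = false;        /* the patch also warns here */
}

/* Non-blocking invalidation: the first -EAGAIN aborts with no unwinding. */
static int invalidate_nonblock(struct mm_model *mm)
{
	if (!mm->nonblock_ok)
		return -EAGAIN;                 /* assumption: always refuse */
	for (size_t i = 0; i < mm->n; i++) {
		struct sub *s = mm->list[i];

		if (s->start_fails)
			return -EAGAIN;         /* no end() calls on this path */
		if (s->has_end)
			s->pending++;           /* start() succeeded */
	}
	for (size_t i = 0; i < mm->n; i++)
		if (mm->list[i]->has_end)
			mm->list[i]->pending--; /* end() pairs with start() */
	return 0;
}
```

For example, registering a KVM-like subscription (has_end) before one whose start() fails still leaves the KVM-like entry last, so the failing start() returns -EAGAIN before the KVM-like start() runs and its counter stays at zero. Registering a second has_end subscription clears `nonblock_ok`, mirroring the "disable non-blocking invalidation for the mm" rule.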
> > >
> > > Fixes: 93065ac753e4 ("mm, oom: distinguish blockable mode for mmu notifiers")
> > > Cc: Michal Hocko <mhocko@xxxxxxxx>
> > > Cc: "Jérôme Glisse" <jglisse@xxxxxxxxxx>
> > > Cc: Christoph Hellwig <hch@xxxxxx>
> > > Signed-off-by: Jason Gunthorpe <jgg@xxxxxxxxxxxx>
> > >  mm/mmu_notifier.c | 53 ++++++++++++++++++++++++++++++++++++++++++++---
> > >  1 file changed, 50 insertions(+), 3 deletions(-)
> > >
> > > v1: https://lore.kernel.org/linux-mm/20190724152858.GB28493@xxxxxxxx/
> > > v2: https://lore.kernel.org/linux-mm/20190807191627.GA3008@xxxxxxxx/
> > >  * Abandon attempting to fix it by calling invalidate_range_end() during an
> > >    EAGAIN start
> > >  * Just trivially ban multiple subscriptions
> > > v3:
> > >  * Be more sophisticated, ban only multiple subscriptions if the result is
> > >    a failure. Allows multiple subscriptions without invalidate_range_end
> > >  * Include a printk when this condition is hit (Michal)
> > >
> > > At this point the rework Christoph requested during the first posting
> > > is completed and there are now only 3 drivers using
> > > invalidate_range_end():
> > >
> > > drivers/misc/mic/scif/scif_dma.c: .invalidate_range_end = scif_mmu_notifier_invalidate_range_end};
> > > drivers/misc/sgi-gru/grutlbpurge.c: .invalidate_range_end = gru_invalidate_range_end,
> > > virt/kvm/kvm_main.c: .invalidate_range_end = kvm_mmu_notifier_invalidate_range_end,
> > >
> > > While I think it is unlikely that any of these drivers will be used in
> > > combination with each other, display a printk in hopes to check.
> > >
> > > Someday I expect to just fail the registration on this condition.
> > >
> > > I think this also addresses Michal's concern about a 'big hammer' as
> > > it probably won't ever trigger now.
> >
> > I'm going to put this in linux-next to see if there are any reports of
> > the pr_warn failing.
> >
> > Michal, are you happy with this solution now?
>
> It's been a month in linux-next now, with no complaints.
> If there are no comments, I will go ahead and send it in the hmm PR.

I will not block this, but it still looks like the wrong approach to me.
A more robust solution would be to allow calling invalidate_range_end()
even for a failing invalidate_range_start().

-- 
Michal Hocko
SUSE Labs
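For comparison, one possible reading of the alternative suggested in the reply (calling invalidate_range_end() even for the subscription whose invalidate_range_start() failed) is sketched here as a user-space C model. All names (`struct alt_sub`, `alt_start()`, `alt_end()`, `alt_invalidate_nonblock()`) are hypothetical, and the model assumes that start() always leaves behind state that end() can undo, even when it reports -EAGAIN. It walks a plain array, so it deliberately ignores the RCU list-traversal problem that the patch description gives as the reason this route was abandoned in v2; it illustrates the pairing rule, not a working kernel fix.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subscription model; NOT the kernel's struct mmu_notifier. */
struct alt_sub {
	bool start_fails; /* start() returns -EAGAIN when non-blocking */
	int count;        /* incremented by start(), decremented by end() */
};

/* Assumption for this sketch: start() takes its reference even when it
 * reports -EAGAIN, so a matching end() always has something to undo. */
static int alt_start(struct alt_sub *s)
{
	s->count++;
	return s->start_fails ? -EAGAIN : 0;
}

static void alt_end(struct alt_sub *s)
{
	assert(s->count > 0); /* end() must never outnumber start() */
	s->count--;
}

/* On -EAGAIN, call end() for every subscription whose start() was called,
 * including the one that failed, then report the failure. */
static int alt_invalidate_nonblock(struct alt_sub **list, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (alt_start(list[i]) == -EAGAIN) {
			for (size_t j = 0; j <= i; j++)
				alt_end(list[j]);
			return -EAGAIN;
		}
	}
	for (size_t i = 0; i < n; i++)
		alt_end(list[i]);
	return 0;
}
```

Under this reading, every subscriber's end() callback would have to cope with a start() that returned -EAGAIN, which is a contract change for existing users such as kvm_mmu_notifier_invalidate_range_end(); in exchange, no ordering constraint on the list is needed.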