On Thu, Jun 13, 2019 at 09:44:50PM -0300, Jason Gunthorpe wrote:
> From: Jason Gunthorpe <jgg@xxxxxxxxxxxx>
>
> If the trylock on the hmm->mirrors_sem fails the function will return
> without decrementing the notifiers that were previously incremented. Since
> the caller will not call invalidate_range_end() on EAGAIN this will result
> in notifiers becoming permanently incremented and deadlock.
>
> If the sync_cpu_device_pagetables() required blocking the function will
> not return EAGAIN even though the device continues to touch the
> pages. This is a violation of the mmu notifier contract.
>
> Switch, and rename, the ranges_lock to a spin lock so we can reliably
> obtain it without blocking during error unwind.
>
> The error unwind is necessary since the notifiers count must be held
> incremented across the call to sync_cpu_device_pagetables() as we cannot
> allow the range to become marked valid by a parallel
> invalidate_start/end() pair while doing sync_cpu_device_pagetables().
>
> Signed-off-by: Jason Gunthorpe <jgg@xxxxxxxxxxxx>
> Reviewed-by: Ralph Campbell <rcampbell@xxxxxxxxxx>
> Tested-by: Philip Yang <Philip.Yang@xxxxxxx>
> ---
>  include/linux/hmm.h |  2 +-
>  mm/hmm.c            | 77 +++++++++++++++++++++++++++------------------
>  2 files changed, 48 insertions(+), 31 deletions(-)
>
> diff --git a/include/linux/hmm.h b/include/linux/hmm.h
> index bf013e96525771..0fa8ea34ccef6d 100644
> --- a/include/linux/hmm.h
> +++ b/include/linux/hmm.h
> @@ -86,7 +86,7 @@
>  struct hmm {
>  	struct mm_struct *mm;
>  	struct kref kref;
> -	struct mutex lock;
> +	spinlock_t ranges_lock;
>  	struct list_head ranges;
>  	struct list_head mirrors;
>  	struct mmu_notifier mmu_notifier;
> diff --git a/mm/hmm.c b/mm/hmm.c
> index c0d43302fd6b2f..1172a4f0206963 100644
> --- a/mm/hmm.c
> +++ b/mm/hmm.c
> @@ -67,7 +67,7 @@ static struct hmm *hmm_get_or_create(struct mm_struct *mm)
>  	init_rwsem(&hmm->mirrors_sem);
>  	hmm->mmu_notifier.ops = NULL;
>  	INIT_LIST_HEAD(&hmm->ranges);
> -	mutex_init(&hmm->lock);
> +	spin_lock_init(&hmm->ranges_lock);
>  	kref_init(&hmm->kref);
>  	hmm->notifiers = 0;
>  	hmm->mm = mm;
> @@ -124,18 +124,19 @@ static void hmm_release(struct mmu_notifier *mn, struct mm_struct *mm)
>  {
>  	struct hmm *hmm = container_of(mn, struct hmm, mmu_notifier);
>  	struct hmm_mirror *mirror;
> +	unsigned long flags;
>
>  	/* Bail out if hmm is in the process of being freed */
>  	if (!kref_get_unless_zero(&hmm->kref))
>  		return;
>
> -	mutex_lock(&hmm->lock);
> +	spin_lock_irqsave(&hmm->ranges_lock, flags);
>  	/*
>  	 * Since hmm_range_register() holds the mmget() lock hmm_release() is
>  	 * prevented as long as a range exists.
>  	 */
>  	WARN_ON(!list_empty(&hmm->ranges));
> -	mutex_unlock(&hmm->lock);
> +	spin_unlock_irqrestore(&hmm->ranges_lock, flags);
>
>  	down_read(&hmm->mirrors_sem);
>  	list_for_each_entry(mirror, &hmm->mirrors, list) {
> @@ -151,6 +152,23 @@ static void hmm_release(struct mmu_notifier *mn, struct mm_struct *mm)
>  	hmm_put(hmm);
>  }
>
> +static void notifiers_decrement(struct hmm *hmm)
> +{
> +	lockdep_assert_held(&hmm->ranges_lock);
> +
> +	hmm->notifiers--;
> +	if (!hmm->notifiers) {

Nitpick, when doing dec and test or inc and test ops I find it much
easier to read if they are merged into one line, i.e.

	if (!--hmm->notifiers) {

Otherwise this looks fine:

Reviewed-by: Christoph Hellwig <hch@xxxxxx>
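
For illustration, here is notifiers_decrement() with the suggested merged
dec-and-test applied. The body past the "if" is snipped in the quote
above, so it is elided here as well; this is a sketch of the style
change only, not the patch's full function:

	static void notifiers_decrement(struct hmm *hmm)
	{
		lockdep_assert_held(&hmm->ranges_lock);

		if (!--hmm->notifiers) {
			/* ... revalidation/wakeup logic snipped in the quote ... */
		}
	}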
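
More broadly, the quoted hunks stop before the reworked
invalidate_start() error path, but the commit message spells out the
pattern: bump the notifiers count under the spinlock, and if the
non-blocking path cannot take mirrors_sem, re-take the spinlock and undo
the increment before returning -EAGAIN. A minimal sketch of that
pattern, assuming a hypothetical function name and simplified arguments
(this is not the patch's actual code):

	static int invalidate_start_sketch(struct hmm *hmm, bool blockable)
	{
		unsigned long flags;

		/* Count the notifier; ranges stay invalid until it drops to 0. */
		spin_lock_irqsave(&hmm->ranges_lock, flags);
		hmm->notifiers++;
		spin_unlock_irqrestore(&hmm->ranges_lock, flags);

		if (blockable)
			down_read(&hmm->mirrors_sem);
		else if (!down_read_trylock(&hmm->mirrors_sem)) {
			/*
			 * Error unwind: the caller never calls
			 * invalidate_range_end() after -EAGAIN, so the
			 * count must be dropped here.  The spinlock,
			 * unlike the old mutex, can be taken on this
			 * path without sleeping.
			 */
			spin_lock_irqsave(&hmm->ranges_lock, flags);
			notifiers_decrement(hmm);
			spin_unlock_irqrestore(&hmm->ranges_lock, flags);
			return -EAGAIN;
		}

		/* ... sync_cpu_device_pagetables() on each mirror ... */
		up_read(&hmm->mirrors_sem);
		return 0;
	}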