On 6/6/19 11:44 AM, Jason Gunthorpe wrote: > From: Jason Gunthorpe <jgg@xxxxxxxxxxxx> > > hmm_release() is called exactly once per hmm. ops->release() cannot > accidentally trigger any action that would recurse back onto > hmm->mirrors_sem. > > This fixes a use after-free race of the form: > > CPU0 CPU1 > hmm_release() > up_write(&hmm->mirrors_sem); > hmm_mirror_unregister(mirror) > down_write(&hmm->mirrors_sem); > up_write(&hmm->mirrors_sem); > kfree(mirror) > mirror->ops->release(mirror) > > The only user we have today for ops->release is an empty function, so this > is unambiguously safe. > > As a consequence of plugging this race drivers are not allowed to > register/unregister mirrors from within a release op. > > Signed-off-by: Jason Gunthorpe <jgg@xxxxxxxxxxxx> > --- > mm/hmm.c | 28 +++++++++------------------- > 1 file changed, 9 insertions(+), 19 deletions(-) > > diff --git a/mm/hmm.c b/mm/hmm.c > index 709d138dd49027..3a45dd3d778248 100644 > --- a/mm/hmm.c > +++ b/mm/hmm.c > @@ -136,26 +136,16 @@ static void hmm_release(struct mmu_notifier *mn, struct mm_struct *mm) > WARN_ON(!list_empty(&hmm->ranges)); > mutex_unlock(&hmm->lock); > > - down_write(&hmm->mirrors_sem); > - mirror = list_first_entry_or_null(&hmm->mirrors, struct hmm_mirror, > - list); > - while (mirror) { > - list_del_init(&mirror->list); > - if (mirror->ops->release) { > - /* > - * Drop mirrors_sem so the release callback can wait > - * on any pending work that might itself trigger a > - * mmu_notifier callback and thus would deadlock with > - * us. > - */ > - up_write(&hmm->mirrors_sem); > + down_read(&hmm->mirrors_sem); This is cleaner and simpler, but I suspect it is leading to the deadlock that Ralph Campbell is seeing in his driver testing. (And in general, holding a lock during a driver callback usually leads to deadlocks.) Ralph, is this the one? It's the only place in this patchset where I can see a lock around a callback to driver code, that wasn't there before. So I'm pretty sure it is the one... thanks, -- John Hubbard NVIDIA