On Thu, Jun 06, 2019 at 08:47:28PM -0700, John Hubbard wrote: > On 6/6/19 11:44 AM, Jason Gunthorpe wrote: > > From: Jason Gunthorpe <jgg@xxxxxxxxxxxx> > > > > hmm_release() is called exactly once per hmm. ops->release() cannot > > accidentally trigger any action that would recurse back onto > > hmm->mirrors_sem. > > > > This fixes a use after-free race of the form: > > > > CPU0 CPU1 > > hmm_release() > > up_write(&hmm->mirrors_sem); > > hmm_mirror_unregister(mirror) > > down_write(&hmm->mirrors_sem); > > up_write(&hmm->mirrors_sem); > > kfree(mirror) > > mirror->ops->release(mirror) > > > > The only user we have today for ops->release is an empty function, so this > > is unambiguously safe. > > > > As a consequence of plugging this race drivers are not allowed to > > register/unregister mirrors from within a release op. > > > > Signed-off-by: Jason Gunthorpe <jgg@xxxxxxxxxxxx> > > mm/hmm.c | 28 +++++++++------------------- > > 1 file changed, 9 insertions(+), 19 deletions(-) > > > > diff --git a/mm/hmm.c b/mm/hmm.c > > index 709d138dd49027..3a45dd3d778248 100644 > > +++ b/mm/hmm.c > > @@ -136,26 +136,16 @@ static void hmm_release(struct mmu_notifier *mn, struct mm_struct *mm) > > WARN_ON(!list_empty(&hmm->ranges)); > > mutex_unlock(&hmm->lock); > > > > - down_write(&hmm->mirrors_sem); > > - mirror = list_first_entry_or_null(&hmm->mirrors, struct hmm_mirror, > > - list); > > - while (mirror) { > > - list_del_init(&mirror->list); > > - if (mirror->ops->release) { > > - /* > > - * Drop mirrors_sem so the release callback can wait > > - * on any pending work that might itself trigger a > > - * mmu_notifier callback and thus would deadlock with > > - * us. > > - */ > > - up_write(&hmm->mirrors_sem); > > + down_read(&hmm->mirrors_sem); > > This is cleaner and simpler, but I suspect it is leading to the deadlock > that Ralph Campbell is seeing in his driver testing. (And in general, holding > a lock during a driver callback usually leads to deadlocks.) I think Ralph has never seen this patch (it is new), so it must be one of the earlier patches.. > Ralph, is this the one? It's the only place in this patchset where I can > see a lock around a callback to driver code, that wasn't there before. So > I'm pretty sure it is the one... Can you share the lockdep report please? Thanks, Jason