On Fri, Sep 07, 2018 at 08:40:10PM +0300, Leon Romanovsky wrote: > On Thu, Sep 06, 2018 at 09:55:19PM -0600, Jason Gunthorpe wrote: > > On Wed, Sep 05, 2018 at 05:21:37PM -0600, Jason Gunthorpe wrote: > > > > > Instead hold on to the actual mm directly inside the umme via mmgrab() > > > and mmdrop(), just like mmu_notifiers already does internally. > > > > I coded up a series to do this, and more: > > > > https://github.com/jgunthorpe/linux/commits/tgid_removal > > > > I'll try to test it later, but it is the general idea.. ucontext->tgid > > is an abomination and needs to be deleted. > > > > Have to do some testing on it.. > > I tried the series with my repro for use-after-free bug in ODP plus reverted > commit "50704e039ab1 RDMA/umem: Restore lockdep check while downgrading lock" > just to be sure and got the following splat. I have similar lockdep > warning without reverting too. Ah, neat, I was just about to look at the ODP flow.. Hm.. The fact the old kernel and test doesn't trigger this probably explains the bug you are chasing. ib_umem_odp_release calls mmu_notifier_unregister under the umem_rwsem. If the notifier is still live unregister immediately calls ib_umem_notifier_release. The first thing the release function does is to also grab the rwsem! So this was never, ever, working properly. The only way to avoid this splat is to either never call unregister (woops!) or to only call unregister in cases where the mm has already been mput (which is not possible to guarentee at this point) Question is, is something wrong in my patch that a extra mmget() is floating about (preventing __exit_mmap), or did dropping off the tgid maddness do it.. Jason