On Tue, Sep 04, 2018 at 03:12:28PM +0300, Leon Romanovsky wrote: > After releasing ucontext the __mmu_notifier_release will be called > again in exit_mmap path. However at that time the driver ucontext > (mlx5_ib_ucontext) already will be freed and it will cause > to use-after-free error, due to improper use of mmu_notifier API. I can't see how this fix is right, it needs more explanation if it is. The way it is supposed to work is that the umem *CANNOT* outlive the ucontext. When the umem is destroyed it deletes the mmu_notifier, and mmu_notifier_unregister is a hard fence (it calls hlist_del_init_rcu then synchronize_srcu). So once mmu_notifier_unregister() completes the notifier will not be seen by __mmu_notifier_release.. If all umems are destroyed then the hlist inside context-mn will be empty and __mmu_notifier_release will not touch ucontext anymore.. So, to me, this bug looks like either a umem struct survived the ucontext, which is blatantly wrong and needs to be fixed directly.. Or, perhaps, the umem did this flow: owning_mm = get_task_mm(owning_process); if (owning_mm == NULL) /* * The process' mm is already dead, notifier were * removed already. */ goto out_put_task; mmu_notifier_unregister(&context->mn, owning_mm); ie owning_mm == NULL, which skipped the unregister.. Leaving a dangling hlist reference in the mm if it hasn't already been cleaned up by something else. Since owning_mm == NULL very much does not imply that something else cleaned up the notifier, this is clearly buggy code, and the symptom would exactly match your trace. Having looked at it somewhat I think the fix here is to get rid of the completely wrong and stupid 'owning_process' thing *completely*. Instead hold on to the actual mm directly inside the umme via mmgrab() and mmdrop(), just like mmu_notifiers already does internally. Same thing for the other wrong code using context->tgid. Jason