On Tue, Jan 15, 2013 at 10:29:56AM -0600, Robin Holt wrote: > Andrea, > > On a community or upcoming distro based kernel, I have one XPMEM test > which is failing. The following patch (with lots of context) fixes it. > I do not yet understand the cause of the race condition. I did hope it > would be obvious to you and save me the time of investigating further. > > The first test finds mn->hlist is on the chain. The new second test > does not. I have verified that neither xpmem nor gru is calling to > unregister the same mmu_notifier twice. It is very difficult to find the > problem as the test case requires a very long time to trip. To increase > the likelihood, I run many copies in parallel. Each is 2 processes each > with one pthread. I run two per socket on an 8 core per socket system > with 256 socket system. When one job hits the null pointer deref, I can > have many thousands of lines of debug before the other jobs stop running. > Each process has /dev/gru mapped into their address space. They have > used xpmem to attach part of the address space of the other process. > The tasks are exiting at approximately the same time. The race seems to > be with one pthread hitting the final mmput at about the same time as the > other task is getting into the filp_close->flush() callout. > > I have a couple other things to work on today, and will likely try to > chase this failure more tomorrow. With this patch, I have not been able > to trigger any other issues in many hours of testing. The test likewise > runs for many hours on a RHEL 6.3 based system and a SLES11 SP2 based > system so it might have something to do with the change in srcu locking. Using code inspection, I think I understand this issue some more. Assume two tasks, one calling mmu_notifier_unregister() as a result of a filp_close() ->flush() callout (task A), and the other calling mmu_notifier_release() from an mmput() (task B). A B t1 srcu_read_lock() t2 if (!hlist_unhashed()) t3 srcu_read_lock() t4 srcu_read_unlock() t5 hlist_del...() t6 synchronize_srcu() t7 srcu_read_unlock() t8 hlist_del_rcu() <--- NULL pointer deref. Does this seem plausible? Am I missing something? My earlier patch feels wrong as I think we will see two ->release() calls are made, but I am not sure of that. Thanks, Robin > diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c > index 8a5ac8c..e2c9827 100644 > --- a/mm/mmu_notifier.c > +++ b/mm/mmu_notifier.c > @@ -297,37 +299,38 @@ void mmu_notifier_unregister(struct mmu_notifier *mn, struct mm_struct *mm) > if (!hlist_unhashed(&mn->hlist)) { > /* > * SRCU here will force exit_mmap to wait ->release to finish > * before freeing the pages. > */ > int id; > > id = srcu_read_lock(&srcu); > /* > * exit_mmap will block in mmu_notifier_release to > * guarantee ->release is called before freeing the > * pages. > */ > if (mn->ops->release) > mn->ops->release(mn, mm); > srcu_read_unlock(&srcu, id); > > spin_lock(&mm->mmu_notifier_mm->lock); > - hlist_del_rcu(&mn->hlist); > + if (!hlist_unhashed(&mn->hlist)) > + hlist_del_rcu(&mn->hlist); > spin_unlock(&mm->mmu_notifier_mm->lock); > } > > /* > * Wait any running method to finish, of course including > * ->release if it was run by mmu_notifier_relase instead of us. > */ > synchronize_srcu(&srcu); > > BUG_ON(atomic_read(&mm->mm_count) <= 0); > > mmdrop(mm); > } > EXPORT_SYMBOL_GPL(mmu_notifier_unregister); > > static int __init mmu_notifier_init(void) > { > return init_srcu_struct(&srcu); > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@xxxxxxxxx. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a> -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>