Re: Question/problem with mmu_notifier_unregister.

Robin Holt <holt@xxxxxxx> · Wed, 16 Jan 2013 14:00:18 -0600

On Tue, Jan 15, 2013 at 10:29:56AM -0600, Robin Holt wrote:
> Andrea,
> 
> On a community or upcoming distro based kernel, I have one XPMEM test
> which is failing.  The following patch (with lots of context) fixes it.
> I do not yet understand the cause of the race condition.  I did hope it
> would be obvious to you and save me the time of investigating further.
> 
> The first test finds mn->hlist is on the chain.  The new second test
> does not.  I have verified that neither xpmem nor gru is calling to
> unregister the same mmu_notifier twice.  It is very difficult to find the
> problem as the test case requires a very long time to trip.  To increase
> the likelihood, I run many copies in parallel.  Each is 2 processes each
> with one pthread.  I run two per socket on an 8 core per socket system
> with 256 socket system.  When one job hits the null pointer deref, I can
> have many thousands of lines of debug before the other jobs stop running.
> Each process has /dev/gru mapped into their address space.  They have
> used xpmem to attach part of the address space of the other process.
> The tasks are exiting at approximately the same time.  The race seems to
> be with one pthread hitting the final mmput at about the same time as the
> other task is getting into the filp_close->flush() callout.
> 
> I have a couple other things to work on today, and will likely try to
> chase this failure more tomorrow.  With this patch, I have not been able
> to trigger any other issues in many hours of testing.  The test likewise
> runs for many hours on a RHEL 6.3 based system and a SLES11 SP2 based
> system so it might have something to do with the change in srcu locking.

Using code inspection, I think I understand this issue some more.

Assume two tasks, one calling mmu_notifier_unregister() as a result
of a filp_close() ->flush() callout (task A), and the other calling
mmu_notifier_release() from an mmput() (task B).

		A				B
t1						srcu_read_lock()
t2		if (!hlist_unhashed())
t3		srcu_read_lock()
t4						srcu_read_unlock()
t5						hlist_del...()
t6						synchronize_srcu()
t7		srcu_read_unlock()
t8		hlist_del_rcu()  <--- NULL pointer deref.

Does this seem plausible?  Am I missing something?

My earlier patch feels wrong as I think we will see two ->release()
calls are made, but I am not sure of that.

Thanks,
Robin

> diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
> index 8a5ac8c..e2c9827 100644
> --- a/mm/mmu_notifier.c
> +++ b/mm/mmu_notifier.c
> @@ -297,37 +299,38 @@ void mmu_notifier_unregister(struct mmu_notifier *mn, struct mm_struct *mm)
>         if (!hlist_unhashed(&mn->hlist)) {
>                 /*
>                  * SRCU here will force exit_mmap to wait ->release to finish
>                  * before freeing the pages.
>                  */
>                 int id;
>  
>                 id = srcu_read_lock(&srcu);
>                 /*
>                  * exit_mmap will block in mmu_notifier_release to
>                  * guarantee ->release is called before freeing the
>                  * pages.
>                  */
>                 if (mn->ops->release)
>                         mn->ops->release(mn, mm);
>                 srcu_read_unlock(&srcu, id);
>  
>                 spin_lock(&mm->mmu_notifier_mm->lock);
> -               hlist_del_rcu(&mn->hlist);
> +               if (!hlist_unhashed(&mn->hlist))
> +                       hlist_del_rcu(&mn->hlist);
>                 spin_unlock(&mm->mmu_notifier_mm->lock);
>         }
>  
>         /*
>          * Wait any running method to finish, of course including
>          * ->release if it was run by mmu_notifier_relase instead of us.
>          */
>         synchronize_srcu(&srcu);
>  
>         BUG_ON(atomic_read(&mm->mm_count) <= 0);
>  
>         mmdrop(mm);
>  }
>  EXPORT_SYMBOL_GPL(mmu_notifier_unregister);
>  
>  static int __init mmu_notifier_init(void)
>  {
>         return init_srcu_struct(&srcu);
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>