exit_mm should issue memory barriers after user-space memory accesses,
before clearing current->mm: clearing tsk->mm has the effect of making
the membarrier private expedited IPIs skip the exiting task, so
user-space memory accesses performed prior to exit_mm need to be
ordered before tsk->mm is cleared.

The membarrier system call can be issued concurrently with do_exit
if we have thread groups created with CLONE_VM but not CLONE_THREAD.

The following comment in exit_mm() seems rather puzzling though:

	/* more a memory barrier than a real lock */
	task_lock(current);

Considering that spinlocks are not full memory barriers nowadays, some
digging into the origins of this comment led me back to 2002, to the
earliest commit tracked in Thomas Gleixner's bitkeeper-era history at
https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git/

At that point, this comment was followed by:

+	/* more a memory barrier than a real lock */
+	task_lock(tsk);
+	tsk->mm = NULL;
+	task_unlock(tsk);

This seems to indicate that grabbing the lock is really about acting
as a memory barrier, but I wonder whether it is meant to be a full
barrier or just an acquire.

If a full memory barrier happens to be needed even without membarrier,
perhaps this fix also corrects other issues? It is unclear from the
comment what this memory barrier orders, though. Is it the chain
exit -> waitpid, or is it related to entering lazy TLB?

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx>
Cc: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
Cc: Will Deacon <will@xxxxxxxxxx>
Cc: Paul E. McKenney <paulmck@xxxxxxxxxx>
Cc: Nicholas Piggin <npiggin@xxxxxxxxx>
Cc: Andy Lutomirski <luto@xxxxxxxxxxxxxx>
Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Cc: Alan Stern <stern@xxxxxxxxxxxxxxxxxxx>
Cc: linux-mm@xxxxxxxxx
---
 kernel/exit.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/kernel/exit.c b/kernel/exit.c
index 727150f28103..ce272ec55cdc 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -474,6 +474,14 @@ static void exit_mm(void)
 	BUG_ON(mm != current->active_mm);
 	/* more a memory barrier than a real lock */
 	task_lock(current);
+	/*
+	 * When a thread stops operating on an address space, the loop
+	 * in membarrier_{private,global}_expedited() may not observe
+	 * that tsk->mm, and thus not issue an IPI. Membarrier requires
+	 * a memory barrier after accessing user-space memory, before
+	 * clearing tsk->mm.
+	 */
+	smp_mb();
 	current->mm = NULL;
 	mmap_read_unlock(mm);
 	enter_lazy_tlb(mm, current);
-- 
2.11.0
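
P.S. For illustration only (not part of the patch): a minimal
userspace sketch of the scenario described above, i.e. an address
space shared through CLONE_VM without CLONE_THREAD, so that a
membarrier private expedited command in one task can run concurrently
with do_exit() -> exit_mm() in the other. Stack size and variable
names are arbitrary, error handling is omitted, and the stack setup
assumes a downward-growing stack:

#define _GNU_SOURCE
#include <linux/membarrier.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

static int shared;	/* lives in the mm shared through CLONE_VM */

static int child_fn(void *arg)
{
	shared = 1;	/* user-space store which exit_mm() must order */
	return 0;	/* exit -> do_exit() -> exit_mm() clears ->mm */
}

int main(void)
{
	char *stack = malloc(65536);
	pid_t pid;

	syscall(__NR_membarrier,
		MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED, 0);
	/* CLONE_VM without CLONE_THREAD: same mm, distinct thread group. */
	pid = clone(child_fn, stack + 65536, CLONE_VM | SIGCHLD, NULL);
	/*
	 * May run concurrently with the child's exit_mm(). If the
	 * child's CPU is skipped because ->mm was already cleared,
	 * the smp_mb() added by this patch provides the ordering an
	 * IPI would otherwise have provided.
	 */
	syscall(__NR_membarrier, MEMBARRIER_CMD_PRIVATE_EXPEDITED, 0);
	waitpid(pid, NULL, 0);
	free(stack);
	return 0;
}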