----- On Aug 16, 2020, at 11:23 AM, Boqun Feng boqun.feng@xxxxxxxxx wrote: > Hi Mathieu, > > On Fri, Aug 14, 2020 at 12:43:56PM -0400, Mathieu Desnoyers wrote: >> exit_mm should issue memory barriers after user-space memory accesses, >> before clearing current->mm, to order user-space memory accesses >> performed prior to exit_mm before clearing tsk->mm, which has the >> effect of skipping the membarrier private expedited IPIs. >> >> The membarrier system call can be issued concurrently with do_exit >> if we have thread groups created with CLONE_VM but not CLONE_THREAD. >> >> Here is the scenario I have in mind: >> >> Two thread groups are created, A and B. Thread group B is created by >> issuing clone from group A with flag CLONE_VM set, but not CLONE_THREAD. >> Let's assume we have a single thread within each thread group (Thread A >> and Thread B). >> >> The AFAIU we can have: >> >> Userspace variables: >> >> int x = 0, y = 0; >> >> CPU 0 CPU 1 >> Thread A Thread B >> (in thread group A) (in thread group B) >> >> x = 1 >> barrier() >> y = 1 >> exit() >> exit_mm() >> current->mm = NULL; >> r1 = load y >> membarrier() >> skips CPU 0 (no IPI) because its current mm is NULL >> r2 = load x >> BUG_ON(r1 == 1 && r2 == 0) >> >> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx> >> Cc: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx> >> Cc: Will Deacon <will@xxxxxxxxxx> >> Cc: Paul E. McKenney <paulmck@xxxxxxxxxx> >> Cc: Nicholas Piggin <npiggin@xxxxxxxxx> >> Cc: Andy Lutomirski <luto@xxxxxxxxxxxxxx> >> Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx> >> Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> >> Cc: Alan Stern <stern@xxxxxxxxxxxxxxxxxxx> >> Cc: linux-mm@xxxxxxxxx >> --- >> Changes since v1: >> - Use smp_mb__after_spinlock rather than smp_mb. >> - Document race scenario in commit message. >> --- >> kernel/exit.c | 8 ++++++++ >> 1 file changed, 8 insertions(+) >> >> diff --git a/kernel/exit.c b/kernel/exit.c >> index 733e80f334e7..fe64e6e28dd5 100644 >> --- a/kernel/exit.c >> +++ b/kernel/exit.c >> @@ -475,6 +475,14 @@ static void exit_mm(void) >> BUG_ON(mm != current->active_mm); >> /* more a memory barrier than a real lock */ >> task_lock(current); >> + /* >> + * When a thread stops operating on an address space, the loop >> + * in membarrier_{private,global}_expedited() may not observe > > Is it accurate to say that the correctness of > membarrier_global_expedited() relies on the observation of ->mm? Because > IIUC membarrier_global_expedited() loop doesn't check ->mm. Good point, I was wrong. Will instead reword as: /* * When a thread stops operating on an address space, the loop * in membarrier_private_expedited() may not observe that * tsk->mm, and the loop in membarrier_global_expedited() may * not observe a MEMBARRIER_STATE_GLOBAL_EXPEDITED * rq->membarrier_state, so those would not issue an IPI. * Membarrier requires a memory barrier after accessing * user-space memory, before clearing tsk->mm or the * rq->membarrier_state. */ And I'll make sure exit_mm clears this_rq()->membarrier_state as well. Thanks, Mathieu > > Regards, > Boqun > >> + * that tsk->mm, and not issue an IPI. Membarrier requires a >> + * memory barrier after accessing user-space memory, before >> + * clearing tsk->mm. >> + */ >> + smp_mb__after_spinlock(); >> current->mm = NULL; >> mmap_read_unlock(mm); >> enter_lazy_tlb(mm, current); >> -- >> 2.11.0 -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com