Hi Mathieu, On Fri, Aug 14, 2020 at 12:43:56PM -0400, Mathieu Desnoyers wrote: > exit_mm should issue memory barriers after user-space memory accesses, > before clearing current->mm, to order user-space memory accesses > performed prior to exit_mm before clearing tsk->mm, which has the > effect of skipping the membarrier private expedited IPIs. > > The membarrier system call can be issued concurrently with do_exit > if we have thread groups created with CLONE_VM but not CLONE_THREAD. > > Here is the scenario I have in mind: > > Two thread groups are created, A and B. Thread group B is created by > issuing clone from group A with flag CLONE_VM set, but not CLONE_THREAD. > Let's assume we have a single thread within each thread group (Thread A > and Thread B). > > The AFAIU we can have: > > Userspace variables: > > int x = 0, y = 0; > > CPU 0 CPU 1 > Thread A Thread B > (in thread group A) (in thread group B) > > x = 1 > barrier() > y = 1 > exit() > exit_mm() > current->mm = NULL; > r1 = load y > membarrier() > skips CPU 0 (no IPI) because its current mm is NULL > r2 = load x > BUG_ON(r1 == 1 && r2 == 0) > > Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx> > Cc: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx> > Cc: Will Deacon <will@xxxxxxxxxx> > Cc: Paul E. McKenney <paulmck@xxxxxxxxxx> > Cc: Nicholas Piggin <npiggin@xxxxxxxxx> > Cc: Andy Lutomirski <luto@xxxxxxxxxxxxxx> > Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx> > Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> > Cc: Alan Stern <stern@xxxxxxxxxxxxxxxxxxx> > Cc: linux-mm@xxxxxxxxx > --- > Changes since v1: > - Use smp_mb__after_spinlock rather than smp_mb. > - Document race scenario in commit message. > --- > kernel/exit.c | 8 ++++++++ > 1 file changed, 8 insertions(+) > > diff --git a/kernel/exit.c b/kernel/exit.c > index 733e80f334e7..fe64e6e28dd5 100644 > --- a/kernel/exit.c > +++ b/kernel/exit.c > @@ -475,6 +475,14 @@ static void exit_mm(void) > BUG_ON(mm != current->active_mm); > /* more a memory barrier than a real lock */ > task_lock(current); > + /* > + * When a thread stops operating on an address space, the loop > + * in membarrier_{private,global}_expedited() may not observe Is it accurate to say that the correctness of membarrier_global_expedited() relies on the observation of ->mm? Because IIUC membarrier_global_expedited() loop doesn't check ->mm. Regards, Boqun > + * that tsk->mm, and not issue an IPI. Membarrier requires a > + * memory barrier after accessing user-space memory, before > + * clearing tsk->mm. > + */ > + smp_mb__after_spinlock(); > current->mm = NULL; > mmap_read_unlock(mm); > enter_lazy_tlb(mm, current); > -- > 2.11.0 >