----- On Jan 8, 2022, at 11:43 AM, Andy Lutomirski luto@xxxxxxxxxx wrote:

> membarrier() requires a barrier before changes to rq->curr->mm, not just
> before writes to rq->membarrier_state. Move the barrier in exec_mmap() to
> the right place.

I don't see anything that was technically wrong with membarrier_exec_mmap()
before this patchset.

membarrier_exec_mmap() issued an smp_mb() just after the task_lock(), and then
proceeded to clear the mm->membarrier_state and the runqueue membarrier state.
tsk->mm is only set *after* that smp_mb(). So the commit message could lead us
to think there was something wrong before, but I do not think that is true.

This first part of the proposed change is merely a performance optimization:
it removes a useless memory barrier on architectures where
smp_mb__after_spinlock() is a no-op, and it removes a useless store to
mm->membarrier_state, which is already zero-initialized. This is all very
nice, but it does not belong in a "Fix" patch.
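To spell the pre-patch ordering out, here is a condensed sketch of the old
exec_mmap() path, pieced together from the hunks quoted below (simplified and
not the literal source; error handling and the active_mm handover are
omitted):

        task_lock(tsk);
        membarrier_exec_mmap(mm);       /* smp_mb(), then clears the
                                         * runqueue membarrier_state */
        local_irq_disable();
        tsk->active_mm = mm;
        tsk->mm = mm;                   /* membarrier() reads this without locks */

The smp_mb() issued inside membarrier_exec_mmap() therefore already precedes
the store to tsk->mm, which is the ordering membarrier() relies on when it
reads rq->curr->mm locklessly.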
> Add the barrier in kthread_use_mm() -- it was entirely
> missing before.

This is correct. This second part of the patch is indeed a relevant fix.

Thanks,

Mathieu

>
> This patch makes exec_mmap() and kthread_use_mm() use the same membarrier
> hooks, which results in some code deletion.
>
> As an added bonus, this will eliminate a redundant barrier in execve() on
> arches for which spinlock acquisition is a barrier.
>
> Signed-off-by: Andy Lutomirski <luto@xxxxxxxxxx>
> ---
>  fs/exec.c                 |  6 +++++-
>  include/linux/sched/mm.h  |  2 --
>  kernel/kthread.c          |  5 +++++
>  kernel/sched/membarrier.c | 15 ---------------
>  4 files changed, 10 insertions(+), 18 deletions(-)
>
> diff --git a/fs/exec.c b/fs/exec.c
> index 38b05e01c5bd..325dab98bc51 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -1001,12 +1001,16 @@ static int exec_mmap(struct mm_struct *mm)
>          }
>
>          task_lock(tsk);
> -        membarrier_exec_mmap(mm);
> +        /*
> +         * membarrier() requires a full barrier before switching mm.
> +         */
> +        smp_mb__after_spinlock();
>
>          local_irq_disable();
>          active_mm = tsk->active_mm;
>          tsk->active_mm = mm;
>          WRITE_ONCE(tsk->mm, mm);        /* membarrier reads this without locks */
> +        membarrier_update_current_mm(mm);
>          /*
>           * This prevents preemption while active_mm is being loaded and
>           * it and mm are being updated, which could cause problems for
> diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
> index e107f292fc42..f1d2beac464c 100644
> --- a/include/linux/sched/mm.h
> +++ b/include/linux/sched/mm.h
> @@ -344,8 +344,6 @@ enum {
>  #include <asm/membarrier.h>
>  #endif
>
> -extern void membarrier_exec_mmap(struct mm_struct *mm);
> -
>  extern void membarrier_update_current_mm(struct mm_struct *next_mm);
>
>  /*
> diff --git a/kernel/kthread.c b/kernel/kthread.c
> index 3b18329f885c..18b0a2e0e3b2 100644
> --- a/kernel/kthread.c
> +++ b/kernel/kthread.c
> @@ -1351,6 +1351,11 @@ void kthread_use_mm(struct mm_struct *mm)
>          WARN_ON_ONCE(tsk->mm);
>
>          task_lock(tsk);
> +        /*
> +         * membarrier() requires a full barrier before switching mm.
> +         */
> +        smp_mb__after_spinlock();
> +
>          /* Hold off tlb flush IPIs while switching mm's */
>          local_irq_disable();
>          active_mm = tsk->active_mm;
> diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
> index c38014c2ed66..44fafa6e1efd 100644
> --- a/kernel/sched/membarrier.c
> +++ b/kernel/sched/membarrier.c
> @@ -277,21 +277,6 @@ static void ipi_sync_rq_state(void *info)
>          smp_mb();
>  }
>
> -void membarrier_exec_mmap(struct mm_struct *mm)
> -{
> -        /*
> -         * Issue a memory barrier before clearing membarrier_state to
> -         * guarantee that no memory access prior to exec is reordered after
> -         * clearing this state.
> -         */
> -        smp_mb();
> -        /*
> -         * Keep the runqueue membarrier_state in sync with this mm
> -         * membarrier_state.
> -         */
> -        this_cpu_write(runqueues.membarrier_state, 0);
> -}
> -
>  void membarrier_update_current_mm(struct mm_struct *next_mm)
>  {
>          struct rq *rq = this_rq();
> --
> 2.33.1

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com