From: Peter Zijlstra <peterz@xxxxxxxxxxxxx> [ Upstream commit 38cf307c1f2011d413750c5acb725456f47d9172 ] For SMP systems using IPI based TLB invalidation, looking at current->active_mm is entirely reasonable. This then presents the following race condition: CPU0 CPU1 flush_tlb_mm(mm) use_mm(mm) <send-IPI> tsk->active_mm = mm; <IPI> if (tsk->active_mm == mm) // flush TLBs </IPI> switch_mm(old_mm,mm,tsk); Where it is possible the IPI flushed the TLBs for @old_mm, not @mm, because the IPI lands before we actually switched. Avoid this by disabling IRQs across changing ->active_mm and switch_mm(). Of the (SMP) architectures that have IPI based TLB invalidate: Alpha - checks active_mm ARC - ASID specific IA64 - checks active_mm MIPS - ASID specific flush OpenRISC - shoots down world PARISC - shoots down world SH - ASID specific SPARC - ASID specific x86 - N/A xtensa - checks active_mm So at the very least Alpha, IA64 and Xtensa are suspect. On top of this, for scheduler consistency we need at least preemption disabled across changing tsk->mm and doing switch_mm(), which is currently provided by task_lock(), but that's not sufficient for PREEMPT_RT. [akpm@xxxxxxxxxxxxxxxxxxxx: add comment] Reported-by: Andy Lutomirski <luto@xxxxxxxxxxxxxx> Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx> Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> Cc: Nicholas Piggin <npiggin@xxxxxxxxx> Cc: Jens Axboe <axboe@xxxxxxxxx> Cc: Kees Cook <keescook@xxxxxxxxxxxx> Cc: Jann Horn <jannh@xxxxxxxxxx> Cc: Will Deacon <will@xxxxxxxxxx> Cc: Christoph Hellwig <hch@xxxxxx> Cc: Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx> Cc: <stable@xxxxxxxxxxxxxxx> Link: http://lkml.kernel.org/r/20200721154106.GE10769@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx Signed-off-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx> --- mm/mmu_context.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/mm/mmu_context.c b/mm/mmu_context.c index 3e612ae748e96..a1da47e027479 100644 --- a/mm/mmu_context.c +++ b/mm/mmu_context.c @@ -25,13 +25,16 @@ void use_mm(struct mm_struct *mm) struct task_struct *tsk = current; task_lock(tsk); + /* Hold off tlb flush IPIs while switching mm's */ + local_irq_disable(); active_mm = tsk->active_mm; if (active_mm != mm) { mmgrab(mm); tsk->active_mm = mm; } tsk->mm = mm; - switch_mm(active_mm, mm, tsk); + switch_mm_irqs_off(active_mm, mm, tsk); + local_irq_enable(); task_unlock(tsk); #ifdef finish_arch_post_lock_switch finish_arch_post_lock_switch(); @@ -56,9 +59,11 @@ void unuse_mm(struct mm_struct *mm) task_lock(tsk); sync_mm_rss(mm); + local_irq_disable(); tsk->mm = NULL; /* active_mm is still 'mm' */ enter_lazy_tlb(mm, tsk); + local_irq_enable(); task_unlock(tsk); } EXPORT_SYMBOL_GPL(unuse_mm); -- 2.25.1