There are two places where core serialization is needed by membarrier: 1) When returning from the membarrier IPI, 2) After scheduler updates curr to a thread with a different mm, before going back to user-space, since the curr->mm is used by membarrier to check whether it needs to send an IPI to that CPU. x86-32 uses iret as return from interrupt, and both iret and sysexit to go back to user-space. The iret instruction is core serializing, but not sysexit. x86-64 uses iret as return from interrupt, which takes care of the IPI. However, it can return to user-space through either sysretl (compat code), sysretq, or iret. Given that sysret{l,q} is not core serializing, we rely instead on write_cr3() performed by switch_mm() to provide core serialization after changing the current mm, and deal with the special case of kthread -> uthread (temporarily keeping current mm into active_mm) by adding a sync_core() in that specific case. Use the new sync_core_before_usermode() to guarantee this. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx> CC: Peter Zijlstra <peterz@xxxxxxxxxxxxx> CC: Andy Lutomirski <luto@xxxxxxxxxx> CC: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx> CC: Boqun Feng <boqun.feng@xxxxxxxxx> CC: Andrew Hunter <ahh@xxxxxxxxxx> CC: Maged Michael <maged.michael@xxxxxxxxx> CC: Avi Kivity <avi@xxxxxxxxxxxx> CC: Benjamin Herrenschmidt <benh@xxxxxxxxxxxxxxxxxxx> CC: Paul Mackerras <paulus@xxxxxxxxx> CC: Michael Ellerman <mpe@xxxxxxxxxxxxxx> CC: Dave Watson <davejwatson@xxxxxx> CC: Thomas Gleixner <tglx@xxxxxxxxxxxxx> CC: Ingo Molnar <mingo@xxxxxxxxxx> CC: "H. Peter Anvin" <hpa@xxxxxxxxx> CC: Andrea Parri <parri.andrea@xxxxxxxxx> CC: Russell King <linux@xxxxxxxxxxxxxxx> CC: Greg Hackmann <ghackmann@xxxxxxxxxx> CC: Will Deacon <will.deacon@xxxxxxx> CC: David Sehr <sehr@xxxxxxxxxx> CC: x86@xxxxxxxxxx CC: linux-arch@xxxxxxxxxxxxxxx --- Changes since v1: - Use the newly introduced sync_core_before_usermode(). Move all state handling to generic code. - Add linux/processor.h include to include/linux/sched/mm.h. Changes since v2: - Fix use-after-free in membarrier_mm_sync_core_before_usermode. Changes since v3: - Move generic code into separate patch. --- arch/x86/Kconfig | 1 + arch/x86/entry/entry_32.S | 5 +++++ arch/x86/entry/entry_64.S | 4 ++++ arch/x86/mm/tlb.c | 7 ++++--- 4 files changed, 14 insertions(+), 3 deletions(-) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 0b44c8dd0e95..b5324f2e3162 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -54,6 +54,7 @@ config X86 select ARCH_HAS_FORTIFY_SOURCE select ARCH_HAS_GCOV_PROFILE_ALL select ARCH_HAS_KCOV if X86_64 + select ARCH_HAS_MEMBARRIER_SYNC_CORE select ARCH_HAS_PMEM_API if X86_64 select ARCH_HAS_REFCOUNT select ARCH_HAS_UACCESS_FLUSHCACHE if X86_64 diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S index a1f28a54f23a..0c89cef690cf 100644 --- a/arch/x86/entry/entry_32.S +++ b/arch/x86/entry/entry_32.S @@ -554,6 +554,11 @@ restore_all: .Lrestore_nocheck: RESTORE_REGS 4 # skip orig_eax/error_code .Lirq_return: + /* + * ARCH_HAS_MEMBARRIER_SYNC_CORE rely on iret core serialization + * when returning from IPI handler and when returning from + * scheduler to user-space. + */ INTERRUPT_RETURN .section .fixup, "ax" diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S index 4f8e1d35a97c..8a32390240f1 100644 --- a/arch/x86/entry/entry_64.S +++ b/arch/x86/entry/entry_64.S @@ -792,6 +792,10 @@ GLOBAL(restore_regs_and_return_to_kernel) POP_EXTRA_REGS POP_C_REGS addq $8, %rsp /* skip regs->orig_ax */ + /* + * ARCH_HAS_MEMBARRIER_SYNC_CORE rely on iret core serialization + * when returning from IPI handler. + */ INTERRUPT_RETURN ENTRY(native_iret) diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c index c28cd5592b0d..df4e21371c89 100644 --- a/arch/x86/mm/tlb.c +++ b/arch/x86/mm/tlb.c @@ -201,9 +201,10 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next, this_cpu_write(cpu_tlbstate.is_lazy, false); /* - * The membarrier system call requires a full memory barrier - * before returning to user-space, after storing to rq->curr. - * Writing to CR3 provides that full memory barrier. + * The membarrier system call requires a full memory barrier and + * core serialization before returning to user-space, after + * storing to rq->curr. Writing to CR3 provides that full + * memory barrier and core serializing instruction. */ if (real_prev == next) { VM_WARN_ON(this_cpu_read(cpu_tlbstate.ctxs[prev_asid].ctx_id) != -- 2.11.0