x86 has a missing core serializing instruction in migration scenarios.

Given that x86-32 can return to user-space with sysexit, and x86-64
through sysretq and sysretl, which are not core serializing, the
following user-space self-modifying code (JIT) scenario can occur:

     CPU 0                           CPU 1
     User-space self-modify code
     Preempted
     migrated                ->
                                     scheduler selects task
                                     Return to user-space (iret or sysexit)
                                     User-space issues sync_core()
                             <-      migrated
     scheduler selects task
     Return to user-space (sysexit)
     jump to modified code
     Run modified code without sync_core() -> bug.

This migration pattern can return to user-space through sysexit,
sysretl, or sysretq, which are not core serializing, and therefore
breaks sequential consistency expectations from a single-threaded
process.

Fix this issue by invoking sync_core_before_usermode() the first
time a runqueue finishes a task switch after receiving a migrated
thread.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx>
CC: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
CC: Andy Lutomirski <luto@xxxxxxxxxx>
CC: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>
CC: Boqun Feng <boqun.feng@xxxxxxxxx>
CC: Andrew Hunter <ahh@xxxxxxxxxx>
CC: Maged Michael <maged.michael@xxxxxxxxx>
CC: Avi Kivity <avi@xxxxxxxxxxxx>
CC: Benjamin Herrenschmidt <benh@xxxxxxxxxxxxxxxxxxx>
CC: Paul Mackerras <paulus@xxxxxxxxx>
CC: Michael Ellerman <mpe@xxxxxxxxxxxxxx>
CC: Dave Watson <davejwatson@xxxxxx>
CC: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
CC: Ingo Molnar <mingo@xxxxxxxxxx>
CC: "H. Peter Anvin" <hpa@xxxxxxxxx>
CC: Andrea Parri <parri.andrea@xxxxxxxxx>
CC: Russell King <linux@xxxxxxxxxxxxxxx>
CC: Greg Hackmann <ghackmann@xxxxxxxxxx>
CC: Will Deacon <will.deacon@xxxxxxx>
CC: David Sehr <sehr@xxxxxxxxxx>
CC: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
CC: x86@xxxxxxxxxx
CC: linux-arch@xxxxxxxxxxxxxxx
---
 kernel/sched/core.c  | 7 +++++++
 kernel/sched/sched.h | 1 +
 2 files changed, 8 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index c79e94278613..4a1c9782267a 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -927,6 +927,7 @@ static struct rq *move_queued_task(struct rq *rq, struct rq_flags *rf,
 
 	rq_lock(rq, rf);
 	BUG_ON(task_cpu(p) != new_cpu);
+	rq->need_sync_core = 1;
 	enqueue_task(rq, p, 0);
 	p->on_rq = TASK_ON_RQ_QUEUED;
 	check_preempt_curr(rq, p, 0);
@@ -2684,6 +2685,12 @@ static struct rq *finish_task_switch(struct task_struct *prev)
 	prev_state = prev->state;
 	vtime_task_switch(prev);
 	perf_event_task_sched_in(prev, current);
+#ifdef CONFIG_SMP
+	if (unlikely(rq->need_sync_core)) {
+		sync_core_before_usermode();
+		rq->need_sync_core = 0;
+	}
+#endif
 	finish_lock_switch(rq, prev);
 	finish_arch_post_lock_switch();
 
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index cab256c1720a..33e617bc491c 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -734,6 +734,7 @@ struct rq {
 	/* For active balancing */
 	int active_balance;
 	int push_cpu;
+	int need_sync_core;
 	struct cpu_stop_work active_balance_work;
 	/* cpu of this runqueue: */
 	int cpu;
-- 
2.11.0
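
For readers who want to reason about the flag protocol outside the kernel,
the following is a minimal user-space model of the two hunks in
kernel/sched/core.c. It is only a sketch: struct model_rq, model_rqs,
model_sync_core(), model_migrate_to() and model_finish_task_switch() are
hypothetical names introduced here for illustration, locking is not
modeled, and the stand-in for sync_core_before_usermode() merely counts
calls instead of executing a serializing instruction.

```c
#include <assert.h>

#define MODEL_NR_CPUS 2

/*
 * Hypothetical stand-in for struct rq. Only the field added by the
 * patch is modeled, plus a counter used to observe the behaviour.
 */
struct model_rq {
	int need_sync_core;	/* set when a migrated task is enqueued */
	int sync_count;		/* number of modeled core serializations */
};

static struct model_rq model_rqs[MODEL_NR_CPUS];

/* Stand-in for sync_core_before_usermode(): counts invocations only. */
static void model_sync_core(struct model_rq *rq)
{
	rq->sync_count++;
}

/* Mirrors the move_queued_task() hunk: flag the destination runqueue. */
static void model_migrate_to(int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	model_rqs[cpu].need_sync_core = 1;
}

/*
 * Mirrors the finish_task_switch() hunk: serialize on the first task
 * switch after a migration, then clear the flag so that later switches
 * on the same runqueue skip the extra work.
 */
static void model_finish_task_switch(int cpu)
{
	struct model_rq *rq;

	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	rq = &model_rqs[cpu];
	if (rq->need_sync_core) {
		model_sync_core(rq);
		rq->need_sync_core = 0;
	}
}
```

In this model, one migration to a CPU followed by any number of task
switches on that CPU results in a single modeled serialization, and a
CPU that never receives a migrated task performs none.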