On Mon, Jan 20, 2020 at 03:35:09PM -0500, Qian Cai wrote: > > On Jan 20, 2020, at 5:17 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote: > > > > Bah.. that's horrible. Surely we can find a better place to do this in > > the whole hotplug machinery. > > > > Perhaps you can have takedown_cpu() do the mmdrop()? > > The problem is that no all arch_cpu_idle_dead() will call > idle_task_exit(). For example, alpha and parisc are not, so it needs How is that not broken? If the idle thread runs on an active_mm, we need to drop that reference. This needs fixing regardless. > to deal with some kind of ifdef dance in takedown_cpu() to > conditionally call mmdrop() which sounds even more horrible? > > If you really prefer it anyway, maybe something like touching every arch’s __cpu_die() to also call mmdrop() depends on arches? > > BTW, how to obtain the other CPU’s current task mm? Is that cpu_rq(cpu)->curr->active_mm? Something like this; except you'll need to go audit archs to make sure they all call idle_task_exit() and/or put in comments on why they don't have to (perhaps their bringup switches them to &init_mm unconditionally and the switch_mm() is not required). --- diff --git a/kernel/cpu.c b/kernel/cpu.c index 9c706af713fb..2b4d8a69e8d9 100644 --- a/kernel/cpu.c +++ b/kernel/cpu.c @@ -564,6 +564,23 @@ static int bringup_cpu(unsigned int cpu) return bringup_wait_for_ap(cpu); } +static int finish_cpu(unsigned int cpu) +{ + struct task_struct *idle = idle_thread_get(cpu); + struct mm_struct *mm = idle->active_mm; + + /* + * idle_task_exit() will have switched to &init_mm, now + * clean up any remaining active_mm state. + */ + + if (mm == &init_mm) + return; + + idle->active_mm = &init_mm; + mmdrop(mm); +} + /* * Hotplug state machine related functions */ @@ -1434,7 +1451,7 @@ static struct cpuhp_step cpuhp_hp_states[] = { [CPUHP_BRINGUP_CPU] = { .name = "cpu:bringup", .startup.single = bringup_cpu, - .teardown.single = NULL, + .teardown.single = finish_cpu, .cant_stop = true, }, /* Final state before CPU kills itself */ diff --git a/kernel/sched/core.c b/kernel/sched/core.c index fc1dfc007604..8f049fb77a3d 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -6188,13 +6188,14 @@ void idle_task_exit(void) struct mm_struct *mm = current->active_mm; BUG_ON(cpu_online(smp_processor_id())); + BUG_ON(current != this_rq()->idle); if (mm != &init_mm) { switch_mm(mm, &init_mm, current); - current->active_mm = &init_mm; finish_arch_post_lock_switch(); } - mmdrop(mm); + + /* finish_cpu(), as ran on the BP, will clean up the active_mm state */ } /*