On Sun Jun 11, 2023 at 5:29 AM AEST, Thomas Gleixner wrote:
> On Thu, May 25 2023 at 13:52, Andrew Morton wrote:
>
> Replying here as I wasn't cc'ed on the patch.
>
> > @@ -1030,6 +1031,8 @@ static int take_cpu_down(void *_param)
> >  	enum cpuhp_state target = max((int)st->target, CPUHP_AP_OFFLINE);
> >  	int err, cpu = smp_processor_id();
> >
> > +	idle_task_prepare_exit();
> > +
> >  	/* Ensure this CPU doesn't handle any more interrupts. */
> >  	err = __cpu_disable();
> >  	if (err < 0)
> > --- a/kernel/sched/core.c~lazy-tlb-fix-hotplug-exit-race-with-mmu_lazy_tlb_shootdown
> > +++ a/kernel/sched/core.c
> > @@ -9373,19 +9373,33 @@ void sched_setnuma(struct task_struct *p
> >   * Ensure that the idle task is using init_mm right before its CPU goes
> >   * offline.
> >   */
> > -void idle_task_exit(void)
> > +void idle_task_prepare_exit(void)
>
> This function name along with the above comment is completely
> misleading. It suggests this is about the idle task itself instead of
> making it clear that this ensures that the kernel threads of the
> outgoing CPU are no longer using a mm which is not init_mm.
>
> The callsite is arbitrarily chosen too. Why does this have to be done
> from stomp machine context?

It's the minimalish fix. My patch didn't change what idle_task_exit is
attempting to do.

> There is zero reason to do so. The last hotplug state before teardown is
> CPUHP_AP_SCHED_WAIT_EMPTY. It invokes sched_cpu_wait_empty() in the
> context of the CPU hotplug thread of the outgoing CPU.
>
> sched_cpu_wait_empty() guarantees that there are no temporarily pinned
> (via migrate disable) user space tasks on that CPU anymore. The
> scheduler guarantees that there won't be user space tasks woken up on or
> migrated to that CPU because the CPU is not in the cpu_active mask.
>
> The stopper thread has absolutely nothing to do with that.
>
> So sched_cpu_wait_empty() is the obvious place to handle that:
>
> int sched_cpu_wait_empty(unsigned int cpu)
> {
> 	balance_hotplug_wait();
> +	sched_force_init_mm();
> 	return 0;
> }
>
> And then have:
>
> /*
>  * Invoked on the outgoing CPU in context of the CPU hotplug thread
>  * after ensuring that there are no user space tasks left on the CPU.
>  *
>  * If there is a lazy mm in use on the hotplug thread, drop it and
>  * switch to init_mm.
>  *
>  * The reference count on init_mm is dropped in finish_cpu().
>  */
> static void sched_force_init_mm(void)
> {
>
> No?

It could be done in many places. Peter touched it last and it's been in
the tree since prehistoric times.
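(FWIW, if it were moved there, the body would presumably be much the
same mm switch as in the hunk quoted below, just packaged for the
hotplug thread, which runs with irqs enabled. A rough, untested sketch
of that variant; the explicit irq disabling is my assumption, not
something the patch does:)

	static void sched_force_init_mm(void)
	{
		struct mm_struct *mm = current->active_mm;

		if (mm != &init_mm) {
			/* take a lazy tlb ref on init_mm before switching off mm */
			mmgrab_lazy_tlb(&init_mm);
			/* switch_mm_irqs_off() must run with irqs disabled */
			local_irq_disable();
			current->active_mm = &init_mm;
			switch_mm_irqs_off(mm, &init_mm, current);
			local_irq_enable();
			finish_arch_post_lock_switch();
			/* drop the lazy tlb ref this thread held on the old mm */
			mmdrop_lazy_tlb(mm);
		}
		/* finish_cpu() drops the init_mm lazy ref after the CPU stops */
	}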
> > {
> > 	struct mm_struct *mm = current->active_mm;
> >
> > -	BUG_ON(cpu_online(smp_processor_id()));
> > -	BUG_ON(current != this_rq()->idle);
> > +	WARN_ON(!irqs_disabled());
> >
> > 	if (mm != &init_mm) {
> > -		switch_mm(mm, &init_mm, current);
> > +		mmgrab_lazy_tlb(&init_mm);
> > +		current->active_mm = &init_mm;
> > +		switch_mm_irqs_off(mm, &init_mm, current);
> > 		finish_arch_post_lock_switch();
> > +		mmdrop_lazy_tlb(mm);
> > 	}
> > +	/* finish_cpu() will mmdrop the init_mm ref after this CPU stops */
> > +}
> > +
> > +/*
> > + * After the CPU is offline, double check that it was previously switched to
> > + * init_mm. This call can be removed because the condition is caught in
> > + * finish_cpu() as well.
>
> So why add it in the first place?
>
> The changelog mumbles something about reducing churn, but I fail to see
> that reduction. This adds 10 lines of pointless code and comments for
> zero value.

Not sure what you're talking about. The patch didn't add it. Removing it
requires removing it from all archs, which is the churn.

Thanks,
Nick