On Tue, Mar 21, 2023 at 07:40:02PM +0000, Usama Arif wrote: > From: David Woodhouse <dwmw@xxxxxxxxxxxx> > > Commit dce1ca0525bf ("sched/scs: Reset task stack state in bringup_cpu()") > ensured that the shadow call stack and KASAN poisoning were removed from > a CPU's stack each time that CPU is brought up, not just once. > > This is not incorrect. However, with parallel bringup, an architecture > may obtain the idle thread for a new CPU from a pre-bringup stage, by > calling idle_thread_get() for itself. This would mean that the cleanup > in bringup_cpu() would be too late. > > Move the SCS/KASAN cleanup to the generic _cpu_up() function instead, > which already ensures that the new CPU's stack is available, purely to > allow for early failure. This occurs when the CPU to be brought up is > in the CPUHP_OFFLINE state, which should correctly do the cleanup any > time the CPU has been taken down to the point where such is needed. > > Signed-off-by: David Woodhouse <dwmw@xxxxxxxxxxxx> This all sounds fine to me, and the patch itself looks good. I built an arm64 kernel with the first three patches from this series applied atop v6.3-rc3, with defconfig + CONFIG_SHADOW_CALL_STACK=y + CONFIG_KASAN_INLINE=y + CONFIG_KASAN_STACK=y. I then hotplugged a cpu with: while true; do echo 0 > /sys/devices/system/cpu/cpu1/online; echo 1 > /sys/devices/system/cpu/cpu1/online; done ... and that was perfectly happy to run for minutes with no unexpected failures. To make sure I wasn't avoiding issues by chance, I also tried with each of scs_task_reset() and kasan_unpoison_task_stack() commented out. With scs_task_reset() commented out, cpu re-onlining fails after a few iterations, and with kasan_unpoison_task_stack() commented out I get a KASAN splat upon the first re-onlining. So that all looks good. FWIW, for the first three patches: Reviewed-by: Mark Rutland <mark.rutland@xxxxxxx> Tested-by: Mark Rutland <mark.rutland@xxxxxxx> [arm64] Mark. > --- > kernel/cpu.c | 12 ++++++------ > 1 file changed, 6 insertions(+), 6 deletions(-) > > diff --git a/kernel/cpu.c b/kernel/cpu.c > index 6c0a92ca6bb5..43e0a77f21e8 100644 > --- a/kernel/cpu.c > +++ b/kernel/cpu.c > @@ -591,12 +591,6 @@ static int bringup_cpu(unsigned int cpu) > struct task_struct *idle = idle_thread_get(cpu); > int ret; > > - /* > - * Reset stale stack state from the last time this CPU was online. > - */ > - scs_task_reset(idle); > - kasan_unpoison_task_stack(idle); > - > /* > * Some architectures have to walk the irq descriptors to > * setup the vector space for the cpu which comes online. > @@ -1383,6 +1377,12 @@ static int _cpu_up(unsigned int cpu, int tasks_frozen, enum cpuhp_state target) > ret = PTR_ERR(idle); > goto out; > } > + > + /* > + * Reset stale stack state from the last time this CPU was online. > + */ > + scs_task_reset(idle); > + kasan_unpoison_task_stack(idle); > } > > cpuhp_tasks_frozen = tasks_frozen; > -- > 2.25.1 >