On Sat, Apr 15, 2023 at 01:44:49AM +0200, Thomas Gleixner wrote:
> Switch to the CPU hotplug core state tracking and synchronization
> mechanism. No functional change intended.
>
> Signed-off-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> Cc: Catalin Marinas <catalin.marinas@xxxxxxx>
> Cc: Will Deacon <will@xxxxxxxxxx>
> Cc: linux-arm-kernel@xxxxxxxxxxxxxxxxxxx

I gave this a spin on arm64 (in a 64-vCPU VM on an M1 host), and it seems
to work fine with a bunch of vCPUs being hotplugged off and on again
randomly.

FWIW:

Tested-by: Mark Rutland <mark.rutland@xxxxxxx>

I also hacked the code to have the dying CPU spin forever before the call
to cpuhp_ap_report_dead(). In that case I see a warning, we don't call
arch_cpuhp_cleanup_dead_cpu(), and the CPU is still marked as offline (per
/sys/devices/system/cpu/$N/online).

As a tangent/aside, we might need to improve that for confidential compute
architectures, and we might want to generically track CPUs which might
still be using kernel text/data. On arm64 we ensure that via our
cpu_kill() callback (which'll use PSCI CPU_AFFINITY_INFO), but I'm not
sure whether TDX and/or SEV-SNP have a similar mechanism. Otherwise, a
malicious hypervisor can pause a vCPU just before it leaves the kernel
(e.g. immediately after the arch_cpuhp_cleanup_dead_cpu() call), wait for
a kexec (or reuse of stack memory), and unpause the vCPU to cause things
to blow up.

Thanks,
Mark.
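For reference, the spin-forever hack was along these lines (a sketch against
arch/arm64/kernel/smp.c as modified by this patch, not the exact diff I
applied):

```diff
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ void cpu_die(void)
 
 	local_daif_mask();
 
+	/* HACK: never report death, so the waiting side has to time out */
+	while (1)
+		cpu_relax();
+
 	/* Tell cpuhp_bp_sync_dead() that this CPU is now safe to dispose of */
 	cpuhp_ap_report_dead();
```

With that in place, offlining a CPU triggers the warning from the core sync
code, arch_cpuhp_cleanup_dead_cpu() is never called for that CPU, and the
CPU still ends up reported as offline in sysfs.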
> ---
>  arch/arm64/Kconfig           |    1 +
>  arch/arm64/include/asm/smp.h |    2 +-
>  arch/arm64/kernel/smp.c      |   14 +++++---------
>  3 files changed, 7 insertions(+), 10 deletions(-)
>
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -216,6 +216,7 @@ config ARM64
>  	select HAVE_KPROBES
>  	select HAVE_KRETPROBES
>  	select HAVE_GENERIC_VDSO
> +	select HOTPLUG_CORE_SYNC_DEAD if HOTPLUG_CPU
>  	select IRQ_DOMAIN
>  	select IRQ_FORCED_THREADING
>  	select KASAN_VMALLOC if KASAN
> --- a/arch/arm64/include/asm/smp.h
> +++ b/arch/arm64/include/asm/smp.h
> @@ -99,7 +99,7 @@ static inline void arch_send_wakeup_ipi_
>  
>  extern int __cpu_disable(void);
>  
> -extern void __cpu_die(unsigned int cpu);
> +static inline void __cpu_die(unsigned int cpu) { }
>  extern void cpu_die(void);
>  extern void cpu_die_early(void);
>  
> --- a/arch/arm64/kernel/smp.c
> +++ b/arch/arm64/kernel/smp.c
> @@ -333,17 +333,13 @@ static int op_cpu_kill(unsigned int cpu)
>  }
>  
>  /*
> - * called on the thread which is asking for a CPU to be shutdown -
> - * waits until shutdown has completed, or it is timed out.
> + * Called on the thread which is asking for a CPU to be shutdown after the
> + * shutdown completed.
>   */
> -void __cpu_die(unsigned int cpu)
> +void arch_cpuhp_cleanup_dead_cpu(unsigned int cpu)
>  {
>  	int err;
>  
> -	if (!cpu_wait_death(cpu, 5)) {
> -		pr_crit("CPU%u: cpu didn't die\n", cpu);
> -		return;
> -	}
>  	pr_debug("CPU%u: shutdown\n", cpu);
>  
>  	/*
> @@ -370,8 +366,8 @@ void cpu_die(void)
>  
>  	local_daif_mask();
>  
> -	/* Tell __cpu_die() that this CPU is now safe to dispose of */
> -	(void)cpu_report_death();
> +	/* Tell cpuhp_bp_sync_dead() that this CPU is now safe to dispose of */
> +	cpuhp_ap_report_dead();
>  
>  	/*
>  	 * Actually shutdown the CPU. This must never fail. The specific hotplug
>