On Tue, Jul 17, 2018 at 02:58:14PM +0200, Geert Uytterhoeven wrote:
> Hi Sudeep,
>
> On Fri, Jul 6, 2018 at 1:04 PM Sudeep Holla <sudeep.holla@xxxxxxx> wrote:
> > We already repopulate the information on CPU hotplug-in, so we can safely
> > remove the CPU topology and NUMA cpumap information during CPU hotplug
> > out operation. This will help to provide the correct cpumask for
> > scheduler domains.
> >
> > Cc: Catalin Marinas <catalin.marinas@xxxxxxx>
> > Cc: Will Deacon <will.deacon@xxxxxxx>
> > Tested-by: Ganapatrao Kulkarni <ganapatrao.kulkarni@xxxxxxxxxx>
> > Tested-by: Hanjun Guo <hanjun.guo@xxxxxxxxxx>
> > Signed-off-by: Sudeep Holla <sudeep.holla@xxxxxxx>
>
> This is now commit 7f9545aa1a91a9a4 ("arm64: smp: remove cpu and numa
> topology information when hotplugging out CPU") in arm64/for-next/core, to
> which I bisected a PSCI checker regression on systems with two CPU clusters.
>
> Dmesg on R-Car H3 (4xCA57+4xCA53) before/after:
>
>      psci_checker: PSCI checker started using 8 CPUs
>
> 8 CPU cores detected.
>
>      psci_checker: Starting hotplug tests
>      psci_checker: Trying to turn off and on again all CPUs
>      CPU1: shutdown
>      psci: CPU1 killed.
>      CPU2: shutdown
>      psci: CPU2 killed.
>     -NOHZ: local_softirq_pending 55
>      CPU3: shutdown
>      psci: CPU3 killed.
>     -NOHZ: local_softirq_pending 51
>      CPU4: shutdown
>      psci: CPU4 killed.
>      NOHZ: local_softirq_pending 55
>      CPU5: shutdown
>      psci: CPU5 killed.
>      NOHZ: local_softirq_pending 55
>      CPU6: shutdown
>      psci: CPU6 killed.
>      NOHZ: local_softirq_pending 55
>      CPU7: shutdown
>      psci: CPU7 killed.
>      Detected PIPT I-cache on CPU1
>      CPU1: Booted secondary processor 0x0000000001 [0x411fd073]
>      Detected PIPT I-cache on CPU2
>      CPU2: Booted secondary processor 0x0000000002 [0x411fd073]
>      Detected PIPT I-cache on CPU3
>      CPU3: Booted secondary processor 0x0000000003 [0x411fd073]
>      Detected VIPT I-cache on CPU4
>      CPU4: Booted secondary processor 0x0000000100 [0x410fd034]
>      cpufreq: cpufreq_online: CPU4: Running at unlisted freq: 1198080 KHz
>      cpufreq: cpufreq_online: CPU4: Unlisted initial frequency changed to: 1200000 KHz
>      Detected VIPT I-cache on CPU5
>      CPU5: Booted secondary processor 0x0000000101 [0x410fd034]
>      Detected VIPT I-cache on CPU6
>      CPU6: Booted secondary processor 0x0000000102 [0x410fd034]
>      Detected VIPT I-cache on CPU7
>      CPU7: Booted secondary processor 0x0000000103 [0x410fd034]
>
> All but CPU0 tested, as expected.
>

OK, does the firmware on this system not allow CPU0 to be hotplugged out?

>      psci_checker: Trying to turn off and on again group 0 (CPUs 0-3)
>
> 4 big CPU cores detected.
>
>      CPU1: shutdown
>      psci: CPU1 killed.
>     -NOHZ: local_softirq_pending 55
>     +NOHZ: local_softirq_pending 51
>      CPU2: shutdown
>      psci: CPU2 killed.
>      NOHZ: local_softirq_pending 51
>      CPU3: shutdown
>      psci: CPU3 killed.
>      Detected PIPT I-cache on CPU1
>      CPU1: Booted secondary processor 0x0000000001 [0x411fd073]
>      Detected PIPT I-cache on CPU2
>      CPU2: Booted secondary processor 0x0000000002 [0x411fd073]
>      Detected PIPT I-cache on CPU3
>      CPU3: Booted secondary processor 0x0000000003 [0x411fd073]
>
> All but CPU0 tested, as expected.
>
>      psci_checker: Trying to turn off and on again group 1 (CPUs 4-7)
>
> 4 LITTLE CPU cores detected.
>
>      CPU4: shutdown
>      psci: CPU4 killed.
>      NOHZ: local_softirq_pending 55
>     -CPU5: shutdown
>     -psci: CPU5 killed.
>     -NOHZ: local_softirq_pending 55
>     -CPU6: shutdown
>     -psci: CPU6 killed.
>     -NOHZ: local_softirq_pending 55
>     -CPU7: shutdown
>     -psci: CPU7 killed.
>      Detected VIPT I-cache on CPU4
>      CPU4: Booted secondary processor 0x0000000100 [0x410fd034]
>     -cpufreq: cpufreq_online: CPU4: Running at unlisted freq: 1198080 KHz
>     -cpufreq: cpufreq_online: CPU4: Unlisted initial frequency changed to: 1200000 KHz
>     -Detected VIPT I-cache on CPU5
>     -CPU5: Booted secondary processor 0x0000000101 [0x410fd034]
>     -Detected VIPT I-cache on CPU6
>     -CPU6: Booted secondary processor 0x0000000102 [0x410fd034]
>     -Detected VIPT I-cache on CPU7
>     -CPU7: Booted secondary processor 0x0000000103 [0x410fd034]
>
> Woops, CPU5-7 are not tested.
>

I don't understand what you mean by that. From the logs, it looks fine.
What do you mean by "CPU5-7 are not tested"?

>      psci_checker: Hotplug tests passed OK
>
> > --- a/arch/arm64/kernel/smp.c
> > +++ b/arch/arm64/kernel/smp.c
> > @@ -279,6 +279,9 @@ int __cpu_disable(void)
> >  	if (ret)
> >  		return ret;
> >
> > +	remove_cpu_topology(cpu);
> > +	numa_remove_cpu(cpu);
> > +
> >  	/*
> >  	 * Take this CPU offline. Once we clear this, we can't return,
> >  	 * and we must not schedule until we're ready to give up the cpu.
>
> A simple revert is not sufficient, as that causes
>
>     watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [cpuhp/2:21]
>
> Do you have an idea how to fix this?

Sorry, but I am finding it hard to understand the issue from the log.
Also, it would be good to know your config. Is it defconfig - NUMA as before?

-- 
Regards,
Sudeep
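The quoted commit message says the topology masks are cleared when a CPU is hotplugged out and rebuilt when it comes back in. The following minimal user-space sketch makes that idea concrete. It is an illustrative model only, not the arm64 implementation. The `model_*` names are hypothetical. Each CPU's core_sibling cpumask is reduced to a 64-bit word. Only core siblings are tracked, with no thread/LLC siblings and no NUMA cpumap. The fixed two-cluster layout is assumed from the R-Car H3 (4xCA57+4xCA53) description in the log above.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_NR_CPUS 8

/*
 * Hypothetical user-space model, not kernel code: one 64-bit word per CPU
 * stands in for its core_sibling cpumask. Cluster layout assumed from the
 * R-Car H3 log: CPUs 0-3 (CA57) are cluster 0, CPUs 4-7 (CA53) cluster 1.
 */
static const int model_cluster[MODEL_NR_CPUS] = { 0, 0, 0, 0, 1, 1, 1, 1 };
static uint64_t model_core_sibling[MODEL_NR_CPUS];
static uint64_t model_online;

/* Hotplug-in: rebuild @cpu's mask and add @cpu to its online siblings. */
static void model_cpu_up(int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	model_online |= UINT64_C(1) << cpu;
	model_core_sibling[cpu] = UINT64_C(1) << cpu;
	for (int i = 0; i < MODEL_NR_CPUS; i++) {
		if (i == cpu || !(model_online & (UINT64_C(1) << i)))
			continue;
		if (model_cluster[i] == model_cluster[cpu]) {
			model_core_sibling[i] |= UINT64_C(1) << cpu;
			model_core_sibling[cpu] |= UINT64_C(1) << i;
		}
	}
}

/*
 * Hotplug-out: drop @cpu from every sibling's mask and reset its own mask
 * to contain only itself (the idea behind remove_cpu_topology(); the NUMA
 * cpumap handled by numa_remove_cpu() is not modelled here).
 */
static void model_cpu_down(int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	model_online &= ~(UINT64_C(1) << cpu);
	for (int i = 0; i < MODEL_NR_CPUS; i++)
		model_core_sibling[i] &= ~(UINT64_C(1) << cpu);
	model_core_sibling[cpu] = UINT64_C(1) << cpu;
}
```

In this model, bringing all eight CPUs up in order gives CPU4 a mask of 0xF0. Offlining CPU5 leaves CPU4 with 0xD0 (CPUs 4, 6 and 7) and CPU5 with only itself. Onlining CPU5 again restores 0xF0 for both.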