Hi Joel, Will,

On Wed, Jan 18, 2023 at 10:01:07PM +0000, Joel Fernandes wrote:
> On Wed, Jan 18, 2023 at 4:51 PM Will Deacon <will@xxxxxxxxxx> wrote:
> > On Tue, Jan 17, 2023 at 08:00:58PM -0800, Paul E. McKenney wrote:
> > > On Wed, Jan 18, 2023 at 02:17:06AM +0000, Joel Fernandes wrote:
> > >
> > > I would be happier to forgive failure to offline housekeeping CPUs than
> > > blanket forgiveness of CPU 0. Especially given that I recently got
> > > burned by a non-zero boot cpu. ;-)
> > >
> > > But wouldn't it be even better for cpu_is_hotpluggable() to know the
> > > NO_HZ_FULL rules of the road?
> > >
> > > > Adding Frederic to CC as well as we are talking about
> > > > housekeeping/isolation stuff.
> > >
> > > But as you say, perhaps Frederic has a better idea.
> > >
> > > > > And topology_init() sets this based on platform_can_hotplug_cpu(cpu).
> > > > > And this function sets CPU 0 as !cpu_is_hotpluggable() unless the
> > > > > architecture specifies a .cpu_can_disable() function.
> > > >
> > > > Ah, that is 32-bit ARM code only. This issue is on 64-bit ARM (arch/arm64/).
> > >
> > > Apologies! I will look more carefully at the pathnames next time!
> > >
> > > But maybe arm64 needs something similar?
> >
> > Just chiming quickly from the arm64 side here, but there's nothing in the
> > architecture that precludes offlining CPU 0 and it certainly works on some
> > platforms, so I'd be hesitant to rule it out entirely for testing.
> >
> > One reason why hotplug can fail in practice is if a trusted OS (i.e. code
> > running on the secure side of the fence outside of Linux's view of the
> > world) is resident on a core and rejects firmware requests to power it
> > off. The PSCI code (drivers/firmware/psci/) should detect this and return
> > -EPERM, although earlier in this thread there was mention of -EBUSY so it
> > sounds like something else...
>
> Thank you for the heads up on that. To give you context, I am
> currently testing rcutorture on stable kernels 5.10, 5.15, 6.1 on my
> ARM64 QC7180 board. I certainly don't want to hit the -EPERM in the
> future on this or other ARM64 hardware. It would be great if
> cpu_psci_cpu_can_disable() in arm64 can return false if hotplugging
> causes -EPERM indefinitely. Then we do not need to make any changes.

That should already be the case, and I think we're good on that front.

A trusted OS (which blocks offlining a CPU) will always be resident on a
specific CPU (since we don't have any code to migrate trusted OSs across
CPUs as this is not standardised, and we don't have code to instantiate
a trusted OS from Linux). Where a non-migratable trusted OS is present,
it's going to have been instantiated prior to booting Linux, and
therefore will be on CPU0 (or a CPU that Linux is not using at all).

Given the above, the return value of cpu_psci_cpu_can_disable() should
not change for a given CPU, and it should only be able to return false
on CPU0. Most systems don't have a trusted OS blocking PSCI CPU_OFF, and
CPU0 can be offlined.

Thanks,
Mark.
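
P.S. To make the reasoning about the trusted OS concrete, here is a minimal
user-space sketch of it. This is a toy model and not the kernel source: the
model_* names and the struct are made up for illustration, and the only
assumption carried over from the discussion is that the CPU hosting a
non-migratable trusted OS is fixed before Linux boots.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sentinel: no trusted OS is pinned to any CPU. */
#define MODEL_NO_TOS (-1)

/*
 * Hypothetical model of the firmware state Linux discovers at boot.
 * tos_resident_cpu is set once and never changes afterwards, because
 * Linux neither migrates nor instantiates a trusted OS.
 */
struct model_psci {
	int tos_resident_cpu;
};

/* True if a non-migratable trusted OS is pinned to this CPU. */
static inline bool model_tos_resident_on(const struct model_psci *psci, int cpu)
{
	return psci->tos_resident_cpu != MODEL_NO_TOS &&
	       cpu == psci->tos_resident_cpu;
}

/*
 * A CPU can be offlined unless the trusted OS is resident on it.
 * The answer depends only on boot-time state, so it is stable per CPU.
 */
static inline bool model_cpu_can_disable(const struct model_psci *psci, int cpu)
{
	assert(cpu >= 0);
	return !model_tos_resident_on(psci, cpu);
}
```

With tos_resident_cpu set to 0, only CPU0 is reported as not disableable;
with MODEL_NO_TOS, every CPU (including CPU0) can be offlined, which is the
common case described above.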