Russell! On Fri, Apr 12 2024 at 22:52, Russell King (Oracle) wrote: > On Fri, Apr 12, 2024 at 10:54:32PM +0200, Thomas Gleixner wrote: >> > As for the cpu locking, I couldn't find anything in arch_register_cpu() >> > that depends on the cpu_maps_update stuff nor needs the cpus_write_lock >> > being taken - so I've no idea why the "make_present" case takes these >> > locks. >> >> Anything which updates a CPU mask, e.g. cpu_present_mask, after early >> boot must hold the appropriate write locks. Otherwise it would be >> possible to online a CPU which just got marked present, but the >> registration has not completed yet. > > Yes. As far as I've been able to determine, arch_register_cpu() > doesn't manipulate any of the CPU masks. All it seems to be doing > is initialising the struct cpu, registering the embedded struct > device, and setting up the sysfs links to its NUMA node. > > There is nothing obvious in there which manipulates any CPU masks, and > this is rather my fundamental point when I said "I couldn't find > anything in arch_register_cpu() that depends on ...". > > If there is something, then comments in the code would be a useful aid > because it's highly non-obvious where such a manipulation is located, > and hence why the locks are necessary. acpi_processor_hotadd_init() ... acpi_map_cpu(pr->handle, pr->phys_id, pr->acpi_id, &pr->id); That ends up in fiddling with cpu_present_mask. I grant you that arch_register_cpu() is not, but it might rely on the external locking too. I could not be bothered to figure that out. >> Define "real hotplug" :) >> >> Real physical hotplug does not really exist. That's at least true for >> x86, where the physical hotplug support was chased for a while, but >> never ended up in production. >> >> Though virtualization happily jumped on it to hot add/remove CPUs >> to/from a guest. >> >> There are limitations to this and we learned it the hard way on X86. At >> the end we came up with the following restrictions: >> >> 1) All possible CPUs have to be advertised at boot time via firmware >> (ACPI/DT/whatever) independent of them being present at boot time >> or not. >> >> That guarantees proper sizing and ensures that associations >> between hardware entities and software representations and the >> resulting topology are stable for the lifetime of a system. >> >> It is really required to know the full topology of the system at >> boot time especially with hybrid CPUs where some of the cores >> have hyperthreading and the others do not. >> >> >> 2) Hot add can only mark an already registered (possible) CPU >> present. Adding non-registered CPUs after boot is not possible. >> >> The CPU must have been registered in #1 already to ensure that >> the system topology does not suddenly change in an incompatible >> way at run-time. >> >> The same restriction would apply to real physical hotplug. I don't think >> that's any different for ARM64 or any other architecture. > > This makes me wonder whether the Arm64 has been barking up the wrong > tree then, and whether the whole "present" vs "enabled" thing comes > from a misunderstanding as far as a CPU goes. > > However, there is a big difference between the two. On x86, a processor > is just a processor. On Arm64, a "processor" is a slice of the system > (includes the interrupt controller, PMUs etc) and we must enumerate > those even when the processor itself is not enabled. This is the whole > reason there's a difference between "present" and "enabled" and why > there's a difference between x86 cpu hotplug and arm64 cpu hotplug. > The processor never actually goes away in arm64, it's just prevented > from being used. It's the same on X86 at least in the physical world. Thanks, tglx