Re: [PATCH RFC v3 17/21] ACPI: add support to register CPUs based on the _STA enabled bit

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, 23 Jan 2024 13:10:44 +0000
"Russell King (Oracle)" <linux@xxxxxxxxxxxxxxx> wrote:

> On Tue, Jan 23, 2024 at 10:26:03AM +0000, Jonathan Cameron wrote:
> > On Tue, 2 Jan 2024 14:53:20 +0000
> > Jonathan Cameron <Jonathan.Cameron@xxxxxxxxxx> wrote:
> >   
> > > On Mon, 18 Dec 2023 13:03:32 +0000
> > > "Russell King (Oracle)" <linux@xxxxxxxxxxxxxxx> wrote:
> > >   
> > > > On Wed, Dec 13, 2023 at 12:50:38PM +0000, Russell King wrote:    
> > > > > From: James Morse <james.morse@xxxxxxx>
> > > > > 
> > > > > acpi_processor_get_info() registers all present CPUs. Registering a
> > > > > CPU is what creates the sysfs entries and triggers the udev
> > > > > notifications.
> > > > > 
> > > > > arm64 virtual machines that support 'virtual cpu hotplug' use the
> > > > > enabled bit to indicate whether the CPU can be brought online, as
> > > > > the existing ACPI tables require all hardware to be described and
> > > > > present.
> > > > > 
> > > > > If firmware describes a CPU as present, but disabled, skip the
> > > > > registration. Such CPUs are present, but can't be brought online for
> > > > > whatever reason. (e.g. firmware/hypervisor policy).
> > > > > 
> > > > > Once firmware sets the enabled bit, the CPU can be registered and
> > > > > brought online by user-space. Online CPUs, or CPUs that are missing
> > > > > an _STA method must always be registered.      
> > > > 
> > > > ...
> > > >     
> > > > > @@ -526,6 +552,9 @@ static void acpi_processor_post_eject(struct acpi_device *device)
> > > > >  		acpi_processor_make_not_present(device);
> > > > >  		return;
> > > > >  	}
> > > > > +
> > > > > +	if (cpu_present(pr->id) && !(sta & ACPI_STA_DEVICE_ENABLED))
> > > > > +		arch_unregister_cpu(pr->id);      
> > > > 
> > > > This change isn't described in the commit log, but seems to be the cause
> > > > of the build error identified by the kernel build bot that is fixed
> > > > later in this series. I'm wondering whether this should be in a
> > > > different patch, maybe "ACPI: Check _STA present bit before making CPUs
> > > > not present" ?    
> > > 
> > > Would seem a bit odd to call arch_unregister_cpu() way before the code
> > > is added to call the matching arch_registers_cpu()
> > > 
> > > Mind you this eject doesn't just apply to those CPUs that are registered
> > > later I think, but instead to all.  So we run into the spec hole that
> > > there is no way to identify initially 'enabled' CPUs that might be disabled
> > > later.
> > >   
> > > > 
> > > > Or maybe my brain isn't working properly (due to being Covid positive.)
> > > > Any thoughts, Jonathan?    
> > > 
> > > I'll go with a resounding 'not sure' on where this change belongs.
> > > I blame my non existent start of the year hangover.
> > > Hope you have recovered!  
> > 
> > Looking again, I think you were right, move it to that earlier patch.  
> 
> I'm having second thoughts - because this patch introduces the
> arch_register_cpu() into the acpi_processor_add() path (via
> acpi_processor_get_info() and acpi_processor_make_enabled(), so isn't
> it also correct to add arch_unregister_cpu() to the detach/post_eject
> path as well? If we add one without the other, doesn't stuff become
> a bit asymetric?
> 
> Looking more deeply at these changes, I'm finding it isn't easy to
> keep track of everything that's going on here.

I can sympathize.

> 
> We had attach()/detach() callbacks that were nice and symetrical.
> How we have this post_eject() callback that makes things asymetrical.
> 
> We have the attach() method that registers the CPU, but no detach
> method, instead having the post_eject() method. On the face of it,
> arch_unregister_cpu() doesn't look symetric unless one goes digging
> more in the code - by that, I mean arch_register_cpu() only gets
> called of present=1 _and_ enabled=1. However, arch_unregister_cpu()
> gets called buried in acpi_processor_make_not_present(), called when
> present=0, and then we have this new one to handle the case where
> enabled=0. It is not obvious that arch_unregister_cpu() is the reverse
> of what happens with arch_register_cpu() here.

One option would be to pull the arch_unregister_cpu() out so it
happens in one place in both the present = 0 and enabled = 0 cases but
I'm not sure if it's safe to reorder the contents of 
acpi_processor_not_present() as it's followed by a bunch of things.

Would looks something like

if (cpu_present(pr->id)) {
	if (!(sta & ACPI_STA_DEVICE_PRESENT)) {
		acpi_processor_make_not_present(device); /* Remove arch_cpu_unregister() */
	} else if (!(sta & ACPI_STA_DEVICE_ENABLED)) {
		/* Nothing to do in this case */
	} else {
		return; /* Firmware did something silly - probably racing */
	}
	arch_unregister_cpu(pr->id);

	return;
}

> 
> Then we have the add() method allocating pr->throttling.shared_cpu_map,
> and acpi_processor_make_not_present() freeing it. From what I read in
> ACPI v6.5, enabled is not allowed to be set without present. So, if
> _STA reports that a CPU that had present=1 enabled=1, but then is
> later reported to be enabled=0 (which we handle by calling only
> arch_unregister_cpu()) then what happens when _STA changes to
> enabled=1 later? Does add() get called? 

yes it does (I poked it to see) which indeed isn't good (unless I've
broken my setup in some obscure way).

Seems we need a few more things than arch_unregister_cpu() pulled out
in the above code.


> If it does, this would cause
> a new acpi_processor structure to be allocated and the old one to be
> leaked... I hope I'm wrong about add() being called - but if it isn't,
> how does enabled going from 0->1 get handled... and if we are handling
> its 1->0 transition separately from present, then surely we should be
> handling that.
> 
> Maybe I'm just getting confused, but I've spent much of this morning
> trying to unravel all this... and I'm of the opinion that this isn't a
> sign of a good approach.

It's all annoyingly messy at the root of things, but indeed you've found
some issues in current implementation.  Feels like just ripping out
a bunch of stuff from acpi_processor_make_not_present() and calling it
for both paths will probably work, but I've not tested that yet.

Jonathan
> 





[Index of Archives]     [Linux Kernel]     [Sparc Linux]     [DCCP]     [Linux ARM]     [Yosemite News]     [Linux SCSI]     [Linux x86_64]     [Linux for Ham Radio]

  Powered by Linux