RE: [PATCH v5 03/18] ACPI: processor: Register deferred CPUs from acpi_processor_get_info()

Salil Mehta <salil.mehta@xxxxxxxxxx> · Wed, 17 Apr 2024 15:01:01 +0000

HI Rafael,

>  From: Rafael J. Wysocki <rafael@xxxxxxxxxx>
>  Sent: Monday, April 15, 2024 5:39 PM
>  
>  On Mon, Apr 15, 2024 at 5:31 PM Salil Mehta <salil.mehta@xxxxxxxxxx>  wrote:
>  >
>  > >  From: Rafael J. Wysocki <rafael@xxxxxxxxxx>
>  > >  Sent: Monday, April 15, 2024 1:51 PM
>  > >
>  > >  On Mon, Apr 15, 2024 at 1:51 PM Salil Mehta
>  > > <salil.mehta@xxxxxxxxxx>
>  > >  wrote:
>  > >  >
>  
>  [cut]
>  
>  > >  > >  Though virtualization happily jumped on it to hot add/remove
>  > > CPUs  > > to/from  a guest.
>  > >  > >
>  > >  > >  There are limitations to this and we learned it the hard way
>  > > on  > > X86. At the  end we came up with the following restrictions:
>  > >  > >
>  > >  > >      1) All possible CPUs have to be advertised at boot time via  firmware
>  > >  > >         (ACPI/DT/whatever) independent of them being present at boot time
>  > >  > >         or not.
>  > >  > >
>  > >  > >         That guarantees proper sizing and ensures that associations
>  > >  > >         between hardware entities and software representations and the
>  > >  > >         resulting topology are stable for the lifetime of a system.
>  > >  > >
>  > >  > >         It is really required to know the full topology of the system at
>  > >  > >         boot time especially with hybrid CPUs where some of the cores
>  > >  > >         have hyperthreading and the others do not.
>  > >  > >
>  > >  > >
>  > >  > >      2) Hot add can only mark an already registered (possible) CPU
>  > >  > >         present. Adding non-registered CPUs after boot is not possible.
>  > >  > >
>  > >  > >         The CPU must have been registered in #1 already to ensure that
>  > >  > >         the system topology does not suddenly change in an incompatible
>  > >  > >         way at run-time.
>  > >  > >
>  > >  > >  The same restriction would apply to real physical hotplug. I
>  > > don't  > > think that's  any different for ARM64 or any other architecture.
>  > >  >
>  > >  >
>  > >  > There is a difference:
>  > >  >
>  > >  > 1.   ARM arch does not allows for any processor to be NOT present. Hence,  because of
>  > >  > this restriction any of its related per-cpu components must be
>  > > present  > and enumerated at the boot time as well (exposed by
>  > > firmware and  > ACPI). This means all the enumerated processors will
>  > > be marked as  > 'present' but they might exist in NOT enabled (_STA.enabled=0) state.
>  > >  >
>  > >  > There was one clear difference and please correct me if I'm wrong
>  > > > here,  for x86, the LAPIC associated with the x86 core can be brought online later even after boot?
>  > >  >
>  > >  > But for ARM Arch, processors and its corresponding per-cpu
>  > > components  > like redistributors all need to be present and
>  > > enumerated during the  > boot time. Redistributors are part of ALWAYS-ON power domain.
>  > >
>  > >  OK
>  > >
>  > >  So what exactly is the problem with this and what does
>  > >  acpi_processor_add() have to do with it?
>  >
>  >
>  > For ARM Arch, during boot time, it should add processor as if no
>  > hotplug exists. But later, in context to the (fake) hotplug trigger
>  > from the virtualizer as a result of the CPU (un)plug action  it should just
>  end up in registering the already present CPU with the Linux Driver Model.
>  
>  So let me repeat this last time: acpi_processor_add() cannot do that,
>  because (as defined today) it rejects CPUs with the "enabled" bit clear in  _STA.

I understand that now because you have placed a check recently. sorry for stretching
it a bit but I wanted to clearly understand the reason for this behavior. Is it because,

1. It does not makes sense to add a disabled but present/functional processor or
    perhaps there are repercussions to support such a behavior?

Or

2. None of the existing processors need such a behavior?

>  > >  Do you want the per-CPU structures etc. to be created from the
>  > >  acpi_processor_add() path?
>  >
>  >
>  > I referred to the components related to ARM CPU Arch like redistributors etc.
>  > which will get initialized in context to Arch specific _init code not
>  > here. This i.e. acpi_processor_add() is arch agnostic code common to all architectures.
>  >
>  > [ A digression: You do have _weak functions which can be overridden to
>  > arch specific  handling like  arch_(un)map_cpu() etc. but we can't use
>  > those to defer initialize  the CPU related components - ARM Arch
>  > constraint!]
>  
>  Not right now, but they can be added I suppose.
>  
>  >
>  > >
>  > >  This plain won't work because acpi_processor_add(), as defined in
>  > > the  mainline kernel today (and the Jonathan's patches don't change
>  > > that  AFAICS), returns an error for processor devices with the
>  > > "enabled" bit clear  in _STA (it returns an error if the "present"
>  > > bit is clear too, but that's  obvious), so it only gets to calling
>  > > arch_register_cpu() if
>  > >  *both* "present" and "enabled" _STA bits are set for the given
>  > > processor  device.
>  >
>  >
>  > If you are referring to the _STA check in the XX_hot_add_init() then
>  > in the current kernel code it only checks for the
>  > ACPI_STA_DEVICE_PRESENT flag and not the ACPI_STA_DEVICE_ENABLED flag(?).
>  
>  No, I am not.  I'm referring to this code in 6.9-rc4:
>  
>  static int acpi_processor_add(struct acpi_device *device,
>                      const struct acpi_device_id *id) {
>      struct acpi_processor *pr;
>      struct device *dev;
>      int result = 0;
>  
>      if (!acpi_device_is_enabled(device))
>          return -ENODEV;

Ahh, sorry, I missed this check as this has been added recently. Yes, now your
logic of having common legs makes more sense.

>  
>      ...
>  }
>  
>  where acpi_device_is_enabled() is defined as follows:
>  
>  bool acpi_device_is_enabled(const struct acpi_device *adev) {
>      return adev->status.present && adev->status.enabled; }

Got it. 

[digression note:]
BTW, I'm wondering why we are checking adev->status.present
as having adev->status.enabled as set and adev->status.present
as unset would mean firmware has a BUG. If we really want to
check this then we should rather flag a warning on detection
of this condition? 

Either this:
 bool acpi_device_is_enabled(const struct acpi_device *adev) {

     if (!acpi_device_is_present(adev)) {
            if (adev->status.enabled)
                       pr_debug("Device [%s] status inconsistent: Enabled but not Present\n",
                                          device->pnp.bus_id);
            return false;
     }
      return  true;
}

Ideally this inconsistency should have been checked in acpi_bus_get_status()
and above function should have been just,

file: drivers/acpi/scan.c
bool acpi_device_is_enabled(const struct acpi_device *adev) {
      return !!adev->status.enabled; }

file: drivers/acpi/bus.c
int acpi_bus_get_status(struct acpi_device *device)
{
       [...]

	status = acpi_bus_get_status_handle(device->handle, &sta);
	if (ACPI_FAILURE(status))
		return -ENODEV;

	acpi_set_device_status(device, sta);

	if (device->status.functional && !device->status.present) {
		pr_debug("Device [%s] status [%08x]: functional but not present\n",
			 device->pnp.bus_id, (u32)sta);
	}

+	if (device->status.enabled && !device->status.present) {
+		pr_debug("BUG: Device [%s] status [%08x]: enabled but not present\n",
+			 device->pnp.bus_id, (u32)sta);
+                         /* any specific handling here? */
+	}

	pr_debug("Device [%s] status [%08x]\n", device->pnp.bus_id, (u32)sta);
	return 0;
}

>  
>  > The code being reviewed has changed
>  > exactly that behavior for 2 legs i.e. make-present and make-enabled legs.
>  
>  I'm not sure what you mean here, but the code above means that
>  acpi_processor_add) does not distinguish between CPU with the "present"
>  bit clear (in which case the "enabled" bit must also be clear as per the spec)
>  and CPUs with the "present" bit set and the "enabled" bit clear.  These two
>  cases are handled in the same way.
>  
>  > I'm open to further address your point clearly.
>  
>  I hope that the above is clear enough.

Yes, clear now. I missed the new check.

>  
>  > >
>  > >  That, BTW, is why I keep saying that from the ACPI CPU enumeration
>  > > code  perspective, there is no difference between "present" and
>  "enabled".
>  >
>  >
>  > Agreed but there is still a subtle difference.  Enumeration happens
>  > once and for all the processors during the boot time. And as confirmed
>  > by yourself and Thomas as well that even in x86 arch all the
>  > processors will be discovered and their topology fixed during the boot
>  > time which is effectively the same behavior as in the ARM Arch. But
>  > ARM assumes those 'present' bits in the present masks to be set during
>  > the boot time which is not like x86(?).  Hence, 'present cpu' Bits
>  > will always be equal to 'possible cpu' Bits. This is a constraint put by the
>  ARM maintainers and looks unshakable.
>  
>  Yes, there are differences between architectures, but the ACPI code is (or
>  at least should be) architecture-agnostic (as you said somewhere above).
>  So why does this matter for the ACPI code?

It should not. There were few bits like overriding of arch_register_cpu() which
was not allowed by ARM folks in 2020 when I floated the first prototype.

>  > >  > 2.  Agreed regarding the topology. Are you suggesting that we
>  > > must  > call arch_register_cpu() during boot time for all the 'present'  CPUs?
>  > >  > Even if that's the case, we might still want to defer
>  > > registration of  > the cpu device (register_cpu() API) with the
>  > > Linux device model. Later  > is what we are doing to hide/unhide the
>  > > CPUs from the user while  STA.Enabled Bit is toggled due to CPU  (un)plug action.
>  > >
>  > >  There are two ways to approach this IMV, and both seem to be valid
>  > > in  principle.
>  > >
>  > >  One would be to treat CPUs with the "enabled" bit clear as not
>  > > present and  create all of the requisite data structures for them
>  > > when they become  available (in analogy with the "real hot-add" case).
>  >
>  >
>  > Right. This one is out-of-scope for ARM Arch as we cannot defer any
>  > Arch specific sizing and initializations after boot i.e. when
>  > processor_add() gets called again later as a trigger of CPU plug action at the Virtualizer.
>  >
>  >
>  > >
>  > >  The alternative one is to create all of the requisite data
>  > > structures for the  CPUs that you find during boot, but register CPU
>  > > devices for those having  the "enabled" _STA bit set.
>  >
>  >
>  > Correct. and we defer the registration for CPUs with online-capable
>  > Bit set in the ACPI MADT/GICC data structure. These CPUs basically
>  > form set of hot-pluggable CPUs on ARM.
>  >
>  >
>  > >
>  > >  It looks like you have chosen the second approach, which is fine
>  > > with me  (although personally, I would rather choose the first one),
>  > > but then your  arch code needs to arrange for the requisite CPU data
>  > > structures etc. to be  set up before acpi_processor_add() gets
>  > > called because, as per the above,  the latter just rejects CPUs with the  "enabled" _STA bit clear.
>  >
>  > Yes, correct. First one is not possible - at least for now and to have
>  > that it will require lot of rework especially at GIC. But there are
>  > many other arch components (like timers, PMUs, etc.) whose behavior
>  > needs to be specified somewhere in the architecture as well. All these are closely coupled with the ARM CPU architecture.
>  > (it's beyond this discussion and lets leave that to ARM folks).
>  >
>  > This patch-set has a change to deal with ACPI _STA.Enabled Bit accordingly.
>  
>  Well, I'm having a hard time with this.
>  
>  As far as CPU enumeration goes, if the "enabled" bit is clear in _STA, it does
>  not happen at all.  Both on ARM and on x86.

sure, I can see that now.

>  
>  Now tell me why there need to be two separate code paths calling
>  arch_register_cpu() in acpi_processor_add()?

As mentioned above, in the first prototype I floated in the year 2020 any attempts
to override the __weak call of arch_register_cpu() for ARM64 was discouraged. 
Though, the reasons might have changed now as some code has been moved.

Once we are allowed to override the calls then there are many more possibilities
which open up to simplify the code further.

>  
>  I see no reason whatsoever.
>  
>  Moreover, I see reasons why there needs to be only one such code path.
>  
>  Please feel free to prove me wrong.
>  
>  Thanks!