Re: ACPI scan regression -> Boot fail on Cherrytrail w/ 5.11-rc3

Hi,

On 1/15/21 8:01 PM, Rafael J. Wysocki wrote:
> On Friday, January 15, 2021 5:41:57 PM CET Pierre-Louis Bossart wrote:
>>
>>>>> [    0.516336] ACPI: \_SB_.PCI0.BRCM: Dependencies found
>>>>
>>>> Ah, that is enlightening, that is not supposed to happen, that device
>>>> has both an _ADR and an _HID method which is not allowed according
>>>> to the spec.
>>
>> it's not an uncommon issue for audio codecs, here's three examples:
>>
>>              Device (RTK1)
>>              {
>>                  Name (_ADR, Zero)  // _ADR: Address
>>                  Name (_HID, "10EC5670")  // _HID: Hardware ID
>>                  Name (_CID, "10EC5670")  // _CID: Compatible ID
>>                  Name (_DDN, "ALC5642")  // _DDN: DOS Device Name
>>
>>          Device (MAXM)
>>          {
>>              Name (_ADR, Zero)  // _ADR: Address
>>              Name (_HID, "193C9890")  // _HID: Hardware ID
>>              Name (_CID, "193C9890")  // _CID: Compatible ID
>>              Name (_DDN, "Maxim 98090 Codec  ")  // _DDN: DOS Device Name
>>
>>          Device (TISW)
>>          {
>>              Name (_ADR, Zero)  // _ADR: Address
>>              Name (_HID, "104C227E")  // _HID: Hardware ID
>>              Name (_CID, "104C227E")  // _CID: Compatible ID
>>
>> It's been that way for years...
>>
>>>> Can you try a clean 5.11 kernel (so none of the previous
>>>> debug patches) with the following change added:
>>>>
>>>> diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
>>>> index 1f27f74cc83c..93954ac3bfcc 100644
>>>> --- a/drivers/acpi/scan.c
>>>> +++ b/drivers/acpi/scan.c
>>>> @@ -1854,7 +1854,8 @@ static u32 acpi_scan_check_dep(acpi_handle handle)
>>>>           * 2. ACPI nodes describing USB ports.
>>>>           * Still, checking for _HID catches more then just these cases ...
>>>>           */
>>>> -       if (!acpi_has_method(handle, "_DEP") || !acpi_has_method(handle, "_HID"))
>>>> +       if (!acpi_has_method(handle, "_DEP") || !acpi_has_method(handle, "_HID") ||
>>>> +           acpi_has_method(handle, "_ADR"))
>>>>                  return 0;
>>>>
>>>>          status = acpi_evaluate_reference(handle, "_DEP", NULL, &dep_devices);
>>>>
>>>>
>>>>> [    0.517490] ACPI: \_SB_.PCI0.LNPW: Dependencies found
>>>>
>>>> And idem. for this one.
>>>>
>>>> That might very well fix this.
>>
>> Nope, no luck with this patch. boot still stuck.
> 
> OK, thanks!
> 
> Now, there is a theory to test and some more debug work to do.
> 
> First, the kernel should not crash outright if some ACPI device objects are
> missing which evidently happens here.  There may be some problems resulting
> from that, but the crash indicates a code bug in the kernel.
> 
> Apparently, something expects the device objects to be there so badly, that it
> crashes right away when they aren't there.  One of the issues that may cause
> that to happen are mistakes around the acpi_bus_get_device() usage and I found
> two of them, so below is a patch to test.
> 
> Please apply to plain 5.11-rc3 (or -rc4 when it is out) and see if that makes
> any difference.
> 
> ---
>  drivers/acpi/scan.c         |    3 +--
>  drivers/usb/core/usb-acpi.c |    3 +--
>  2 files changed, 2 insertions(+), 4 deletions(-)
> 
> Index: linux-pm/drivers/acpi/scan.c
> ===================================================================
> --- linux-pm.orig/drivers/acpi/scan.c
> +++ linux-pm/drivers/acpi/scan.c
> @@ -2120,8 +2120,7 @@ void acpi_walk_dep_device_list(acpi_hand
>  	mutex_lock(&acpi_dep_list_lock);
>  	list_for_each_entry_safe(dep, tmp, &acpi_dep_list, node) {
>  		if (dep->supplier == handle) {
> -			acpi_bus_get_device(dep->consumer, &adev);
> -			if (!adev)
> +			if (acpi_bus_get_device(dep->consumer, &adev))
>  				continue;
>  
>  			adev->dep_unmet--;

Oh, OOOhh, good catch! I've been staring at these exact lines multiple times,
my "spidey sense" telling me that the problem was likely something like this:

1. The addition of the acpi_device gets deferred because of the _DEP
list; this means that there are now entries for it on the acpi_dep_list.

2. Later during the first pass, or before the handle is checked again
during the second pass, acpi_walk_dep_device_list() gets called because
the _DEP is now resolved.

3. My theory was this would lead to doing driver attach twice or some
such, but that is not possible...

But instead, since adev is not set when acpi_bus_get_device() fails, we are
following whatever pointer happens to be in the memory used by the:

        struct acpi_device *adev;

local variable on the stack. It seems that, at least with my compiler /
kernel .config, the stack layout is such that this stack memory does contain
a valid pointer (from some previous function's stack frame), and then
whatever that points to gets used as an acpi_device and likely gets mangled
a bit. That explains the memory-corruption-like behavior which I've been
seeing.
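
To illustrate the pattern, here is a minimal standalone sketch. All names
in it (fake_device, fake_handle, fake_get_device, resolve_dep) are made up
for illustration and this is not the kernel code; it also assumes that the
lookup leaves its output pointer untouched on failure, which is what the
behavior described above suggests:

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration only, not the kernel's types. */
struct fake_device {
	int dep_unmet;
};

struct fake_handle {
	struct fake_device *dev;	/* NULL: no device object attached yet */
};

/*
 * Assumed contract: returns 0 and sets *out on success; returns -ENODEV
 * and leaves *out untouched on failure.
 */
static int fake_get_device(struct fake_handle *h, struct fake_device **out)
{
	if (!h || !h->dev)
		return -ENODEV;

	*out = h->dev;
	return 0;
}

/*
 * Safe pattern: decide based on the return value. Testing "dev" after a
 * failed lookup would read an uninitialized stack slot instead.
 */
static int resolve_dep(struct fake_handle *consumer)
{
	struct fake_device *dev;	/* deliberately not initialized */

	if (fake_get_device(consumer, &dev))
		return -ENODEV;

	dev->dep_unmet--;
	return 0;
}
```

With a check like "if (!dev)" in place of the return-value check, the
decrement could land on whatever the stale stack slot happens to point at.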

Specifically, I think that this is happening when the MMC controller
addition gets deferred to the second pass:

[    0.426722] ACPI: \_SB_.PCI0.SDHB: Dependencies found
[    0.427927] ACPI: \_SB_.PCI0.SDHB.BRCM: Dependencies found
[    0.431863] ACPI: \_SB_.PCI0.SDHC: Dependencies found
[    0.433128] ACPI: \_SB_.PCI0.SHC1: Dependencies found


I'll verify that this fixes both my reproducers (5.10 + backport on one device,
5.11-rc3 on another device) by seeing if I can now boot 10 times in a row
successfully. But I'm pretty hopeful that this will fix them.

I'm also hopeful that this will fix Pierre-Louis' case too.


> Index: linux-pm/drivers/usb/core/usb-acpi.c
> ===================================================================
> --- linux-pm.orig/drivers/usb/core/usb-acpi.c
> +++ linux-pm/drivers/usb/core/usb-acpi.c
> @@ -163,10 +163,9 @@ usb_acpi_get_companion_for_port(struct u
>  	} else {
>  		parent_handle = usb_get_hub_port_acpi_handle(udev->parent,
>  							     udev->portnum);
> -		if (!parent_handle)
> +		if (!parent_handle || acpi_bus_get_device(parent_handle, &adev))
>  			return NULL;
>  
> -		acpi_bus_get_device(parent_handle, &adev);
>  		port1 = port_dev->portnum;
>  	}
>  

I'm pretty sure that we are not actually hitting this one, but we should
definitely still fix it.

Regards,

Hans



