Hi,

On 1/14/21 10:55 PM, Pierre-Louis Bossart wrote:
> Hi,
> My primary test device for SOF on Cherrytrail no longer boots with
> v5.11-rc3 and the sof-dev branch, nothing happens after the 'loading
> initial ramdisk'. It's a 'Zotac' headless device derived from the
> Cherrytrail FFD design, so likely there are other devices hit by this
> problem.
>
> A long bisect points to the commit 71da201f38dfb ('ACPI: scan: Defer
> enumeration of devices with _DEP lists').
>
> Reverting the two commits below solves the boot issue.
>
> I have absolutely no idea what these two patches do, but they sure
> have a large impact. Please let me know what sort of information or
> tests might help root-cause this problem.

Heh, I was just about to answer your other (off-list) email about your
CHT test device booting with a suggestion that you should try reverting
that exact commit, as it is the only commit that I'm aware of which went
into 5.11 which might cause this...

So I just booted 5.11-rc3 on an Acer Aspire Switch 10E SW3-016 (x5-Z8300
CHT based) myself and that booted fine.

Next I tried a MINIX NEO Z83-4 (x5-Z8300), which is a Mini PC and as such
probably the closest thing I have at hand to the Zotac box which you are
using, and I can somewhat reproduce it there.

It seems that the new code somehow causes us to hit a race somewhere:
the NEO Z83-4 will boot most of the time, but not always. When it
failed, it got past the loading initrd phase for me, then it threw the
following error, and after that the boot hung (waiting for the rootfs to
show up):

platform device 80860F14: Resources present before probing

As I already told Rafael in a previous email, I did see something
similar when my personal tree was still 5.10 based, with the ACPI scan
rework patches cherry-picked for testing. In that case I got a backtrace
(followed by a hang) during boot about a kernel NULL pointer deref
triggered by sysfs_seq_file_read or some such. But this problem went
away with 5.11-rc1, so I stopped looking into it.
I do have a tag of my broken 5.10 + cherry-picks tree, so I should be
able to reproduce that issue.

So I see 2 possible theories here:

1. We have 2 probes of the same device racing somehow
2. The struct device memory is getting corrupted somehow.

Pierre-Louis, can you see if the following hack helps?

--- a/drivers/acpi/scan.c
+++ b/drivers/acpi/scan.c
@@ -1939,7 +1939,6 @@ static acpi_status acpi_bus_check_add(acpi_handle handle, bool check_dep,
 			/* Bail out if the number of recorded dependencies is not 0. */
 			if (count > 0) {
 				acpi_bus_scan_second_pass = true;
-				return AE_CTRL_DEPTH;
 			}
 		}
 
@@ -1948,8 +1947,7 @@ static acpi_status acpi_bus_check_add(acpi_handle handle, bool check_dep,
 		return AE_CTRL_DEPTH;
 
 	acpi_scan_init_hotplug(device);
-	if (!check_dep)
-		acpi_scan_dep_init(device);
+	acpi_scan_dep_init(device);
 
 out:
 	if (!*adev_p)

And can you collect an acpidump from the device and either send it to me
and Rafael offlist, or upload it somewhere and send us a link?

Regards,

Hans