Hi Bjorn, On 6/21/21 1:49 PM, Hans de Goede wrote: <snip> >> But maybe we aren't smart enough when trying to allocate space. >> Places like __pci_bus_size_bridges() and __pci_assign_resource() are >> full of assumptions about what PCI BARs can go where, depending on >> 64bit-ness, prefetchability, etc. >> >> Maybe instrumenting those allocation paths would give some insight. >> Possibly we should go ahead and merge some permanent pci_dbg() stuff >> there too. > > I agree that this seems to be the most likely issue. I've build > a Fedora kernel pkg with some extra debugging added for the reporter > to be test. Since I'm reliant on a debug cycle where I provide > a kernel and then the reporter comes back with a new debug, > and then rince-repeat it might be a while before I get back to you > on this. Hopefully when I do get back I will have figured out > what the problem is. > > (and in case it was not clear yet the RFC patch which started this > discussion clearly is bogus and can be considered dropped). Ok, I believe I've figured out what the root cause of things failing is here. I've added a couple of pr_info-s to kernel/resource.c __find_resource() like this: pr_info("__find_resource() trying 0x%llx - 0x%llx\n", tmp.start, tmp.end); if (tmp.end < tmp.start) goto next; resource_clip(&tmp, constraint->min, constraint->max); arch_remove_reservations(&tmp); pr_info("__find_resource() after clip + arch-reserv 0x%llx - 0x%llx\n", tmp.start, tmp.end); And the following (relevant) messages get logged by these 2 pr_info-s when trying to allocate a resource for the unassigned IOMEM BAR of the I2C controller on the troublesome laptops: [ 1.262423] resource: allocate_resource root [mem 0x65400000-0xbfffffff window] new [mem size 0x00001000 64bit] size 0x1000 min 0x65400000 max 0xbfffffff align 0x1000 [ 1.262426] resource: __find_resource() trying 0x65400000 - 0x6fffffff [ 1.262428] resource: __find_resource() after clip + arch-reserv 0xd0000000 - 0x6fffffff ... [ 1.262543] resource: __find_resource() trying 0x8122d000 - 0x8122efff [ 1.262544] resource: __find_resource() after clip + arch-reserv 0xd0000000 - 0x8122efff etc. Notice that the regions start at 0xd0000000 rather then their original start address after these 2 calls: resource_clip(&tmp, constraint->min, constraint->max); arch_remove_reservations(&tmp); I believe that this is caused by the arch_remove_reservations call, which applies e820 reservations and the e820 table has the following: [ 0.000000] BIOS-e820: [mem 0x000000004bc50000-0x00000000cfffffff] reserved So the e820 table is basically reserving the entire main PCI bus iomem window, which runs from 0x65400000-0xbfffffff : [ 0.609979] pci_bus 0000:00: root bus resource [io 0x0000-0x0cf7 window] [ 0.609983] pci_bus 0000:00: root bus resource [io 0x0d00-0xffff window] [ 0.609986] pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff window] [ 0.609988] pci_bus 0000:00: root bus resource [mem 0x65400000-0xbfffffff window] [ 0.609991] pci_bus 0000:00: root bus resource [bus 00-fe] One way of fixing this would be to check CRS provided PCI root bridge windows vs e820 reservations and remove (carve out) the PCI root bridge iomem windows from any e820 reservations. Probably with some warning about the firmware being buggy. I'm not sure if this won't cause issues else where though. So before I spend time on trying to code something like this, any other ideas / suggestions how to fix this ? Regards, Hans