On Thu, 19 Sep 2024, Wassenberg, Dennis wrote:

> Hi all,
>
> we are running into issues which seem to be PCI related and would appreciate your assessment.
>
> Background:
> We want to boot an Intel MeteorLake based system (e.g. Lenovo ThinkPad X13 Gen5) with the Lenovo Thunderbolt 4
> Universal Dock attached during boot. On some devices the boot fails nearly 100% of the time. Other
> systems never show this issue (e.g. older devices based on the RaptorLake or AlderLake platforms).
>
> We did some debugging on this and came to the conclusion that there is a use-after-free in pci_slot_release().
> The Thunderbolt 4 dock first exposes a PCI hierarchy and, shortly after that, because the device is inaccessible,
> releases the additional buses/ports. This seems to end up in a race where pci_slot_release() accesses &slot->bus,
> which was already freed:
>
> 0000:00 [root bus]
>  -> 0000:00:07.0 [bridge to 20-49]
>      -> 0000:20:00.0 [bridge to 21-49]
>          -> 0000:21:00.0 [bridge to 22]
>             0000:21:01.0 [bridge to 23-2e]
>             0000:21:02.0 [bridge to 2f-3a]
>             0000:21:03.0 [bridge to 3b-48]
>             0000:21:04.0 [bridge to 49]
>     0000:00:07.2 [bridge to 50-79]
>
> We are currently running kernel 6.8.12. Because this kernel is no longer supported, I tried 6.11, which shows
> exactly the same issue. I attached two log files:
> dmesg-ramoops-0: Based on kernel 6.11 with the added kernel command line option "slab_debug" to force a kernel Oops
> when freed memory is accessed.
> dmesg-ramoops-0-pci_dbg: Like dmesg-ramoops-0, with the additional kernel command line options
> '"dyndbg=file drivers/pci/* +p" ignore_loglevel' to give you more insight into what's happening on the PCI bus.
>
> I would appreciate any kind of help on this.

Hi,

Thanks for the report.

Unfortunately I don't really know how this is supposed to work (what happens in which order), but the patch below might help with the immediate issue you hit.
I'm a bit skeptical it's the _correct_ solution, and I expect there's going to be just another spot that blows up next.

[PATCH 1/1] PCI: Don't access freed bus in pci_slot_release()

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@xxxxxxxxxxxxxxx>
---
 drivers/pci/remove.c |  2 ++
 drivers/pci/slot.c   | 18 ++++++++++--------
 2 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/drivers/pci/remove.c b/drivers/pci/remove.c
index 910387e5bdbf..532604dd722c 100644
--- a/drivers/pci/remove.c
+++ b/drivers/pci/remove.c
@@ -97,6 +97,8 @@ static void pci_remove_bus_device(struct pci_dev *dev)
 
 		pci_remove_bus(bus);
 		dev->subordinate = NULL;
+		if (dev->slot && PCI_SLOT(dev->devfn) == dev->slot->number)
+			dev->slot->bus = NULL;
 	}
 
 	pci_destroy_dev(dev);
diff --git a/drivers/pci/slot.c b/drivers/pci/slot.c
index 0f87cade10f7..4bcc16d484dd 100644
--- a/drivers/pci/slot.c
+++ b/drivers/pci/slot.c
@@ -69,14 +69,16 @@ static void pci_slot_release(struct kobject *kobj)
 	struct pci_dev *dev;
 	struct pci_slot *slot = to_pci_slot(kobj);
 
-	dev_dbg(&slot->bus->dev, "dev %02x, released physical slot %s\n",
-		slot->number, pci_slot_name(slot));
-
-	down_read(&pci_bus_sem);
-	list_for_each_entry(dev, &slot->bus->devices, bus_list)
-		if (PCI_SLOT(dev->devfn) == slot->number)
-			dev->slot = NULL;
-	up_read(&pci_bus_sem);
+	if (slot->bus) {
+		dev_dbg(&slot->bus->dev, "dev %02x, released physical slot %s\n",
+			slot->number, pci_slot_name(slot));
+
+		down_read(&pci_bus_sem);
+		list_for_each_entry(dev, &slot->bus->devices, bus_list)
+			if (PCI_SLOT(dev->devfn) == slot->number)
+				dev->slot = NULL;
+		up_read(&pci_bus_sem);
+	}
 
 	list_del(&slot->list);
 
-- 
2.39.2
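The idea behind the patch (clear the slot's back-pointer when the bus it refers to is torn down, and have the release path skip the bus-related work when that pointer is NULL) can be sketched as a tiny userspace C model. This is a single-threaded illustration under loudly stated assumptions: `struct model_bus`, `struct model_slot` and the `model_*` functions are hypothetical stand-ins, not kernel structures or APIs, and the `PCI_SLOT(dev->devfn)` match, the `pci_bus_sem` locking and kobject refcounting are all left out. It therefore says nothing about whether the real race is fully closed.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified stand-ins for struct pci_bus and
 * struct pci_slot. NOT the kernel definitions; they only model a slot
 * that keeps a back-pointer to a bus which may be torn down first. */
struct model_bus {
	int number;
};

struct model_slot {
	int number;
	struct model_bus *bus;	/* set to NULL once the bus is removed */
};

static struct model_bus *model_bus_new(int number)
{
	struct model_bus *bus = malloc(sizeof(*bus));

	assert(bus != NULL);
	bus->number = number;
	return bus;
}

static struct model_slot *model_slot_new(int number, struct model_bus *bus)
{
	struct model_slot *slot = malloc(sizeof(*slot));

	assert(slot != NULL);
	slot->number = number;
	slot->bus = bus;
	return slot;
}

/* Loosely models the remove.c hunk: when the bus goes away, clear the
 * slot's back-pointer. The model clears it before free() so a dangling
 * pointer value is never read afterwards. */
static void model_remove_bus(struct model_slot *slot)
{
	struct model_bus *bus = slot->bus;

	slot->bus = NULL;
	free(bus);
}

/* Loosely models the slot.c hunk: only dereference slot->bus if it is
 * still set, then free the slot. Returns 1 if the bus was touched,
 * 0 if the bus-related part was skipped. */
static int model_slot_release(struct model_slot *slot)
{
	int touched_bus = 0;

	if (slot->bus) {
		printf("bus %02x: released slot %d\n",
		       slot->bus->number, slot->number);
		touched_bus = 1;
	}
	free(slot);
	return touched_bus;
}
```

Removing the bus first and then releasing the slot makes `model_slot_release()` return 0 without ever dereferencing the bus; releasing a slot whose bus is still present returns 1 and prints the debug line. In the kernel the open question, as noted above, is whether clearing `slot->bus` in `pci_remove_bus_device()` always happens before the bus memory is freed and before `pci_slot_release()` can run.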