On Tue, 2024-10-08 at 15:58 +0200, Lukas Wunner wrote:
> On Mon, Oct 07, 2024 at 04:49:19PM +0000, Wassenberg, Dennis wrote:
> > > The unplug event happens at the top of the hierarchy (below the Root Port).
> > > So pci_bus_add_devices() binds the Root Port, its driver starts stopping
> > > and removing the hierarchy below, all the while pci_bus_add_devices()
> > > continues binding drivers to the child devices.
> > >
> > > Could you try this patch (in addition to the one below and to the one
> > > I sent yesterday):
> > >
> > > https://lore.kernel.org/all/20241003084342.27501-1-brgl@xxxxxxxx/
> > >
> > > It should prevent pci_bus_add_devices() from racing with pciehp stopping
> > > and removing devices.
> >
> > I checked the combination of all 3 patches as well. In the end it behaves
> > the same like if I apply the first patch only (the one you sent the day
> > before).
>
> Thanks a lot for testing and the detailed feedback.
>
> Would it be possible for you to try the above-linked patch alone
> (on top of a recent stock kernel), i.e. without the refcounting
> fix that you say was sufficient to avoid the UAF?
>
> And I'd also appreciate if you could try the match_driver approach ...
>
> https://lore.kernel.org/all/Zv-dIHDXNNYomG2Y@xxxxxxxxx/
>
> ... alone, i.e. without any other patches.
>
> It's interesting that the refcounting fix was sufficient to avoid
> the UAF but I can't get over the fact that the pcieport driver is
> unbound from pci_remove_bus_device(), when it should no longer be
> bound in the first place. My impression is that teardown of the
> hierarchy by pciehp races with driver binding after the initial
> root bus scan, so we probably should try to avoid that. I'd like
> to confirm (or disprove) that hunch.
>
> The refcounting fix could be applied as a safety net but normally
> shouldn't be necessary if driver unbinding happens in pci_stop_dev()
> and the device remains unbound afterwards. The match_driver patch
> should achieve that. And the other patch by Bartosz (linked above)
> should achieve the same by serializing driver binding after bus
> enumeration with driver unbinding by pciehp.
>
> Finally, I'd appreciate if you could send me dmesg output with the
> refcounting fix applied. As said before, the MTL Thunderbolt controller
> claims that the link and slot presence bits are cleared, so it
> de-enumerates everything attached via Thunderbolt. I'm wondering
> if it then re-enumerates the Thunderbolt-attached devices so they're
> actually usable?

I will definitely do that but unfortunately it will take some time.
I will be OOO for the next 2 weeks starting from today.

>
> I'm hoping Mika can clarify with Intel Thunderbolt CoE whether this
> is a hardware issue in MTL that can e.g. be fixed through a firmware
> or BIOS update.
>
> Thanks!
>
> Lukas