On Fri, 2023-02-24 at 05:19 +0100, Lukas Wunner wrote:
> On Thu, Feb 23, 2023 at 01:53:45PM -0600, Bjorn Helgaas wrote:
> > Hmm. Good question. Off the top of my head, I can't explain the
> > difference between pci_rescan_remove_lock and pci_bus_sem, so I'm
> > confused, too. I added Lukas in case he has a ready explanation.
>
> pci_bus_sem is a global lock which protects the "devices" list of all
> pci_bus structs.
>
> We do have a bunch of places left where the "devices" list is accessed
> without holding pci_bus_sem, though I've tried to slowly eliminate
> them.
>
> pci_rescan_remove_lock is a global "big kernel lock" which serializes
> any device addition and removal.
>
> pci_rescan_remove_lock is known to be far too coarse-grained and thus
> deadlock-prone, particularly if hotplug ports are nested (as is the
> case with Thunderbolt). It needs to be split up into several smaller
> locks which protect e.g. allocation of resources of a bus (bus numbers
> or MMIO / IO space) and whatever else needs to be protected. It's just
> that nobody has gotten around to identifying what exactly needs to be
> protected, adding the new locks and removing pci_rescan_remove_lock.
>
> Thanks,
>
> Lukas

Thanks for the insights. From that description, I think it makes sense
to do this fix with pci_rescan_remove_lock so it can be backported.
We can then take the opportunity to add a lock specific to the
allocation and freeing of resources. That lock would replace at least
this new, clearly resource-related use of pci_rescan_remove_lock, and
potentially other uses we find.

What do you think?

Thanks,
Niklas