On Mon, Nov 11, 2024 at 09:00:18AM +0100, Lukas Wunner wrote:
> On Mon, Nov 11, 2024 at 08:38:03AM +0100, Lukas Wunner wrote:
> > Thinking about this some more:
> >
> > The problem is pci_lock_rescan_remove() is a single global lock.
> >
> > What if we introduce a lock at each bridge or for each pci_bus.
> > Before a portion of the hierarchy is removed, all locks in that
> > sub-hierarchy are acquired bottom-up.
> >
> > I think that should avoid the deadlock.  Thoughts?
>
> I note that you attempted something similar back in July:
>
> https://lore.kernel.org/all/20240722151936.1452299-9-kbusch@xxxxxxxx/
>
> However I'd suggest to solve this differently:
>
> Keep the pci_lock_rescan_remove() everywhere, don't add pci_lock_bus()
> adjacent to it.
>
> Instead, amend pci_lock_rescan_remove() to walk the sub-hierarchy
> bottom-up and acquire all the bus locks.  Obviously you'll have to amend
> pci_lock_rescan_remove() to accept a pci_dev which is the bridge atop
> the sub-hierarchy.  (Or alternatively, the top-most pci_bus in the
> sub-hierarchy.)

I don't think we can walk the bus bottom-up without hitting the same
deadlock I'm trying to fix.
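
For concreteness, the bottom-up acquisition suggested in the quoted text
could look roughly like the userspace toy below. It is only a sketch:
struct toy_bus and the toy_*() helpers are hypothetical stand-ins,
pthread mutexes replace whatever lock type the kernel would use, and
none of it is existing PCI core code. Note that reaching the leaves
still means descending from the top of the sub-hierarchy first. The
sketch shows the lock ordering only and says nothing about whether that
ordering avoids the deadlock under discussion.

```c
/*
 * Userspace sketch only. struct toy_bus and all toy_*() helpers are
 * hypothetical names for illustration, not PCI core interfaces.
 */
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define TOY_MAX_CHILDREN 4

struct toy_bus {
	pthread_mutex_t lock;			/* per-bus lock */
	struct toy_bus *children[TOY_MAX_CHILDREN];
	size_t nr_children;
	int lock_order;				/* 0 = not held, else acquisition rank */
};

/* Counter used only to record the order in which locks were taken. */
static int toy_next_order;

static void toy_bus_init(struct toy_bus *bus)
{
	pthread_mutex_init(&bus->lock, NULL);
	bus->nr_children = 0;
	bus->lock_order = 0;
}

static void toy_bus_add_child(struct toy_bus *parent, struct toy_bus *child)
{
	assert(parent->nr_children < TOY_MAX_CHILDREN);
	parent->children[parent->nr_children++] = child;
}

/* Post-order walk: take every descendant's lock before the bus itself. */
static void toy_lock_subtree(struct toy_bus *top)
{
	for (size_t i = 0; i < top->nr_children; i++)
		toy_lock_subtree(top->children[i]);

	pthread_mutex_lock(&top->lock);
	top->lock_order = ++toy_next_order;
}

/* Release in exactly the reverse order: top first, last child subtree next. */
static void toy_unlock_subtree(struct toy_bus *top)
{
	top->lock_order = 0;
	pthread_mutex_unlock(&top->lock);

	for (size_t i = top->nr_children; i-- > 0; )
		toy_unlock_subtree(top->children[i]);
}
```

With a root -> bridge -> leaf chain, toy_lock_subtree(&root) takes the
leaf's lock first, then the bridge's, then the root's, and
toy_unlock_subtree(&root) drops them in the opposite order.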