Dan Williams wrote: > While the experiment did reveal that there are additional places that > are missing the lock during secondary bus reset, one of the places that > needs to take cfg_access_lock (pci_bus_lock()) is not prepared for > lockdep annotation. > > Specifically, pci_bus_lock() takes pci_dev_lock() recursively and is > currently dependent on the fact that the device_lock() is marked > lockdep_set_novalidate_class(&dev->mutex). Otherwise, without that > annotation, pci_bus_lock() would need to use something like a new > pci_dev_lock_nested() helper, a scheme to track a PCI device's depth in > the topology, and a hope that the depth of a PCI tree never exceeds the > max value for a lockdep subclass. > > The alternative to ripping out the lockdep coverage would be to deploy a > dynamic lock key for every PCI device. Unfortunately, there is evidence > that increasing the number of keys that lockdep needs to track to be > per-PCI-device is prohibitively expensive for something like the > cfg_access_lock. > > The main motivation for adding the annotation in the first place was to > catch unlocked secondary bus resets, not necessarily catch lock ordering > problems between cfg_access_lock and other locks. > > Replace the lockdep tracking with a pci_warn_once() for that primary > concern. > > Fixes: 7e89efc6e9e4 ("PCI: Lock upstream bridge for pci_reset_function()") > Reported-by: Imre Deak <imre.deak@xxxxxxxxx> > Closes: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_134186v1/shard-dg2-1/igt@device_reset@xxxxxxxxxxxxxxxxxxxxxxxx > Cc: Jani Saarinen <jani.saarinen@xxxxxxxxx> > Cc: Dave Jiang <dave.jiang@xxxxxxxxx> > Cc: Bjorn Helgaas <bhelgaas@xxxxxxxxxx> > Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx> Bjorn, this against mainline, not your tree where I see you already have "PCI: Make cfg_access_lock lockdep key a singleton" queued up. The "overkill" justification for making it singleton is valid, but then means that it has all the same problems as the device lock that needs to be marked lockdep_set_novalidate_class(). Let me know if you want this rebased on your for-linus branch. Note that the pci_warn_once() will trigger on all pci_bus_reset() users unless / until pci_bus_lock() additionally locks the bridge itself ala: http://lore.kernel.org/r/6657833b3b5ae_14984b29437@xxxxxxxxxxxxxxxxxxxxxxxxx.notmuch Apologies for the thrash, this has been a useful exercise for finding some of these gaps, but ultimately not possible to carry forward without more invasive changes.