On Thu, Jul 23, 2020 at 11:51:52AM +0200, Lukas Wunner wrote: > On Thu, Jul 23, 2020 at 05:13:06PM +0800, kernel test robot wrote: > > FYI, we noticed the following commit (built with gcc-9): > [...] > > commit: 3233e41d3e8ebcd44e92da47ffed97fd49b84278 ("[PATCH] PCI: pciehp: Fix AB-BA deadlock between reset_lock and device_lock") > [...] > > caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace): > > [ 0.971752] WARNING: CPU: 0 PID: 1 at drivers/pci/pci.c:4905 pci_reset_hotplug_slot+0x70/0x80 > > Thank you, trusty robot. > > I botched the call to lockdep_assert_held_write(), it should have been > conditional on "if (probe)". > > Happy to respin the patch, but I'd like to hear opinions on the locking > issues surrounding xen and octeon (and the patch in general). I wish liquidio/octeon weren't a special case. Why should that driver reset the device when unbinding when no other drivers do? Looks like this was added by 70535350e26f ("liquidio: with embedded f/w, don't reload f/w, issue pf flr at exit"). Maybe Rick will chime in. > In particular, would a solution be entertained wherein the pci_dev is > reset by the PCI core after driver unbinding, contingent on a flag which > is set by a PCI driver to indicate that the pci_dev is returned to the > core in an unclean state? How would we do this? The PCI core isn't called after unbinding, is it? So I guess we'd have to have a queue and a worker thread to process it? Device removal also has nasty locking issues, and a queue might help solve those, too. Might also help in the problematic case of 40f11adc7cd9 ("PCI: Avoid race while enabling upstream bridges"), which we had to revert. > Also, why does xen require a device reset on bind? > > Thanks! > > Lukas