On Tue, Jun 14, 2022 at 04:00:45AM +0000, Shinichiro Kawasaki wrote:
> On Jun 14, 2022 / 02:38, Chaitanya Kulkarni wrote:
> > Shinichiro,
> >
> > On 6/13/22 19:23, Keith Busch wrote:
> > > On Tue, Jun 14, 2022 at 01:09:07AM +0000, Shinichiro Kawasaki wrote:
> > >> (CC+: linux-pci)
> > >> On Jun 11, 2022 / 16:34, Yi Zhang wrote:
> > >>> On Fri, Jun 10, 2022 at 10:49 PM Keith Busch <kbusch@xxxxxxxxxx> wrote:
> > >>>>
> > >>>> And I am not even sure this is real. I don't know yet why
> > >>>> this is showing up only now, but this should fix it:
> > >>>
> > >>> Hi Keith
> > >>>
> > >>> Confirmed the WARNING issue was fixed with the change, here is
> > >>> the log:
> > >>
> > >> Thanks. I also confirmed that Keith's change to add
> > >> __ATTR_IGNORE_LOCKDEP to dev_attr_dev_rescan avoids the fix, on
> > >> v5.19-rc2.
> > >>
> > >> I took a closer look into this issue and found The deadlock
> > >> WARN can be recreated with following two commands:
> > >>
> > >> # echo 1 > /sys/bus/pci/devices/0000\:00\:09.0/rescan
> > >> # echo 1 > /sys/bus/pci/devices/0000\:00\:09.0/remove
> > >>
> > >> And it can be recreated with PCI devices other than NVME
> > >> controller, such as SCSI controller or VGA controller. Then
> > >> this is not a storage sub-system issue.
> > >>
> > >> I checked function call stacks of the two commands above. As
> > >> shown below, it looks like ABBA deadlock possibility is
> > >> detected and warned.
> > >
> > > Yeah, I was mistaken on this report, so my proposal to suppress
> > > the warning is definitely not right. If I run both 'echo'
> > > commands in parallel, I see it deadlock frequently. I'm not
> > > familiar enough with this code to any good ideas on how to fix,
> > > but I agree this is a generic pci issue.
> >
> > I think it is worth adding a testcase to blktests to make sure
> > these future releases will test this.
>
> Yeah, this WARN is confusing for us then it would be valuable to
> test by blktests not to repeat it. One point I wonder is: which test
> group the test case will it fall in? The nvme group could be the
> group to add, probably.
>
> Another point I wonder is other kernel test suite than blktests.
> Don't we have more appropriate test suite to check PCI device
> rescan/remove race ? Such a test sounds more like a PCI bus
> sub-system test than block/storage test.

I'm not aware of such a test, but it would be nice to have one.

Can you share your qemu config so I can reproduce this locally?

Thanks for finding and reporting this!

Bjorn