Re: blktests failures with v5.19-rc1

Shinichiro Kawasaki <shinichiro.kawasaki@xxxxxxx> · Tue, 14 Jun 2022 01:09:07 +0000

(CC+: linux-pci)

On Jun 11, 2022 / 16:34, Yi Zhang wrote:
> On Fri, Jun 10, 2022 at 10:49 PM Keith Busch <kbusch@xxxxxxxxxx> wrote:
> >
> > On Fri, Jun 10, 2022 at 12:25:17PM +0000, Shinichiro Kawasaki wrote:
> > > On Jun 10, 2022 / 09:32, Chaitanya Kulkarni wrote:
> > > > >> #6: nvme/032: Failed at the first run after system reboot.
> > > > >>                 Used QEMU NVME device as TEST_DEV.
> > > > >>
> > > >
> > > > ofcourse we need to fix this issue but can you also
> > > > try it with the real H/W ?
> > >
> > > Hi Chaitanya, thank you for looking into the failures. I have just run the test
> > > case nvme/032 with real NVME device and observed the exactly same symptom as
> > > QEMU NVME device.
> >
> > QEMU is perfectly fine for this test. There's no need to bring in "real"
> > hardware for this.
> >
> > And I am not even sure this is real. I don't know yet why this is showing up
> > only now, but this should fix it:
> 
> Hi Keith
> 
> Confirmed the WARNING issue was fixed with the change, here is the log:

Thanks. I also confirmed that Keith's change to add __ATTR_IGNORE_LOCKDEP to
dev_attr_dev_rescan avoids the WARNING, on v5.19-rc2.

I took a closer look into this issue and found that the deadlock WARN can be
recreated with the following two commands:

# echo 1 > /sys/bus/pci/devices/0000\:00\:09.0/rescan
# echo 1 > /sys/bus/pci/devices/0000\:00\:09.0/remove

It can also be recreated with PCI devices other than an NVMe controller, such
as a SCSI controller or a VGA controller. So this is not a storage subsystem
issue.

I checked the function call stacks of the two commands above. As shown below,
it looks like lockdep detects and warns about a possible ABBA deadlock.

echo 1 > /sys/bus/pci/devices/*/rescan
  kernfs_fop_write_iter
    kernfs_get_active
      atomic_inc_unless_negative
      rwsem_acquire_read(&kn->dep_map, 0, 1, _RET_IP_) :lock L1 for "rescan" file
    dev_rescan_store
      pci_lock_rescan_remove
        mutex_lock(&pci_rescan_remove_lock)           :lock L2

echo 1 > /sys/bus/pci/devices/*/remove
  kernfs_fop_write_iter
    remove_store
      pci_stop_and_remove_bus_device_locked
        pci_lock_rescan_remove
          mutex_lock(&pci_rescan_remove_lock)         :lock L2
        pci_stop_and_remove_bus_device
          pci_remove_bus_device
            device_del
              device_remove_attrs
                sysfs_remove_attrs
                  sysfs_remove_groups
                    sysfs_remove_group
                      remove_files    ... iterates over the PCI device sysfs files, including the "rescan" file?
                        kernfs_remove_by_name_ns
                          __kernfs_remove
                            kernfs_drain
                              rwsem_acquire(&kn->dep_map, 0, 0, _RET_IP_) :lock L1

It looks to me that the deadlock possibility is real, even though a race
between the rescan operation and the remove operation is a really rare use case.

I would like to hear comments on the guess above. I have taken the liberty of
CCing the linux-pci list. Is this an issue to fix?

-- 
Shin'ichiro Kawasaki