Re: [RFC][PATCH] PCI: Avoid PCI device removing/rescanning through sysfs triggers a deadlock

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



[+cc Bart, Tejun]

On Tue, Nov 06, 2018 at 12:25:00AM +0100, Marek Vasut wrote:
> From: Tho Vu <tho.vu.wh@xxxxxxxxxxxxxxx>
> 
> This patch fixes deadlock warning in removing/rescanning through sysfs
> when CONFIG_PROVE_LOCKING is enabled.
> 
> The issue can be reproduced by these steps:
> 1. Enable CONFIG_PROVE_LOCKING via defconfig or menuconfig
> 2. Insert Ethernet card into PCIe CH0 and start up.
>    After kernel starting up, execute the following command.
>    echo 1 > /sys/class/pci_bus/0000\:00/device/0000\:00\:00.0/remove
> 3. Rescan PCI device by this command
>    echo 1 > /sys/class/pci_bus/0000\:00/rescan
> 
> The deadlock warnings will occur.
> ============================================
> WARNING: possible recursive locking detected
> 4.14.70-ltsi-yocto-standard #27 Not tainted
> --------------------------------------------
> sh/3402 is trying to acquire lock:
>  (kn->count#78){++++}, at: kernfs_remove_by_name_ns+0x50/0xa8
> 
> but task is already holding lock:
>  (kn->count#78){++++}, at: kernfs_remove_self+0xe0/0x130
> 
> other info that might help us debug this:
>  Possible unsafe locking scenario:
> 
>        CPU0
>        ----
>   lock(kn->count#78);
>   lock(kn->count#78);
> 
>  *** DEADLOCK ***
> 
>  May be due to missing lock nesting notation
> 
> 4 locks held by sh/3402:
>  #0:  (sb_writers#4){.+.+}, at: vfs_write+0x198/0x1b0
>  #1:  (&of->mutex){+.+.}, at: kernfs_fop_write+0x108/0x210
>  #2:  (kn->count#78){++++}, at: kernfs_remove_self+0xe0/0x130
>  #3:  (pci_rescan_remove_lock){+.+.}, at: pci_lock_rescan_remove+0x1c/0x28
> 
> stack backtrace:
> CPU: 3 PID: 3402 Comm: sh Not tainted 4.14.70-ltsi-yocto-standard #27
> Hardware name: Renesas Salvator-X 2nd version board based on r8a7795
> ES3.0+ with 8GiB (4 x 2 GiB) (DT)
> Call trace:
> dump_backtrace+0x0/0x3d8
> show_stack+0x14/0x20
> dump_stack+0xbc/0xf4
> __lock_acquire+0x930/0x18a8
> lock_acquire+0x48/0x68
> __kernfs_remove+0x280/0x2f8
> kernfs_remove_by_name_ns+0x50/0xa8
> remove_files.isra.0+0x38/0x78
> sysfs_remove_group+0x4c/0xa0
> sysfs_remove_groups+0x38/0x60
> device_remove_attrs+0x54/0x78
> device_del+0x1ac/0x308
> pci_remove_bus_device+0x78/0xf8
> pci_remove_bus_device+0x34/0xf8
> pci_stop_and_remove_bus_device_locked+0x24/0x38
> remove_store+0x6c/0x78
> dev_attr_store+0x18/0x28
> sysfs_kf_write+0x4c/0x78
> kernfs_fop_write+0x138/0x210
> __vfs_write+0x18/0x118
> vfs_write+0xa4/0x1b0
> SyS_write+0x48/0xb0

Indent quoted material.  I use two spaces.

> This warning occurs due to a self-deletion attribute using in the sysfs
> PCI device directory. This kind of attribute is really tricky,
> it does not allow pci framework drop this attribute until all active
> .show() and .store() callbacks have finished unless
> sysfs_break_active_protection() is called.
> Hence this patch avoids writing into this attribute triggers a deadlock.

Wrap paragraph correctly or, if this is supposed to be two paragraphs,
insert a blank line between them.

Use "PCI" (not "pci") consistently in English text.

> Referrence commit 5b55b24cec4c ("scsi: core: Avoid that SCSI device
> removal through sysfs triggers a deadlock")
> of scsi driver

s/Referrence/Reference/

5b55b24cec4c doesn't exist.  I suppose maybe you mean 0ee223b2e1f6
("scsi: core: Avoid that SCSI device removal through sysfs triggers a
deadlock")?

Wrap paragraph correctly.

Use "SCSI" (not "scsi") consistently in English text.

> Signed-off-by: Tho Vu <tho.vu.wh@xxxxxxxxxxxxxxx>

Missing your Signed-off-by (as Geert pointed out).

> ---
>  drivers/pci/pci-sysfs.c | 20 +++++++++++++++++++-
>  1 file changed, 19 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
> index 9ecfe13157c0..d522bd8368d9 100644
> --- a/drivers/pci/pci-sysfs.c
> +++ b/drivers/pci/pci-sysfs.c
> @@ -470,12 +470,22 @@ static ssize_t remove_store(struct device *dev, struct device_attribute *attr,
>  			    const char *buf, size_t count)
>  {
>  	unsigned long val;
> +	struct kernfs_node *kn;
> +
> +	kn = sysfs_break_active_protection(&dev->kobj, &attr->attr);

I'm awfully wary of interfaces with only a single user.  That suggests
that there's something very special about this path, and I doubt it's
*that* special.  I grant you that sysfs_break_active_protection() is
very new (added by 2afc9166f79b ("scsi: sysfs: Introduce
sysfs_{un,}break_active_protection()")).

> +	WARN_ON_ONCE(!kn);

I don't see the point of the WARN_ON_ONCE() (also pointed out already by
Geert).

>  	if (kstrtoul(buf, 0, &val) < 0)
>  		return -EINVAL;

This looks wrong because you may return -EINVAL without calling
sysfs_unbreak_active_protection().

There's no point in calling sysfs_break_active_protection() if either
kstrtoul() fails or val is zero, i.e., you could do:

  if (kstrtoul(..., &val) < 0)
    return -EINVAL;

  if (!val)
    return count;

  sysfs_break_active_protection(...);
  device_remove_file(...);
  pci_stop_and_remove_bus_device_locked(...);
  sysfs_unbreak_active_protection(...);

> -	if (val && device_remove_file_self(dev, attr))
> +	if (val) {
> +		device_remove_file(dev, attr);
>  		pci_stop_and_remove_bus_device_locked(to_pci_dev(dev));

We need some story about why we should use device_remove_file()
instead of device_remove_file_self().

We also need an explanation for why other callers of
device_remove_file_self() don't need similar changes.  They *look*
similar to this case, so we need to look at nvme_sysfs_delete(),
dcssblk_shared_store(), vfio remove_store(), s390 recover_store(),
etc, and explain why they are correct and this one is incorrect.  Or,
if they're all wrong, we need to fix them all.

Is there a way we can avoid this self-deletion situation by doing the
deletion via a workqueue item that is scheduled by remove_store() but
not executed in its context?

> +	}
> +
> +	if (kn)
> +		sysfs_unbreak_active_protection(kn);
> +
>  	return count;
>  }
>  static struct device_attribute dev_remove_attr = __ATTR(remove,
> @@ -487,11 +497,15 @@ static ssize_t dev_bus_rescan_store(struct device *dev,
>  				    const char *buf, size_t count)
>  {
>  	unsigned long val;
> +	struct kernfs_node *kn;
>  	struct pci_bus *bus = to_pci_bus(dev);
>  
>  	if (kstrtoul(buf, 0, &val) < 0)
>  		return -EINVAL;
>  
> +	kn = sysfs_break_active_protection(&dev->kobj, &attr->attr);
> +	WARN_ON_ONCE(!kn);
> +
>  	if (val) {
>  		pci_lock_rescan_remove();
>  		if (!pci_is_root_bus(bus) && list_empty(&bus->devices))
> @@ -500,6 +514,10 @@ static ssize_t dev_bus_rescan_store(struct device *dev,
>  			pci_rescan_bus(bus);
>  		pci_unlock_rescan_remove();
>  	}
> +
> +	if (kn)
> +		sysfs_unbreak_active_protection(kn);

What's the purpose of this dev_bus_rescan_store() change?  There's no
remove here, and the changelog doesn't mention this path.

>  	return count;
>  }
>  static DEVICE_ATTR(rescan, (S_IWUSR|S_IWGRP), NULL, dev_bus_rescan_store);
> -- 
> 2.18.0
> 



[Index of Archives]     [DMA Engine]     [Linux Coverity]     [Linux USB]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Greybus]

  Powered by Linux