Re: [bug report] lockdep WARN at PCI device rescan

On Nov 14, 2023 / 19:58, Andy Shevchenko wrote:
> On Tue, Nov 14, 2023 at 06:11:40PM +0200, Andy Shevchenko wrote:
> > On Tue, Nov 14, 2023 at 04:57:01PM +0100, Lukas Wunner wrote:
> > > On Tue, Nov 14, 2023 at 02:04:34PM +0200, Andy Shevchenko wrote:
> > > > On Tue, Nov 14, 2023 at 11:47:15AM +0100, Heiner Kallweit wrote:
> > > > > On 14.11.2023 11:16, Wolfram Sang wrote:
> > > > > > On Tue, Nov 14, 2023 at 06:54:29AM +0000, Shinichiro Kawasaki wrote:
> 
> ...
> 
> > > > > > > > The lockdep splat indicates a possible deadlock between
> > > > > > > > pci_rescan_remove_lock and the work_completion lock.
> > > > > > > In the call stack, I found that the workqueue thread for
> > > > > > > i801_probe() calls p2sb_bar(), which locks pci_rescan_remove_lock.
> > > > > 
> > > > > i801 just uses p2sb_bar(), I don't see any issue in i801. Root cause
> > > > > seems to be in the PCI subsystem. Calling p2sb_bar() from a PCI driver
> > > > > probe callback seems to be problematic, nevertheless it's a valid API
> > > > > usage.
> > > > 
> > > > So, currently I lack (good) ideas and would like to hear from other (more
> > > > experienced) PCI developers on how to address this...
> > > 
> > > Can you add a p2sb_bar_locked() library call which is used by the
> > > i801 probe path?
> > > 
> > > Basically rename p2sb_bar() to __p2sb_bar() and add a third parameter
> > > of type boolean which signifies whether it's invoked in locked context
> > > or not, then call that from p2sb_bar() and p2sb_bar_locked() wrappers.
> > 
> > It may work, I assume. Let me cook the patch.
> 
> Hmm... But this will open a window when probing phase happens, e.g. during
> boot time, no? If somebody somehow calls for full rescan, we may end up in
> the situation when P2SB is gone before accessing it in p2sb_bar().
> 
> Now I'm wondering why simple pci_dev_get() can't be sufficient in the
> p2sb_bar().

All, thanks for the discussion. It looks rather difficult to avoid the WARN.

To confirm that the deadlock is real, I removed the i2c-i801 device and then
triggered a PCI rescan with the two commands below:

  # echo 1 > /sys/bus/pci/devices/0000\:00\:1f.4/remove
  # echo 1 > /sys/bus/pci/rescan

The second command hung.

I came across another fix idea: assuming the guard by pci_rescan_remove_lock is
required in p2sb_bar(), how about using a trylock? If the mutex cannot be
acquired, the p2sb_bar() call fails. This way, we can avoid the deadlock between
pci_rescan_remove_lock and workqueue completion.
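
To illustrate the trylock idea outside the kernel, here is a minimal userspace
sketch using pthreads. It is only an analogue: rescan_remove_lock,
trylock_rescan_remove(), unlock_rescan_remove() and p2sb_bar_sketch() are
hypothetical stand-ins, not kernel APIs, and the device access is elided.

```c
#include <errno.h>
#include <pthread.h>

/* Hypothetical userspace stand-in for pci_rescan_remove_lock. */
static pthread_mutex_t rescan_remove_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Mirrors the mutex_trylock() convention: returns 1 if the lock was
 * acquired, 0 if it is already held (pthread_mutex_trylock() returns
 * 0 on success and EBUSY on contention, so the result is inverted).
 */
static int trylock_rescan_remove(void)
{
	return pthread_mutex_trylock(&rescan_remove_lock) == 0;
}

static void unlock_rescan_remove(void)
{
	pthread_mutex_unlock(&rescan_remove_lock);
}

/*
 * Hypothetical analogue of p2sb_bar(): instead of blocking on the
 * lock (and risking a deadlock), fail fast with -EBUSY.
 */
static int p2sb_bar_sketch(void)
{
	if (!trylock_rescan_remove())
		return -EBUSY;

	/* ... the temporarily exposed device would be accessed here ... */

	unlock_rescan_remove();
	return 0;
}
```

With the lock free, p2sb_bar_sketch() returns 0; while another path holds
rescan_remove_lock, it returns -EBUSY immediately rather than waiting. The
trade-off is that the caller has to tolerate a failed call.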

I created the patch below and confirmed that it avoids the lockdep WARN. The
i2c-i801 probe succeeded at system boot. When I ran the two commands above, the
i2c-i801 device probe failed due to the trylock failure, but I think that is far
better than a hang.


diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index ed6b7f48736..3e784fb6cd9 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -3312,6 +3312,18 @@ void pci_lock_rescan_remove(void)
 }
 EXPORT_SYMBOL_GPL(pci_lock_rescan_remove);
 
+/*
+ * Try to acquire pci_rescan_remove_lock. Returns 1 if the mutex
+ * has been acquired successfully, and 0 on contention. Use this
+ * to acquire the lock in workqueue context to avoid potential deadlock
+ * together with work_completion.
+ */
+int pci_trylock_rescan_remove(void)
+{
+	return mutex_trylock(&pci_rescan_remove_lock);
+}
+EXPORT_SYMBOL_GPL(pci_trylock_rescan_remove);
+
 void pci_unlock_rescan_remove(void)
 {
 	mutex_unlock(&pci_rescan_remove_lock);
diff --git a/drivers/platform/x86/p2sb.c b/drivers/platform/x86/p2sb.c
index 1cf2471d54d..7a6bee8abf9 100644
--- a/drivers/platform/x86/p2sb.c
+++ b/drivers/platform/x86/p2sb.c
@@ -113,7 +113,10 @@ int p2sb_bar(struct pci_bus *bus, unsigned int devfn, struct resource *mem)
 	 * Prevent concurrent PCI bus scan from seeing the P2SB device and
 	 * removing via sysfs while it is temporarily exposed.
 	 */
-	pci_lock_rescan_remove();
+	if (!pci_trylock_rescan_remove()) {
+		pr_err("P2SB device accessed during PCI rescan\n");
+		return -EBUSY;
+	}
 
 	/* Unhide the P2SB device, if needed */
 	pci_bus_read_config_dword(bus, devfn_p2sb, P2SBC, &value);
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 60ca768bc86..e6db5096217 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1439,6 +1439,7 @@ void set_pcie_hotplug_bridge(struct pci_dev *pdev);
 unsigned int pci_rescan_bus_bridge_resize(struct pci_dev *bridge);
 unsigned int pci_rescan_bus(struct pci_bus *bus);
 void pci_lock_rescan_remove(void);
+int pci_trylock_rescan_remove(void);
 void pci_unlock_rescan_remove(void);
 
 /* Vital Product Data routines */
-- 
2.41.0





