Re: Lockdep problem: v3.18+ (with yesterday's Linus tip - 6f51ee709e4c)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, 18 Dec 2014, Greg Kroah-Hartman wrote:

> On Thu, Dec 18, 2014 at 02:40:48PM -0500, Alan Stern wrote:
> > On Thu, 18 Dec 2014, Russell King - ARM Linux wrote:
> > 
> > > While unplugging a Logitek Keyboard/mouse micro-receiver, I got the
> > > lockdep splat below.
> > > 
> > > However, I don't fully understand this splat - I see nothing in
> > > flush_work() nor process_one_work() making use of "intf->reset_ws" -
> > > which seems to be a USB thing.  I guess lockdep is being re-used to
> > > validate work stuff, and "lock" is just plain misleading.
> > 
> > That sounds right.  intf->reset_ws is a work_struct for a reset 
> > request.
> > 
> > > usb 2-1.1: USB disconnect, device number 3
> > > 
> > > =============================================
> > > [ INFO: possible recursive locking detected ]
> > > 3.18.0+ #1459 Not tainted
> > > ---------------------------------------------
> > > kworker/0:1/2758 is trying to acquire lock:
> > >  ((&intf->reset_ws)){+.+.+.}, at: [<c003ba90>] flush_work+0x0/0x264
> > > 
> > > but task is already holding lock:
> > >  ((&intf->reset_ws)){+.+.+.}, at: [<c003ca40>] process_one_work+0x130/0x4b4
> > 
> > I think what happened is that we called cancel_work_sync() for the 
> > reset_ws embedded in one intf structure while running in the workqueue 
> > routine for the reset_ws embedded in a different intf structure.
> > 
> > Assuming this interpretation is correct, I don't see how we can prevent 
> > lockdep from making this mistake.  Here's the problem:
> > 
> > 	An interface driver can queue a request for a reset.
> > 
> > 	A reset can cause interface drivers to be unbound from their
> > 	interfaces.
> > 
> > 	If an interface driver is unbound, any pending reset request 
> > 	that it queued has to be cancelled.  Otherwise the workqueue
> > 	would most likely end up carrying out the reset at a later time 
> > 	when nobody wants it.
> > 
> > (The code contains explicit protection for the case where the interface 
> > being unbound is the one whose reset request is currently in progress.)
> > 
> > Now perhaps we don't need that last step.  We could allow a queued
> > reset to take place even after the driver that requested it is gone.  
> > Most of the time these reset requests occur in response to I/O errors
> > when a USB device is unplugged -- as happened in this example -- in
> > which case it doesn't matter what we do.
> > 
> > So even though it's not entirely pleasant, my inclination is to solve
> > the problem by getting rid of usb_cancel_queued_reset() in
> > drivers/usb/core/driver.c.
> > 
> > Comments, anybody?
> 
> That seems reasonable to me, unbinding when a reset is happening is
> going to be a rare condition, but if we get rid of it, and we try to
> queue a reset for a device that is gone, we will just fail the reset,
> right?  If all should be fine, I have no objection to removing it.

Russell, can you reproduce that lockdep violation whenever you want?

If you can, does the following patch help?

Alan Stern

-------------------------------------------------------------------------

The USB stack provides a mechanism for drivers to request an
asynchronous device reset (usb_queue_reset_device()).  The mechanism
uses a work item (reset_ws) embedded in the usb_interface structure
used by the driver, and the reset is carried out by a work queue
routine.

The asynchronous reset can race with driver unbinding.  When this
happens, we try to cancel the queued reset before unbinding the
driver, on the theory that the driver won't care about any resets once
it is unbound.

However, thanks to the fact that lockdep now tracks work queue
accesses, this can provoke a lockdep warning in situations where the
device reset causes another interface's driver to be unbound; see

	http://marc.info/?l=linux-usb&m=141893165203776&w=2

for an example.  The reason is that the work routine for reset_ws in
one interface calls cancel_queued_work() for the reset_ws in another
interface.  Lockdep thinks this might lead to a work routine trying to
cancel itself.  The simplest solution is not to cancel queued resets
when unbinding drivers.

This means we now need to acquire a reference to the usb_interface
when queuing a reset_ws work item and to drop the reference when the
work routine finishes.  We also need to make sure that the
usb_interface structure doesn't outlive its parent usb_device; this
means acquiring and dropping a reference when the interface is created
and destroyed.

Signed-off-by: Alan Stern <stern@xxxxxxxxxxxxxxxxxxx>
Reported-by: Russell King - ARM Linux <linux@xxxxxxxxxxxxxxxx>

---

 drivers/usb/core/driver.c  |   17 -----------------
 drivers/usb/core/hub.c     |   25 +++++++++----------------
 drivers/usb/core/message.c |   23 +++--------------------
 include/linux/usb.h        |    5 -----
 4 files changed, 12 insertions(+), 58 deletions(-)

Index: usb-3.19/include/linux/usb.h
===================================================================
--- usb-3.19.orig/include/linux/usb.h
+++ usb-3.19/include/linux/usb.h
@@ -127,10 +127,6 @@ enum usb_interface_condition {
  *	to the sysfs representation for that device.
  * @pm_usage_cnt: PM usage counter for this interface
  * @reset_ws: Used for scheduling resets from atomic context.
- * @reset_running: set to 1 if the interface is currently running a
- *      queued reset so that usb_cancel_queued_reset() doesn't try to
- *      remove from the workqueue when running inside the worker
- *      thread. See __usb_queue_reset_device().
  * @resetting_device: USB core reset the device, so use alt setting 0 as
  *	current; needs bandwidth alloc after reset.
  *
@@ -181,7 +177,6 @@ struct usb_interface {
 	unsigned needs_remote_wakeup:1;	/* driver requires remote wakeup */
 	unsigned needs_altsetting0:1;	/* switch to altsetting 0 is pending */
 	unsigned needs_binding:1;	/* needs delayed unbind/rebind */
-	unsigned reset_running:1;
 	unsigned resetting_device:1;	/* true: bandwidth alloc after reset */
 
 	struct device dev;		/* interface specific device info */
Index: usb-3.19/drivers/usb/core/driver.c
===================================================================
--- usb-3.19.orig/drivers/usb/core/driver.c
+++ usb-3.19/drivers/usb/core/driver.c
@@ -275,21 +275,6 @@ static int usb_unbind_device(struct devi
 	return 0;
 }
 
-/*
- * Cancel any pending scheduled resets
- *
- * [see usb_queue_reset_device()]
- *
- * Called after unconfiguring / when releasing interfaces. See
- * comments in __usb_queue_reset_device() regarding
- * udev->reset_running.
- */
-static void usb_cancel_queued_reset(struct usb_interface *iface)
-{
-	if (iface->reset_running == 0)
-		cancel_work_sync(&iface->reset_ws);
-}
-
 /* called from driver core with dev locked */
 static int usb_probe_interface(struct device *dev)
 {
@@ -380,7 +365,6 @@ static int usb_probe_interface(struct de
 	usb_set_intfdata(intf, NULL);
 	intf->needs_remote_wakeup = 0;
 	intf->condition = USB_INTERFACE_UNBOUND;
-	usb_cancel_queued_reset(intf);
 
 	/* If the LPM disable succeeded, balance the ref counts. */
 	if (!lpm_disable_error)
@@ -425,7 +409,6 @@ static int usb_unbind_interface(struct d
 		usb_disable_interface(udev, intf, false);
 
 	driver->disconnect(intf);
-	usb_cancel_queued_reset(intf);
 
 	/* Free streams */
 	for (i = 0, j = 0; i < intf->cur_altsetting->desc.bNumEndpoints; i++) {
Index: usb-3.19/drivers/usb/core/hub.c
===================================================================
--- usb-3.19.orig/drivers/usb/core/hub.c
+++ usb-3.19/drivers/usb/core/hub.c
@@ -5589,26 +5589,19 @@ EXPORT_SYMBOL_GPL(usb_reset_device);
  *   possible; depending on how the driver attached to each interface
  *   handles ->pre_reset(), the second reset might happen or not.
  *
- * - If a driver is unbound and it had a pending reset, the reset will
- *   be cancelled.
+ * - If the reset is delayed so long that the interface is unbound from
+ *   its driver, the reset will be skipped.
  *
- * - This function can be called during .probe() or .disconnect()
- *   times. On return from .disconnect(), any pending resets will be
- *   cancelled.
- *
- * There is no no need to lock/unlock the @reset_ws as schedule_work()
- * does its own.
- *
- * NOTE: We don't do any reference count tracking because it is not
- *     needed. The lifecycle of the work_struct is tied to the
- *     usb_interface. Before destroying the interface we cancel the
- *     work_struct, so the fact that work_struct is queued and or
- *     running means the interface (and thus, the device) exist and
- *     are referenced.
+ * - This function can be called during .probe().  It can also be called
+ *   during .disconnect(), but doing so is pointless because the reset
+ *   will not occur.  If you really want to reset the device during
+ *   .disconnect(), call usb_reset_device() directly -- but watch out
+ *   for nested unbinding issues!
  */
 void usb_queue_reset_device(struct usb_interface *iface)
 {
-	schedule_work(&iface->reset_ws);
+	if (schedule_work(&iface->reset_ws))
+		usb_get_intf(iface);
 }
 EXPORT_SYMBOL_GPL(usb_queue_reset_device);
 
Index: usb-3.19/drivers/usb/core/message.c
===================================================================
--- usb-3.19.orig/drivers/usb/core/message.c
+++ usb-3.19/drivers/usb/core/message.c
@@ -1551,6 +1551,7 @@ static void usb_release_interface(struct
 			altsetting_to_usb_interface_cache(intf->altsetting);
 
 	kref_put(&intfc->ref, usb_release_interface_cache);
+	usb_put_dev(interface_to_usbdev(intf));
 	kfree(intf);
 }
 
@@ -1626,24 +1627,6 @@ static struct usb_interface_assoc_descri
 
 /*
  * Internal function to queue a device reset
- *
- * This is initialized into the workstruct in 'struct
- * usb_device->reset_ws' that is launched by
- * message.c:usb_set_configuration() when initializing each 'struct
- * usb_interface'.
- *
- * It is safe to get the USB device without reference counts because
- * the life cycle of @iface is bound to the life cycle of @udev. Then,
- * this function will be ran only if @iface is alive (and before
- * freeing it any scheduled instances of it will have been cancelled).
- *
- * We need to set a flag (usb_dev->reset_running) because when we call
- * the reset, the interfaces might be unbound. The current interface
- * cannot try to remove the queued work as it would cause a deadlock
- * (you cannot remove your work from within your executing
- * workqueue). This flag lets it know, so that
- * usb_cancel_queued_reset() doesn't try to do it.
- *
  * See usb_queue_reset_device() for more details
  */
 static void __usb_queue_reset_device(struct work_struct *ws)
@@ -1655,11 +1638,10 @@ static void __usb_queue_reset_device(str
 
 	rc = usb_lock_device_for_reset(udev, iface);
 	if (rc >= 0) {
-		iface->reset_running = 1;
 		usb_reset_device(udev);
-		iface->reset_running = 0;
 		usb_unlock_device(udev);
 	}
+	usb_put_intf(iface);	/* Undo _get_ in usb_queue_reset_device() */
 }
 
 
@@ -1854,6 +1836,7 @@ free_interfaces:
 		dev_set_name(&intf->dev, "%d-%s:%d.%d",
 			dev->bus->busnum, dev->devpath,
 			configuration, alt->desc.bInterfaceNumber);
+		usb_get_dev(dev);
 	}
 	kfree(new_interfaces);
 

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Media]     [Linux Input]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Old Linux USB Devel Archive]

  Powered by Linux