[PATCH] usb: cdc-wdm: Fix "scheduling while atomic" after failed write

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



This is a partial revert of

  commit 860e41a71 usb: cdc-wdm: Fix race between write and disconnect

which caused lockups and assorted "general protection fault" and
"scheduling while atomic" messages when concurrent nonblocking
writes failed and were followed by an immediate disconnect.  The
problem was discovered while developing userspace software for this
driver. This gave us a reliable way to reproduce the bug.

The following bisect log pointed to the cause:

 # bad: [805a6af8dba5dfdd35ec35dc52ec0122400b2610] Linux 3.2
 # good: [22763c5cf3690a681551162c15d34d935308c8d7] Linux 2.6.32
 git bisect start 'v3.2' 'v2.6.32' '--' 'drivers/usb/class/cdc-wdm.c'
 # bad: [d93d16e9aa58887feadd999ea26b7b8139e98b56] usb: cdc-wdm: Fix order in disconnect and fix locking
 git bisect bad d93d16e9aa58887feadd999ea26b7b8139e98b56
 # bad: [860e41a71c1731e79e1920dc42676bafc925af5e] usb: cdc-wdm: Fix race between write and disconnect
 git bisect bad 860e41a71c1731e79e1920dc42676bafc925af5e
 # good: [86266452f80545285c14e20a8024f79c4fb88a86] USB: Push BKL on open down into the drivers
 git bisect good 86266452f80545285c14e20a8024f79c4fb88a86
 # good: [94015f6e6ba11040f75f4b42aada8de23965290e] USB: BKL removal: cdc-wdm
 git bisect good 94015f6e6ba11040f75f4b42aada8de23965290e
 860e41a71c1731e79e1920dc42676bafc925af5e is the first bad commit

So the regression was introduced between 2.6.33 and 2.6.34!

This patch attempts to revert as much as possible of the bad commit
on top of later changes, presumably leaving the original problem
unsolved.  Reverting it now anyway, as the resulting bug is a far
worse problem than any theoretical race condition and regressions
are unacceptable in any case.

Reported-and-tested-by: Aleksander Morgado <aleksander@xxxxxxxxxx>
Cc: Oliver Neukum <oliver@xxxxxxxxxx>
Cc: <stable@xxxxxxxxxxxxxxx> # needs minor context adjustment - will submit backport
Signed-off-by: Bjørn Mork <bjorn@xxxxxxx>
---
A few more comments for anyone wanting to challenge this...

Aleksander Morgado found the bug and provided an excellent test case
available in the "stuck" branch in his 
 git://gitorious.org/lanedo/libqmi-glib.git
repo (commit dfb4ece).  The "cli/qmicli" tool there will send two QMI
messages close enough in time to hit the "concurrent write" test in most
cases.  The system will lockup a few seconds later, sometimes logging 
"general protection fault" and/or "scheduling while atomic" first. These
do not point to cdc-wdm, but some unrelated process.

I bisected this on 3.2 exclusively, just changing the parts necessary to
build and load on CDC interfaces with unrelated bulk endpoints:

 1) ensure that maxcom is set to some sane value without and
   USB_CDC_DMM_TYPE descriptor
 2) remove "if (iface->desc.bNumEndpoints != 1)"  test
 3) s/usb_buffer_free/usb_free_coherent/ and 
    s/usb_buffer_alloc/usb_alloc_coherent/

and then just adding the device ID of some QMI/wwan device using
/sys/bus/usb/drivers/cdc_wdm/new_id

This allowed me speak QMI via this driver using all revisions back to
2.6.32 (and most likely before, but 2.6.32 was found to be good). Note
that all this was done with the *plain* cdc-wdm driver, without any
subdriver code or the like.  This is in no way related to QMI or the
new qmi_wwan driver.  It's just the use case which made the bug
visible and the tool I had for easily reproducing it.  I am pretty
sure that the bug can be reproduced using standard CDC WDM devices as
well.

I did make a couple of attempts on finding the problem before bisecting,
but failed.  I must admit that I understand neither the bad commit nor
the effect of the revert.  But both Aleksander and I have tested the
reverting patch below and found that it does in fact resolve the
problem.  If anyone feels up to rewriting this solving the original
problem without introducing the regression, then please do so.

But for now I belive the seriousness of this bug does warrant an
immediate fix in 3.4.

Thanks a lot to Aleksander for finding the bug and providing a
tool to reproduce it!


Bjørn



 drivers/usb/class/cdc-wdm.c |   45 ++++++++++++++++++-------------------------
 1 file changed, 19 insertions(+), 26 deletions(-)

diff --git a/drivers/usb/class/cdc-wdm.c b/drivers/usb/class/cdc-wdm.c
index c6f6560..5fbad14 100644
--- a/drivers/usb/class/cdc-wdm.c
+++ b/drivers/usb/class/cdc-wdm.c
@@ -338,38 +338,15 @@ static ssize_t wdm_write
 	if (we < 0)
 		return -EIO;
 
-	desc->outbuf = buf = kmalloc(count, GFP_KERNEL);
-	if (!buf) {
-		rv = -ENOMEM;
-		goto outnl;
-	}
-
-	r = copy_from_user(buf, buffer, count);
-	if (r > 0) {
-		kfree(buf);
-		rv = -EFAULT;
-		goto outnl;
-	}
-
 	/* concurrent writes and disconnect */
 	r = mutex_lock_interruptible(&desc->wlock);
 	rv = -ERESTARTSYS;
-	if (r) {
-		kfree(buf);
+	if (r)
 		goto outnl;
-	}
-
-	if (test_bit(WDM_DISCONNECTING, &desc->flags)) {
-		kfree(buf);
-		rv = -ENODEV;
-		goto outnp;
-	}
 
 	r = usb_autopm_get_interface(desc->intf);
-	if (r < 0) {
-		kfree(buf);
+	if (r < 0)
 		goto outnp;
-	}
 
 	if (!(file->f_flags & O_NONBLOCK))
 		r = wait_event_interruptible(desc->wait, !test_bit(WDM_IN_USE,
@@ -381,8 +358,24 @@ static ssize_t wdm_write
 	if (test_bit(WDM_RESETTING, &desc->flags))
 		r = -EIO;
 
-	if (r < 0) {
+	if (r < 0)
+		goto out;
+	
+	if (test_bit(WDM_DISCONNECTING, &desc->flags)) {
+		rv = -ENODEV;
+		goto out;
+	}
+	
+	desc->outbuf = buf = kmalloc(count, GFP_KERNEL);
+	if (!buf) {
+		rv = -ENOMEM;
+		goto out;
+	}
+	
+	r = copy_from_user(buf, buffer, count);
+	if (r > 0) {
 		kfree(buf);
+		rv = -EFAULT;
 		goto out;
 	}
 
-- 
1.7.10

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Media]     [Linux Input]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Old Linux USB Devel Archive]

  Powered by Linux