This is a partial revert of commit 860e41a71 usb: cdc-wdm: Fix race between write and disconnect which caused lockups and assorted "general protection fault" and "scheduling while atomic" messages when concurrent nonblocking writes failed and were followed by an immediate disconnect. The problem was discovered while developing userspace software for this driver. This gave us a reliable way to reproduce the bug. The following bisect log pointed to the cause: # bad: [805a6af8dba5dfdd35ec35dc52ec0122400b2610] Linux 3.2 # good: [22763c5cf3690a681551162c15d34d935308c8d7] Linux 2.6.32 git bisect start 'v3.2' 'v2.6.32' '--' 'drivers/usb/class/cdc-wdm.c' # bad: [d93d16e9aa58887feadd999ea26b7b8139e98b56] usb: cdc-wdm: Fix order in disconnect and fix locking git bisect bad d93d16e9aa58887feadd999ea26b7b8139e98b56 # bad: [860e41a71c1731e79e1920dc42676bafc925af5e] usb: cdc-wdm: Fix race between write and disconnect git bisect bad 860e41a71c1731e79e1920dc42676bafc925af5e # good: [86266452f80545285c14e20a8024f79c4fb88a86] USB: Push BKL on open down into the drivers git bisect good 86266452f80545285c14e20a8024f79c4fb88a86 # good: [94015f6e6ba11040f75f4b42aada8de23965290e] USB: BKL removal: cdc-wdm git bisect good 94015f6e6ba11040f75f4b42aada8de23965290e 860e41a71c1731e79e1920dc42676bafc925af5e is the first bad commit So the regression was introduced between 2.6.33 and 2.6.34! This patch attempts to revert as much as possible of the bad commit on top of later changes, presumably leaving the original problem unsolved. Reverting it now anyway, as the resulting bug is a far worse problem than any theoretical race condition and regressions are unacceptable in any case. Reported-and-tested-by: Aleksander Morgado <aleksander@xxxxxxxxxx> Cc: Oliver Neukum <oliver@xxxxxxxxxx> Cc: <stable@xxxxxxxxxxxxxxx> # needs minor context adjustment - will submit backport Signed-off-by: Bjørn Mork <bjorn@xxxxxxx> --- A few more comments for anyone wanting to challenge this... Aleksander Morgado found the bug and provided an excellent test case available in the "stuck" branch in his git://gitorious.org/lanedo/libqmi-glib.git repo (commit dfb4ece). The "cli/qmicli" tool there will send two QMI messages close enough in time to hit the "concurrent write" test in most cases. The system will lockup a few seconds later, sometimes logging "general protection fault" and/or "scheduling while atomic" first. These do not point to cdc-wdm, but some unrelated process. I bisected this on 3.2 exclusively, just changing the parts necessary to build and load on CDC interfaces with unrelated bulk endpoints: 1) ensure that maxcom is set to some sane value without and USB_CDC_DMM_TYPE descriptor 2) remove "if (iface->desc.bNumEndpoints != 1)" test 3) s/usb_buffer_free/usb_free_coherent/ and s/usb_buffer_alloc/usb_alloc_coherent/ and then just adding the device ID of some QMI/wwan device using /sys/bus/usb/drivers/cdc_wdm/new_id This allowed me speak QMI via this driver using all revisions back to 2.6.32 (and most likely before, but 2.6.32 was found to be good). Note that all this was done with the *plain* cdc-wdm driver, without any subdriver code or the like. This is in no way related to QMI or the new qmi_wwan driver. It's just the use case which made the bug visible and the tool I had for easily reproducing it. I am pretty sure that the bug can be reproduced using standard CDC WDM devices as well. I did make a couple of attempts on finding the problem before bisecting, but failed. I must admit that I understand neither the bad commit nor the effect of the revert. But both Aleksander and I have tested the reverting patch below and found that it does in fact resolve the problem. If anyone feels up to rewriting this solving the original problem without introducing the regression, then please do so. But for now I belive the seriousness of this bug does warrant an immediate fix in 3.4. Thanks a lot to Aleksander for finding the bug and providing a tool to reproduce it! Bjørn drivers/usb/class/cdc-wdm.c | 45 ++++++++++++++++++------------------------- 1 file changed, 19 insertions(+), 26 deletions(-) diff --git a/drivers/usb/class/cdc-wdm.c b/drivers/usb/class/cdc-wdm.c index c6f6560..5fbad14 100644 --- a/drivers/usb/class/cdc-wdm.c +++ b/drivers/usb/class/cdc-wdm.c @@ -338,38 +338,15 @@ static ssize_t wdm_write if (we < 0) return -EIO; - desc->outbuf = buf = kmalloc(count, GFP_KERNEL); - if (!buf) { - rv = -ENOMEM; - goto outnl; - } - - r = copy_from_user(buf, buffer, count); - if (r > 0) { - kfree(buf); - rv = -EFAULT; - goto outnl; - } - /* concurrent writes and disconnect */ r = mutex_lock_interruptible(&desc->wlock); rv = -ERESTARTSYS; - if (r) { - kfree(buf); + if (r) goto outnl; - } - - if (test_bit(WDM_DISCONNECTING, &desc->flags)) { - kfree(buf); - rv = -ENODEV; - goto outnp; - } r = usb_autopm_get_interface(desc->intf); - if (r < 0) { - kfree(buf); + if (r < 0) goto outnp; - } if (!(file->f_flags & O_NONBLOCK)) r = wait_event_interruptible(desc->wait, !test_bit(WDM_IN_USE, @@ -381,8 +358,24 @@ static ssize_t wdm_write if (test_bit(WDM_RESETTING, &desc->flags)) r = -EIO; - if (r < 0) { + if (r < 0) + goto out; + + if (test_bit(WDM_DISCONNECTING, &desc->flags)) { + rv = -ENODEV; + goto out; + } + + desc->outbuf = buf = kmalloc(count, GFP_KERNEL); + if (!buf) { + rv = -ENOMEM; + goto out; + } + + r = copy_from_user(buf, buffer, count); + if (r > 0) { kfree(buf); + rv = -EFAULT; goto out; } -- 1.7.10 -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html