On Monday, 08.06.2020, at 11:24 +0900, Tetsuo Handa wrote:

Hi,

sorry for the late reply. I had an emergency to take care of.

> On 2020/05/31 0:47, Alan Stern wrote:
> > On Sat, May 30, 2020 at 05:25:11PM +0200, Oliver Neukum wrote:
> > > On Thursday, 28.05.2020, at 16:58 -0400, Alan Stern wrote:
> > > > This sounds like a bug in the driver.  What would it do if someone had a
> > >
> > > Arguably yes. I will introduce a timeout. Unfortunately flush()
> > > requires a non-interruptible sleep, as you cannot sanely return EAGAIN.
> >
> > But maybe you can kill some URBs and drop some data.
>
> You mean call usb_kill_urb() via kill_urbs() ?

I have to correct myself. We can return -EINTR. But ultimately that is
no solution: we would not hang, but we could not close the fd.

> As far as I tested, it seems that usb_kill_urb() sometimes fails to call
> wdm_out_callback() despite the comment for usb_kill_urb() says
>
>  * This routine cancels an in-progress request.  It is guaranteed that
>  * upon return all completion handlers will have finished and the URB
>  * will be totally idle and available for reuse.  These features make
>  * this an ideal way to stop I/O in a disconnect() callback or close()
>  * function.  If the request has not already finished or been unlinked
>  * the completion handler will see urb->status == -ENOENT.

It looks like it does exactly what the description says. Cancelling an URB
is by necessity a race condition. It can always finish before you can
kill it.

> . Is something still wrong? Or just replacing
>
> 	BUG_ON(test_bit(WDM_IN_USE, &desc->flags) &&
> 	       !test_bit(WDM_DISCONNECTING, &desc->flags));
>
> with
>
> 	wait_event(desc->wait, !test_bit(WDM_IN_USE, &desc->flags) ||
> 		   test_bit(WDM_DISCONNECTING, &desc->flags));
>
> in the patch shown below is sufficient?
>
> diff --git a/drivers/usb/class/cdc-wdm.c b/drivers/usb/class/cdc-wdm.c
> index e3db6fbeadef..3e92e79ce0a0 100644
> --- a/drivers/usb/class/cdc-wdm.c
> +++ b/drivers/usb/class/cdc-wdm.c
> @@ -151,7 +151,7 @@ static void wdm_out_callback(struct urb *urb)
>  	kfree(desc->outbuf);
>  	desc->outbuf = NULL;
>  	clear_bit(WDM_IN_USE, &desc->flags);
> -	wake_up(&desc->wait);
> +	wake_up_all(&desc->wait);
>  }
>
>  static void wdm_in_callback(struct urb *urb)
> @@ -424,6 +424,7 @@ static ssize_t wdm_write
>  	if (rv < 0) {
>  		desc->outbuf = NULL;
>  		clear_bit(WDM_IN_USE, &desc->flags);
> +		wake_up_all(&desc->wait);
>  		dev_err(&desc->intf->dev, "Tx URB error: %d\n", rv);
>  		rv = usb_translate_errors(rv);
>  		goto out_free_mem_pm;
> @@ -587,15 +588,16 @@ static int wdm_flush(struct file *file, fl_owner_t id)
>  {
>  	struct wdm_device *desc = file->private_data;
>
> -	wait_event(desc->wait,
> -			/*
> -			 * needs both flags. We cannot do with one
> -			 * because resetting it would cause a race
> -			 * with write() yet we need to signal
> -			 * a disconnect
> -			 */
> -			!test_bit(WDM_IN_USE, &desc->flags) ||
> -			test_bit(WDM_DISCONNECTING, &desc->flags));
> +	/*
> +	 * needs both flags. We cannot do with one because resetting it would
> +	 * cause a race with write() yet we need to signal a disconnect
> +	 */
> +	if (!wait_event_timeout(desc->wait, !test_bit(WDM_IN_USE, &desc->flags) ||
> +				test_bit(WDM_DISCONNECTING, &desc->flags), 20 * HZ)) {
> +		kill_urbs(desc);

No. We cannot just kill all URBs just because one fd's owner wants to flush.
In fact we have multiple code paths that can reach the same hang. Could you
test the attached patches?

	Regards
		Oliver

From 27cd2e25b37af973b61b77217fa2dad822889ff8 Mon Sep 17 00:00:00 2001
From: Oliver Neukum <oneukum@xxxxxxxx>
Date: Wed, 24 Jun 2020 10:52:03 +0200
Subject: [PATCH 1/2] CDC-WDM: fix hangs in flush()

When flushing a task needs to wait a bounded time, as a hardware
failure could mean eternal sleep. So an arbitrary timeout is introduced.
Simply making the syscall interruptible will not do the job, as
while the syscall would not hang, the fd would be unclosable.

In addition a flush() and a write() may be waiting for the same
IO to complete. Hence completion of output must use wake_up_all(),
even in error handling.

Reported-by: Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx>
Signed-off-by: Oliver Neukum <oneukum@xxxxxxxx>
---
 drivers/usb/class/cdc-wdm.c | 24 +++++++++++++++++++++---
 1 file changed, 21 insertions(+), 3 deletions(-)

diff --git a/drivers/usb/class/cdc-wdm.c b/drivers/usb/class/cdc-wdm.c
index e3db6fbeadef..ec5412773c57 100644
--- a/drivers/usb/class/cdc-wdm.c
+++ b/drivers/usb/class/cdc-wdm.c
@@ -58,6 +58,9 @@ MODULE_DEVICE_TABLE (usb, wdm_ids);
 
 #define WDM_MAX			16
 
+/* flush() needs to be uninterruptible, but we cannot wait forever */
+#define WDM_FLUSH_TIMEOUT	(30 * HZ)
+
 /* CDC-WMC r1.1 requires wMaxCommand to be "at least 256 decimal (0x100)" */
 #define WDM_DEFAULT_BUFSIZE	256
 
@@ -151,7 +154,7 @@ static void wdm_out_callback(struct urb *urb)
 	kfree(desc->outbuf);
 	desc->outbuf = NULL;
 	clear_bit(WDM_IN_USE, &desc->flags);
-	wake_up(&desc->wait);
+	wake_up_all(&desc->wait);
 }
 
 static void wdm_in_callback(struct urb *urb)
@@ -424,6 +427,7 @@ static ssize_t wdm_write
 	if (rv < 0) {
 		desc->outbuf = NULL;
 		clear_bit(WDM_IN_USE, &desc->flags);
+		wake_up_all(&desc->wait); /* for flush() */
 		dev_err(&desc->intf->dev, "Tx URB error: %d\n", rv);
 		rv = usb_translate_errors(rv);
 		goto out_free_mem_pm;
@@ -586,8 +590,9 @@ static ssize_t wdm_read
 static int wdm_flush(struct file *file, fl_owner_t id)
 {
 	struct wdm_device *desc = file->private_data;
+	int rv;
 
-	wait_event(desc->wait,
+	rv = wait_event_interruptible_timeout(desc->wait,
 			/*
 			 * needs both flags. We cannot do with one
 			 * because resetting it would cause a race
@@ -595,11 +600,16 @@ static int wdm_flush(struct file *file, fl_owner_t id)
 			 * a disconnect
 			 */
 			!test_bit(WDM_IN_USE, &desc->flags) ||
-			test_bit(WDM_DISCONNECTING, &desc->flags));
+			test_bit(WDM_DISCONNECTING, &desc->flags),
+			WDM_FLUSH_TIMEOUT);
 
 	/* cannot dereference desc->intf if WDM_DISCONNECTING */
 	if (test_bit(WDM_DISCONNECTING, &desc->flags))
 		return -ENODEV;
+	if (!rv)
+		return -EIO;
+	if (rv < 0)
+		return -EINTR;
 	if (desc->werr < 0)
 		dev_err(&desc->intf->dev, "Error in flush path: %d\n",
 			desc->werr);
@@ -656,6 +666,14 @@ static int wdm_open(struct inode *inode, struct file *file)
 		goto out;
 	}
 
+	/*
+	 * in case flush() had timed out
+	 */
+	usb_kill_urb(desc->command);
+	spin_lock_irq(&desc->iuspin);
+	desc->werr = 0;
+	spin_unlock_irq(&desc->iuspin);
+
 	/* using write lock to protect desc->count */
 	mutex_lock(&desc->wlock);
 	if (!desc->count++) {
-- 
2.16.4

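
For readers who want to try the idea from patch 1 outside the kernel, here is a rough userspace analogy, not driver code. It assumes POSIX threads; `struct fake_dev`, `bounded_flush()` and `complete_output()` are hypothetical names invented for this sketch. The condition variable stands in for `desc->wait`, `in_use` for `WDM_IN_USE`, and `pthread_cond_broadcast()` plays the role of `wake_up_all()`, so that a flush-like waiter and a write-like waiter are both woken by one completion.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for the driver state; not the kernel structure. */
struct fake_dev {
	pthread_mutex_t lock;
	pthread_cond_t wait;	/* analogue of desc->wait */
	bool in_use;		/* analogue of WDM_IN_USE */
};

/*
 * Wait until in_use is cleared or timeout_ms has elapsed.
 * Returns 0 if the output completed, -EIO on timeout
 * (the same error the patch picks for a timed-out flush).
 */
static int bounded_flush(struct fake_dev *dev, long timeout_ms)
{
	struct timespec deadline;
	int rc = 0;

	assert(timeout_ms >= 0);

	/* pthread_cond_timedwait() takes an absolute deadline */
	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec++;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&dev->lock);
	while (dev->in_use && rc == 0)
		rc = pthread_cond_timedwait(&dev->wait, &dev->lock, &deadline);
	/* decide on the condition itself: completion may race with the timeout */
	rc = dev->in_use ? -EIO : 0;
	pthread_mutex_unlock(&dev->lock);
	return rc;
}

/* Completion path: wake every waiter, not just one. */
static void complete_output(struct fake_dev *dev)
{
	pthread_mutex_lock(&dev->lock);
	dev->in_use = false;
	pthread_cond_broadcast(&dev->wait);
	pthread_mutex_unlock(&dev->lock);
}
```

With `in_use` set and nobody calling `complete_output()`, `bounded_flush(&dev, 50)` gives up after roughly 50 ms and returns `-EIO`; once `complete_output()` has run, it returns 0 immediately. The sketch does not model signals or disconnect, which the real wdm_flush() has to handle as well.
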
From d588b8034b734ecce0575ae1110d3ab5a386e049 Mon Sep 17 00:00:00 2001
From: Oliver Neukum <oneukum@xxxxxxxx>
Date: Thu, 25 Jun 2020 11:53:54 +0200
Subject: [PATCH 2/2] CDC-WDM: fix race reporting errors in flush

In case a race was lost and multiple fds used,
an error could be reported multiple times. To fix this
a spinlock must be taken.

Signed-off-by: Oliver Neukum <oneukum@xxxxxxxx>
---
 drivers/usb/class/cdc-wdm.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/usb/class/cdc-wdm.c b/drivers/usb/class/cdc-wdm.c
index ec5412773c57..e9e8277a0c69 100644
--- a/drivers/usb/class/cdc-wdm.c
+++ b/drivers/usb/class/cdc-wdm.c
@@ -610,11 +610,16 @@ static int wdm_flush(struct file *file, fl_owner_t id)
 		return -EIO;
 	if (rv < 0)
 		return -EINTR;
-	if (desc->werr < 0)
-		dev_err(&desc->intf->dev, "Error in flush path: %d\n",
-			desc->werr);
 
-	return usb_translate_errors(desc->werr);
+	spin_lock_irq(&desc->iuspin);
+	rv = desc->werr;
+	desc->werr = 0;
+	spin_unlock_irq(&desc->iuspin);
+
+	if (rv < 0)
+		dev_err(&desc->intf->dev, "Error in flush path: %d\n", rv);
+
+	return usb_translate_errors(rv);
 }
 
 static __poll_t wdm_poll(struct file *file, struct poll_table_struct *wait)
-- 
2.16.4
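
The fetch-and-reset step in patch 2 can be illustrated in userspace as well. This is only a sketch under assumptions: a pthread mutex stands in for the `desc->iuspin` spinlock, and `struct err_slot` and `take_error()` are hypothetical names, not part of the driver. The point it shows is that reading the stored error and zeroing it inside one critical section lets exactly one caller see a given error.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for the error bookkeeping in the driver. */
struct err_slot {
	pthread_mutex_t lock;	/* analogue of desc->iuspin */
	int werr;		/* last write error, 0 if none pending */
};

/* Fetch the pending error and reset it in a single critical section. */
static int take_error(struct err_slot *slot)
{
	int rv;

	assert(slot != NULL);

	pthread_mutex_lock(&slot->lock);
	rv = slot->werr;
	slot->werr = 0;		/* later callers see "no error" */
	pthread_mutex_unlock(&slot->lock);
	return rv;
}
```

If `werr` holds `-EPIPE`, the first `take_error()` returns `-EPIPE` and the second returns 0. Without the lock, two flushes on different fds could both read the same nonzero value before either clears it, which is the duplicate report the patch description mentions.
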