Re: [v6] usb: ohci: Proper handling of ed_rm_list to handle race condition between usb_kill_urb() and finish_unlinks()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, 26 Mar 2018, Jonathan Liu wrote:

> > I am experiencing a USB function call hang from userspace with OCHI
> > (full speed USB device) after updating from Linux 4.14.15 to 4.14.24
> > and noticed this commit.
> >
> > Here is the Linux 4.14.24 kernel stack trace (extracted from SysRq+w
> > and amended with addr2line):
> > [<c05cbd64>] (__schedule) from [<c05cc148>] (schedule+0x50/0xb4)
> > kernel/sched/core.c:2792
> > [<c05cc148>] (schedule) from [<c042789c>]
> > (usb_kill_urb.part.3+0x78/0xa8) include/asm-generic/preempt.h:59
> > [<c042789c>] (usb_kill_urb.part.3) from [<c0433a3c>]
> > (usbdev_ioctl+0x1288/0x1cf0) drivers/usb/core/urb.c:690
> > [<c0433a3c>] (usbdev_ioctl) from [<c0217db8>]
> > (do_vfs_ioctl+0x9c/0x8ec) drivers/usb/core/devio.c:1835
> > [<c0217db8>] (do_vfs_ioctl) from [<c021863c>] (SyS_ioctl+0x34/0x5c)
> > fs/ioctl.c:47
> > [<c021863c>] (SyS_ioctl) from [<c0107820>] (ret_fast_syscall+0x0/0x54)
> > include/linux/file.h:39
> >
> > Afterwards the kernel is unresponsive to disconnect/connect of the
> > full speed USB device but I can connect/disconnect a high speed USB
> > device to the same port and communicate to it without problem since it
> > uses EHCI (OHCI is companion controller). If I try to connect the full
> > speed USB device again it is still unresponsive. The userspace
> > application is still hanging after all this.
> >
> > Could this commit be causing the issue?

It sounds like the URB gets cancelled but is never given back.  But 
although the commit you mentioned does affect this logic, I don't see 
the problem could arise.

> There does seem to be something wrong with this commit. I did some
> more testing with Linux 4.16.0-rc6 on sun7i-a20-olinuxino-lime.
> I connected a sun7i-a20-pcduino3b running Linux 4.15.12 that has OTG
> port with gadget zero kernel module (g_zero) loaded to the USB port
> and did some testing with usbtest from tools/usb/testusb.c in kernel
> repository.
> 
> I had to do the following change to avoid getting flooded with
> "usbtest 1-1:3.0: unlink retry" kernel messages after test 11 of
> testusb starts:
> --- a/drivers/usb/misc/usbtest.c
> +++ b/drivers/usb/misc/usbtest.c
> @@ -1473,7 +1473,7 @@ static int unlink1(struct usbtest_dev *dev, int
> pipe, int
>                 size, int async)
>                                  * resubmission, but since we're testing unlink
>                                  * paths, we can't.
>                                  */
> -                           ERROR(dev, "unlink retry\n");
> +                         /*ERROR(dev, "unlink retry\n");*/

That's a bizarre message in any case.  The condition under which it
gets printed is _not_ an error, just a race.  I would expect it to show
up every now and then just based on random chance.  But if it gets
printed over and over again then something is definitely wrong.

> If I revert the commit, I can run "./testusb -a" to completion without
> any issues with Linux 4.16.0-rc6 on sun7i-a20-olinuxino-lime.
> I don't get a kernel stall detected when testing on x86_64 but I do
> still get at least one "unlink retry" error message from kernel in
> dmesg which can be reproduced using "./testusb -a" or more frequently
> with "./testusb -a -t 11 -c 5000".

A few of those "unlink retry" messages during a test with 5000 
iterations does not mean anything is wrong; don't worry about it.

It would be great if you could debug this farther.  I'm not sure what
to suggest, however.  Clearly we need to know what's going on in
ohci-q.c's finish_unlinks() routine; unfortunately this routine
probably gets called thousands of times before the problem shows up.

Anyway, here's a patch you can try out on the sun7i-a20-olinuxino-lime.  
I have no idea whether it will help pinpoint the source of the problem.

Alan Stern



Index: usb-4.x/drivers/usb/host/ohci-hcd.c
===================================================================
--- usb-4.x.orig/drivers/usb/host/ohci-hcd.c
+++ usb-4.x/drivers/usb/host/ohci-hcd.c
@@ -327,14 +327,21 @@ static int ohci_urb_dequeue(struct usb_h
 		 * some upcoming INTR_SF to call finish_unlinks()
 		 */
 		urb_priv = urb->hcpriv;
-		if (urb_priv->ed->state == ED_OPER)
+		if (urb_priv->ed->state == ED_OPER) {
+			ohci_info(ohci, "start dequeue URB %p ed %p\n", urb,
+					urb_priv->ed);
 			start_ed_unlink(ohci, urb_priv->ed);
-
+		} else
+			ohci_info(ohci, "idle dequeue URB %p ed %p\n", urb,
+					urb_priv->ed);
 		if (ohci->rh_state != OHCI_RH_RUNNING) {
 			/* With HC dead, we can clean up right away */
 			ohci_work(ohci);
 		}
 	}
+	else
+		ohci_info(ohci, "failed dequeue URB %p\n", urb);
+
 	spin_unlock_irqrestore (&ohci->lock, flags);
 	return rc;
 }
Index: usb-4.x/drivers/usb/host/ohci-q.c
===================================================================
--- usb-4.x.orig/drivers/usb/host/ohci-q.c
+++ usb-4.x/drivers/usb/host/ohci-q.c
@@ -1022,6 +1022,7 @@ skip_ed:
 		*last = ed->ed_next;
 		ed->ed_next = NULL;
 		modified = 0;
+		ohci_info(ohci, "scanning unlinked ed %p\n", ed);
 
 		/* unlink urbs as requested, but rescan the list after
 		 * we call a completion since it might have unlinked
@@ -1072,6 +1073,8 @@ rescan_this:
 			/* if URB is done, clean up */
 			if (urb_priv->td_cnt >= urb_priv->length) {
 				modified = completed = 1;
+				ohci_info(ohci, "finish dequeue URB %p ed %p\n",
+						urb, ed);
 				finish_urb(ohci, urb, 0);
 			}
 		}




[Index of Archives]     [Linux Kernel]     [Kernel Development Newbies]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite Hiking]     [Linux Kernel]     [Linux SCSI]