On Mon, 26 Mar 2018, Jonathan Liu wrote: > > I am experiencing a USB function call hang from userspace with OCHI > > (full speed USB device) after updating from Linux 4.14.15 to 4.14.24 > > and noticed this commit. > > > > Here is the Linux 4.14.24 kernel stack trace (extracted from SysRq+w > > and amended with addr2line): > > [<c05cbd64>] (__schedule) from [<c05cc148>] (schedule+0x50/0xb4) > > kernel/sched/core.c:2792 > > [<c05cc148>] (schedule) from [<c042789c>] > > (usb_kill_urb.part.3+0x78/0xa8) include/asm-generic/preempt.h:59 > > [<c042789c>] (usb_kill_urb.part.3) from [<c0433a3c>] > > (usbdev_ioctl+0x1288/0x1cf0) drivers/usb/core/urb.c:690 > > [<c0433a3c>] (usbdev_ioctl) from [<c0217db8>] > > (do_vfs_ioctl+0x9c/0x8ec) drivers/usb/core/devio.c:1835 > > [<c0217db8>] (do_vfs_ioctl) from [<c021863c>] (SyS_ioctl+0x34/0x5c) > > fs/ioctl.c:47 > > [<c021863c>] (SyS_ioctl) from [<c0107820>] (ret_fast_syscall+0x0/0x54) > > include/linux/file.h:39 > > > > Afterwards the kernel is unresponsive to disconnect/connect of the > > full speed USB device but I can connect/disconnect a high speed USB > > device to the same port and communicate to it without problem since it > > uses EHCI (OHCI is companion controller). If I try to connect the full > > speed USB device again it is still unresponsive. The userspace > > application is still hanging after all this. > > > > Could this commit be causing the issue? It sounds like the URB gets cancelled but is never given back. But although the commit you mentioned does affect this logic, I don't see the problem could arise. > There does seem to be something wrong with this commit. I did some > more testing with Linux 4.16.0-rc6 on sun7i-a20-olinuxino-lime. > I connected a sun7i-a20-pcduino3b running Linux 4.15.12 that has OTG > port with gadget zero kernel module (g_zero) loaded to the USB port > and did some testing with usbtest from tools/usb/testusb.c in kernel > repository. > > I had to do the following change to avoid getting flooded with > "usbtest 1-1:3.0: unlink retry" kernel messages after test 11 of > testusb starts: > --- a/drivers/usb/misc/usbtest.c > +++ b/drivers/usb/misc/usbtest.c > @@ -1473,7 +1473,7 @@ static int unlink1(struct usbtest_dev *dev, int > pipe, int > size, int async) > * resubmission, but since we're testing unlink > * paths, we can't. > */ > - ERROR(dev, "unlink retry\n"); > + /*ERROR(dev, "unlink retry\n");*/ That's a bizarre message in any case. The condition under which it gets printed is _not_ an error, just a race. I would expect it to show up every now and then just based on random chance. But if it gets printed over and over again then something is definitely wrong. > If I revert the commit, I can run "./testusb -a" to completion without > any issues with Linux 4.16.0-rc6 on sun7i-a20-olinuxino-lime. > I don't get a kernel stall detected when testing on x86_64 but I do > still get at least one "unlink retry" error message from kernel in > dmesg which can be reproduced using "./testusb -a" or more frequently > with "./testusb -a -t 11 -c 5000". A few of those "unlink retry" messages during a test with 5000 iterations does not mean anything is wrong; don't worry about it. It would be great if you could debug this farther. I'm not sure what to suggest, however. Clearly we need to know what's going on in ohci-q.c's finish_unlinks() routine; unfortunately this routine probably gets called thousands of times before the problem shows up. Anyway, here's a patch you can try out on the sun7i-a20-olinuxino-lime. I have no idea whether it will help pinpoint the source of the problem. Alan Stern Index: usb-4.x/drivers/usb/host/ohci-hcd.c =================================================================== --- usb-4.x.orig/drivers/usb/host/ohci-hcd.c +++ usb-4.x/drivers/usb/host/ohci-hcd.c @@ -327,14 +327,21 @@ static int ohci_urb_dequeue(struct usb_h * some upcoming INTR_SF to call finish_unlinks() */ urb_priv = urb->hcpriv; - if (urb_priv->ed->state == ED_OPER) + if (urb_priv->ed->state == ED_OPER) { + ohci_info(ohci, "start dequeue URB %p ed %p\n", urb, + urb_priv->ed); start_ed_unlink(ohci, urb_priv->ed); - + } else + ohci_info(ohci, "idle dequeue URB %p ed %p\n", urb, + urb_priv->ed); if (ohci->rh_state != OHCI_RH_RUNNING) { /* With HC dead, we can clean up right away */ ohci_work(ohci); } } + else + ohci_info(ohci, "failed dequeue URB %p\n", urb); + spin_unlock_irqrestore (&ohci->lock, flags); return rc; } Index: usb-4.x/drivers/usb/host/ohci-q.c =================================================================== --- usb-4.x.orig/drivers/usb/host/ohci-q.c +++ usb-4.x/drivers/usb/host/ohci-q.c @@ -1022,6 +1022,7 @@ skip_ed: *last = ed->ed_next; ed->ed_next = NULL; modified = 0; + ohci_info(ohci, "scanning unlinked ed %p\n", ed); /* unlink urbs as requested, but rescan the list after * we call a completion since it might have unlinked @@ -1072,6 +1073,8 @@ rescan_this: /* if URB is done, clean up */ if (urb_priv->td_cnt >= urb_priv->length) { modified = completed = 1; + ohci_info(ohci, "finish dequeue URB %p ed %p\n", + urb, ed); finish_urb(ohci, urb, 0); } }