Hi, On 25 March 2018 at 12:21, Jonathan Liu <net147@xxxxxxxxx> wrote: > Hi, > > On 8 February 2018 at 14:55, Jeffy Chen <jeffy.chen@xxxxxxxxxxxxxx> wrote: >> From: AMAN DEEP <aman.deep@xxxxxxxxxxx> >> >> There is a race condition between finish_unlinks->finish_urb() function >> and usb_kill_urb() in ohci controller case. The finish_urb calls >> spin_unlock(&ohci->lock) before usb_hcd_giveback_urb() function call, >> then if during this time, usb_kill_urb is called for another endpoint, >> then new ed will be added to ed_rm_list at beginning for unlink, and >> ed_rm_list will point to newly added. >> >> When finish_urb() is completed in finish_unlinks() and ed->td_list >> becomes empty as in below code (in finish_unlinks() function): >> >> if (list_empty(&ed->td_list)) { >> *last = ed->ed_next; >> ed->ed_next = NULL; >> } else if (ohci->rh_state == OHCI_RH_RUNNING) { >> *last = ed->ed_next; >> ed->ed_next = NULL; >> ed_schedule(ohci, ed); >> } >> >> The *last = ed->ed_next will make ed_rm_list to point to ed->ed_next >> and previously added ed by usb_kill_urb will be left unreferenced by >> ed_rm_list. This causes usb_kill_urb() hang forever waiting for >> finish_unlink to remove added ed from ed_rm_list. >> >> The main reason for hang in this race condtion is addition and removal >> of ed from ed_rm_list in the beginning during usb_kill_urb and later >> last* is modified in finish_unlinks(). >> >> As suggested by Alan Stern, the solution for proper handling of >> ohci->ed_rm_list is to remove ed from the ed_rm_list before finishing >> any URBs. Then at the end, we can add ed back to the list if necessary. >> >> This properly handle the updated ohci->ed_rm_list in usb_kill_urb(). >> >> Fixes:977dcfdc6031("USB:OHCI:don't lose track of EDs when a controller dies") >> Acked-by: Alan Stern <stern@xxxxxxxxxxxxxxxxxxx> >> CC: <stable@xxxxxxxxxxxxxxx> >> Signed-off-by: Aman Deep <aman.deep@xxxxxxxxxxx> >> Signed-off-by: Jeffy Chen <jeffy.chen@xxxxxxxxxxxxxx> >> --- >> >> Changes in v6: >> This is a resend of Aman Deep's v5 patch [0], which solved the hang we >> hit [1]. (Thanks Aman :) >> >> The v5 has some format issues, so i slightly adjust the commit message. >> >> [0] https://www.spinics.net/lists/linux-usb/msg129010.html >> [1] https://bugs.chromium.org/p/chromium/issues/detail?id=803749 >> >> drivers/usb/host/ohci-q.c | 17 ++++++++++------- >> 1 file changed, 10 insertions(+), 7 deletions(-) >> >> diff --git a/drivers/usb/host/ohci-q.c b/drivers/usb/host/ohci-q.c >> index b2ec8c399363..4ccb85a67bb3 100644 >> --- a/drivers/usb/host/ohci-q.c >> +++ b/drivers/usb/host/ohci-q.c >> @@ -1019,6 +1019,8 @@ static void finish_unlinks(struct ohci_hcd *ohci) >> * have modified this list. normally it's just prepending >> * entries (which we'd ignore), but paranoia won't hurt. >> */ >> + *last = ed->ed_next; >> + ed->ed_next = NULL; >> modified = 0; >> >> /* unlink urbs as requested, but rescan the list after >> @@ -1077,21 +1079,22 @@ static void finish_unlinks(struct ohci_hcd *ohci) >> goto rescan_this; >> >> /* >> - * If no TDs are queued, take ED off the ed_rm_list. >> + * If no TDs are queued, ED is now idle. >> * Otherwise, if the HC is running, reschedule. >> - * If not, leave it on the list for further dequeues. >> + * If the HC isn't running, add ED back to the >> + * start of the list for later processing. >> */ >> if (list_empty(&ed->td_list)) { >> - *last = ed->ed_next; >> - ed->ed_next = NULL; >> ed->state = ED_IDLE; >> list_del(&ed->in_use_list); >> } else if (ohci->rh_state == OHCI_RH_RUNNING) { >> - *last = ed->ed_next; >> - ed->ed_next = NULL; >> ed_schedule(ohci, ed); >> } else { >> - last = &ed->ed_next; >> + ed->ed_next = ohci->ed_rm_list; >> + ohci->ed_rm_list = ed; >> + /* Don't loop on the same ED */ >> + if (last == &ohci->ed_rm_list) >> + last = &ed->ed_next; >> } >> >> if (modified) > > I am experiencing a USB function call hang from userspace with OCHI > (full speed USB device) after updating from Linux 4.14.15 to 4.14.24 > and noticed this commit. > > Here is the Linux 4.14.24 kernel stack trace (extracted from SysRq+w > and amended with addr2line): > [<c05cbd64>] (__schedule) from [<c05cc148>] (schedule+0x50/0xb4) > kernel/sched/core.c:2792 > [<c05cc148>] (schedule) from [<c042789c>] > (usb_kill_urb.part.3+0x78/0xa8) include/asm-generic/preempt.h:59 > [<c042789c>] (usb_kill_urb.part.3) from [<c0433a3c>] > (usbdev_ioctl+0x1288/0x1cf0) drivers/usb/core/urb.c:690 > [<c0433a3c>] (usbdev_ioctl) from [<c0217db8>] > (do_vfs_ioctl+0x9c/0x8ec) drivers/usb/core/devio.c:1835 > [<c0217db8>] (do_vfs_ioctl) from [<c021863c>] (SyS_ioctl+0x34/0x5c) > fs/ioctl.c:47 > [<c021863c>] (SyS_ioctl) from [<c0107820>] (ret_fast_syscall+0x0/0x54) > include/linux/file.h:39 > > Afterwards the kernel is unresponsive to disconnect/connect of the > full speed USB device but I can connect/disconnect a high speed USB > device to the same port and communicate to it without problem since it > uses EHCI (OHCI is companion controller). If I try to connect the full > speed USB device again it is still unresponsive. The userspace > application is still hanging after all this. > > Could this commit be causing the issue? Note that I have been running with OHCI_QUIRK_QEMU added to ochi->flags in ohci_init of drivers/usb/host/ohci-hcd.c for both 4.14.15 and 4.14.24 kernels due to a race condition that was later addressed by https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/commit/?h=v4.14.23&id=85c3d26bd754160f6be90d8b078d70a1b35820e7, so io_watchdog_func of drivers/usb/host/ohci-hcd.c is not being run. Thanks. Regards, Jonathan