Hi Andiry,

I just wanted to give you a heads up about a potential bug I spent some
time tracking down last week.  It only showed up on the branch I've been
using for the split roothub patches, which is based on patches Greg sent
in for 2.6.38.  Those patches include the update to make the USB core use
the runtime PM interface to suspend USB devices.  I haven't been able to
reproduce this on a generic 2.6.37 kernel, so I suspect either the split
roothub patches or the runtime suspend update revealed the bug.

Here's the patch I created against my branch.  The xHCI suspend/resume
variables were moved around into a bus_state structure, but I think you
can get the general gist of it.

Can you look it over, and figure out if this bug is possible in 2.6.38?
I'm still not sure if it was something I introduced with the split
roothub patches.  If it would show up in 2.6.38, I'll revise the patch
against 2.6.38.

Sarah Sharp

8<-------------------------------------------------------------------->8

From 0e3395514321b57ed185edc3fe75a39189bf738d Mon Sep 17 00:00:00 2001
From: Sarah Sharp <sarah.a.sharp@xxxxxxxxxxxxxxx>
Date: Fri, 7 Jan 2011 11:13:00 -0800
Subject: [PATCH] xhci: Clear internal resume state on reset.

The setup that displays this issue is a HS multi-TT hub plugged into the
xHCI roothub.  Sometimes when I quickly plug a device into the HS hub
right after the hub (but not the bus) is suspended, the hub will resume,
and then there will be a transfer error on the GetStatus for the HS hub.
This causes the USB core to start a reset-resume, without first checking
the port status of that port with a GetPortStatus call into the roothub.

The xHCI driver must time the port resume itself, and turn off resume
signaling after a period of time.  It does that by keeping track of the
time to turn off resume signaling in an array called resume_done.
It relies on the GetPortStatus call by the USB core to check if it needs
to clear the resume bit in the port status and clear the time in
resume_done.

When the USB core sees an error on the GetStatus control transfer to the
HS hub, it starts a reset-resume, which will immediately set the reset bit
in the roothub port status registers, without issuing a GetPortStatus.
This causes the time in resume_done to linger, despite the fact that the
device has been reset and is no longer resuming.  This will cause the xHCI
bus suspend functions to not allow the roothub to suspend, since a device
is resuming:

Jan 7 10:28:39 broadway kernel: [ 965.325547] hub 1-1:1.0: state 7 ports 4 chg 0000 evt 0000
Jan 7 10:28:41 broadway kernel: [ 967.824026] hub 1-1:1.0: hub_suspend
Jan 7 10:28:41 broadway kernel: [ 967.840013] usb 1-1: usb auto-suspend
Jan 7 10:28:43 broadway kernel: [ 969.202878] Port Status Change Event for port 3
Jan 7 10:28:43 broadway kernel: [ 969.202881] port resume event for port 3
Jan 7 10:28:43 broadway kernel: [ 969.202884] resume HS port 3
Jan 7 10:28:43 broadway kernel: [ 969.202899] hub 1-0:1.0: state 7 ports 2 chg 0000 evt 0002
Jan 7 10:28:43 broadway kernel: [ 969.202904] get port status, actual port 0 status = 0x400fe3
Jan 7 10:28:43 broadway kernel: [ 969.216014] usb 1-1: usb wakeup-resume
Jan 7 10:28:43 broadway kernel: [ 969.216019] get port status, actual port 0 status = 0xfe3
Jan 7 10:28:43 broadway kernel: [ 969.216022] usb 1-1: finish resume
Jan 7 10:28:43 broadway kernel: [ 969.216380] xhci_hcd 0000:01:00.0: WARN: transfer error on endpoint
Jan 7 10:28:43 broadway kernel: [ 969.216389] usb 1-1: retry with reset-resume
Jan 7 10:28:43 broadway kernel: [ 969.266718] Port Status Change Event for port 3
Jan 7 10:28:43 broadway kernel: [ 969.272012] get port status, actual port 0 status = 0x200e03
Jan 7 10:28:43 broadway kernel: [ 969.328299] usb 1-1: reset high speed USB device using xhci_hcd and address 4
...
Jan 7 10:28:47 broadway kernel: [ 973.844013] hub 1-1:1.0: hub_suspend
Jan 7 10:28:47 broadway kernel: [ 973.860015] usb 1-1: usb auto-suspend
Jan 7 10:28:49 broadway kernel: [ 975.876023] hub 1-0:1.0: hub_suspend
Jan 7 10:28:49 broadway kernel: [ 975.876030] usb usb1: bus auto-suspend
Jan 7 10:28:49 broadway kernel: [ 975.876033] suspend failed because port 1 is resuming
Jan 7 10:28:49 broadway kernel: [ 975.876035] usb usb1: bus suspend fail, err -16
Jan 7 10:28:49 broadway kernel: [ 975.876037] hub 1-0:1.0: hub_resume

The fix is to unconditionally clear the time in resume_done (and the other
associated bus state) if the USB core wants to set the port reset bit, and
the high speed device is still suspended.  USB 3.0 devices do not suffer
from this issue, since their resume signaling is cleared automatically,
and the xHCI driver does not have to time the resume.

Signed-off-by: Sarah Sharp <sarah.a.sharp@xxxxxxxxxxxxxxx>
---
 drivers/usb/host/xhci-hub.c |   15 +++++++++++++++
 1 files changed, 15 insertions(+), 0 deletions(-)

diff --git a/drivers/usb/host/xhci-hub.c b/drivers/usb/host/xhci-hub.c
index dec0b97..6edc84d 100644
--- a/drivers/usb/host/xhci-hub.c
+++ b/drivers/usb/host/xhci-hub.c
@@ -555,6 +555,21 @@ int xhci_hub_control(struct usb_hcd *hcd, u16 typeReq, u16 wValue,
 			xhci_writel(xhci, temp, port_array[wIndex]);
 
 			temp = xhci_readl(xhci, port_array[wIndex]);
+
+			/* HS reset-resume after a failed get status */
+			if (!DEV_SUPERSPEED(temp) &&
+					bus_state->resume_done[wIndex] != 0) {
+				bus_state->resume_done[wIndex] = 0;
+				slot_id = xhci_find_slot_id_by_port(hcd, xhci,
+						wIndex + 1);
+				if (!slot_id) {
+					xhci_dbg(xhci, "slot_id is zero\n");
+					goto error;
+				}
+				xhci_ring_device(xhci, slot_id);
+				bus_state->port_c_suspend |= 1 << wIndex;
+				bus_state->suspended_ports &= ~(1 << wIndex);
+			}
 			xhci_dbg(xhci, "set port reset, actual port %d status = 0x%x\n", wIndex, temp);
 			break;
 		default:
-- 
1.6.3.3
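
In case it helps to see the stale resume_done problem in isolation, here
is a minimal userspace sketch of the bookkeeping described in the commit
message.  It is not driver code: the struct, the model_* function names,
MAX_PORTS and the 20 ms RESUME_MSEC value are all assumptions made up for
illustration, and time is just an integer in milliseconds.  It only models
the case where the USB core goes straight to a port reset without a
GetPortStatus in between.

```c
#include <assert.h>
#include <errno.h>

#define MAX_PORTS   4   /* hypothetical port count for this model */
#define RESUME_MSEC 20  /* assumed resume signaling duration, in ms */

/* Simplified stand-in for the per-bus resume bookkeeping (hypothetical). */
struct bus_state_model {
	unsigned long resume_done[MAX_PORTS]; /* 0 means no resume pending */
	unsigned int port_c_suspend;
	unsigned int suspended_ports;
};

/* Port resume event: record when resume signaling should be turned off. */
void model_port_resume(struct bus_state_model *bs, int port,
		unsigned long now)
{
	assert(port >= 0 && port < MAX_PORTS);
	bs->resume_done[port] = now + RESUME_MSEC;
}

/* GetPortStatus: without the fix, the only place the time is cleared. */
void model_get_port_status(struct bus_state_model *bs, int port,
		unsigned long now)
{
	assert(port >= 0 && port < MAX_PORTS);
	if (bs->resume_done[port] != 0 && now >= bs->resume_done[port]) {
		bs->resume_done[port] = 0;
		bs->port_c_suspend |= 1u << port;
		bs->suspended_ports &= ~(1u << port);
	}
}

/* Set port reset; clear_on_reset selects the behavior the patch proposes. */
void model_set_port_reset(struct bus_state_model *bs, int port,
		int clear_on_reset)
{
	assert(port >= 0 && port < MAX_PORTS);
	if (clear_on_reset && bs->resume_done[port] != 0) {
		bs->resume_done[port] = 0;
		bs->port_c_suspend |= 1u << port;
		bs->suspended_ports &= ~(1u << port);
	}
}

/* Bus suspend is refused while any port still looks like it is resuming. */
int model_bus_suspend(const struct bus_state_model *bs)
{
	int i;

	for (i = 0; i < MAX_PORTS; i++)
		if (bs->resume_done[i] != 0)
			return -EBUSY;
	return 0;
}

/* Resume starts, then the core resets the port with no GetPortStatus. */
int model_reset_resume_then_suspend(int clear_on_reset)
{
	struct bus_state_model bs = { {0}, 0, 0 };

	bs.suspended_ports = 1u << 0;
	model_port_resume(&bs, 0, 1000);
	model_set_port_reset(&bs, 0, clear_on_reset);
	return model_bus_suspend(&bs);
}
```

With clear_on_reset set to 0, model_reset_resume_then_suspend() returns
-EBUSY, which corresponds to the "bus suspend fail" in the log above; with
it set to 1, the stale time is gone and the function returns 0.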