Re: [RFT] usbcore: Bug fix: system can't suspend with USB3.0 device connected to USB3.0 hub

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sat, Apr 02, 2011 at 04:35:17AM -0400, CAI Qian wrote:
> Hi Sarah,
> 
> After applied this patch on the top of USB3 hub changes patches in my branch (can also reproduce the original problem), the problem is still existed.
> 
> Freezing user space processes ... (elapsed 0.01 seconds) done.
> Freezing remaining freezable tasks ... (elapsed 0.00 seconds) done.
> PM: Preallocating image memory... done (allocated 3924380 pages)
> PM: Allocated 15697520 kbytes in 2.53 seconds (6204.55 MB/s)
> hub 8-1:1.0: suspend error -16
> pm_op(): usb_dev_freeze+0x0/0x20 returns -16
> PM: Device 8-1 failed to freeze: error -16

So I think the 8-1 device is your roothub.  There's a couple different
places the code could return -EBUSY (which is what -16 is).

In xhci-hub.c, there's

        if (hcd->self.root_hub->do_remote_wakeup) {
                port_index = max_ports;
                while (port_index--) {
                        if (bus_state->resume_done[port_index] != 0) {
                                spin_unlock_irqrestore(&xhci->lock, flags);
                                xhci_dbg(xhci, "suspend failed because "
                                                "port %d is resuming\n",
                                                port_index + 1);
                                return -EBUSY;
                        }
                }
        }

There's also various places in the USB core on the device, bus, and host
controller suspend patch that can return -EBUSY.  Can you enable
CONFIG_USB_DEBUG and CONFIG_USB_XHCI_HCD_DEBUGGING and give me the dmesg
output after you run `dmesg -n 8`?

That will allow me to see if the bus suspend function in the xHCI driver
is failing.  If so, that could indicate something is wrong with the
resume_done handling with the new split roothub code.

I'm wondering if you're running into an issue I had trouble reproducing.
I thought there might be an issue with resume_done in the xHCI code, but
Andiry Xu and Alan Stern convinced me my reasoning had to be wrong about
the cause of the bug, so I never included the bug fix patch in the final
patchset.  The patch is attached.  The patch only fixes the resume_done
issue for a high speed hub, but I could either modify the patch to work
for USB 3.0 hubs, or perhaps your resume_done issue is with the high
speed portion of your USB 3.0 hub.

If, after you enable xHCI debugging and try to hibernate the system, you
see a message from the xHCI driver about a particular port resuming,
then try to apply that patch and see if it allows you suspend.

> Restarting tasks ... done.
> 
> Unfortunately, the latest Linus tree had a regression that prevent hibernate/resume from working even without those patches.

CIA, what host controller are you running under?  What brand of hub are
you using?

Andiry, you said the simpler patched allowed your system to suspend and
hibernate.  Which host are you using?  Which hub?

> It is now failing to resume every time tried after hibernate.
> 
> PM: Starting manual resume from disk
> Freezing user space processes ... 
> EXT4-fs (dm-0): INFO: recovery required on readonly filesystem
> EXT4-fs (dm-0): write access will be enabled during recovery
> EXT4-fs (dm-0): recovery complete
> EXT4-fs (dm-0): mounted filesystem with ordered data mode. Opts: (null)
> (elapsed 0.18 seconds) done.
> Freezing remaining freezable tasks ... (elapsed 0.01 seconds) done.
> PM: Loading and decompressing image data (301765 pages) ... done
> PM: Read 1207060 kbytes in 6.47 seconds (186.56 MB/s)
> serial 00:08: disabled
> 
> After applied this patch and the rest of Andiry's larger USB 3.0 hub power management patches on the top of linus tree, there is a conflict with patch [5/5] here.
> 
> http://marc.info/?l=linux-usb&m=130136847804961&w=2
> 
> --- drivers/usb/core/hub.c
> +++ drivers/usb/core/hub.c
> @@ -2305,7 +2305,13 @@
>         }
> 
>         /* see 7.1.7.6 */
> -       status = set_port_feature(hub->hdev, port1, USB_PORT_FEAT_SUSPEND);
> +       if (hub_is_superspeed(hub->hdev))
> +               status = set_port_feature(hub->hdev,
> +                               port1 | (USB_SS_PORT_LS_U3 << 3),
> +                               USB_PORT_FEAT_LINK_STATE);
> +       else
> +               status = set_port_feature(hub->hdev, port1,
> +                                               USB_PORT_FEAT_SUSPEND);
>         if (status) {
>                 dev_dbg(hub->intfdev, "can't suspend port %d, status %d\n",
>                                 port1, status);
> 
> 
> Do you have suggestion on which git tree/branch is the best to test those?

Those patches should apply against the usb-next branch of Greg's tree:

git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb-2.6.git

However, he doesn't track Linus' tree very closely, so I don't think
that branch is up-to-date right now.  If you let me know when the other
resume regression is fixed, then I can create apply the patchset on top
of Linus' tree and create a branch for you to test.

Sarah Sharp
>From 0e3395514321b57ed185edc3fe75a39189bf738d Mon Sep 17 00:00:00 2001
From: Sarah Sharp <sarah.a.sharp@xxxxxxxxxxxxxxx>
Date: Fri, 7 Jan 2011 11:13:00 -0800
Subject: [PATCH] xhci: Clear internal resume state on reset.

The setup that displays this issue is a HS mult-tt hub plugged into the
xHCI roothub.  Sometimes when I quickly plug in a device into the HS hub
right after the hub (but not the bus) is suspended, the hub will resume,
and then there will be a transfer error on the GetStatus for the HS hub.
This causes the USB core to start a reset-resume, without first checking
the port status of that port with a GetPortStatus call into the roothub.

The xHCI driver must time the port resume itself, and turn off resume
signaling after a period of time.  It does that by keeping track of the
time to turn off resume signaling in an array called resume_done.  It
relies on the GetPortStatus call by the USB core to check if needs to
clear the resume bit in the port status and clear the time in resume_done.
When the USB core sees an error on the GetStatus control transfer to the
HS hub, it starts a reset-resume, which will immediately set the reset bit
in the roothub port status registers, without issuing a GetPortStatus.

This causes the time in resume_done to linger, despite the fact that the
device has been reset and is no longer resuming.  This will cause the xHCI
bus suspend functions to not allow the roothub to suspend, since a device
is resuming:

Jan  7 10:28:39 broadway kernel: [  965.325547] hub 1-1:1.0: state 7 ports 4 chg 0000 evt 0000
Jan  7 10:28:41 broadway kernel: [  967.824026] hub 1-1:1.0: hub_suspend
Jan  7 10:28:41 broadway kernel: [  967.840013] usb 1-1: usb auto-suspend
Jan  7 10:28:43 broadway kernel: [  969.202878] Port Status Change Event for port 3
Jan  7 10:28:43 broadway kernel: [  969.202881] port resume event for port 3
Jan  7 10:28:43 broadway kernel: [  969.202884] resume HS port 3
Jan  7 10:28:43 broadway kernel: [  969.202899] hub 1-0:1.0: state 7 ports 2 chg 0000 evt 0002
Jan  7 10:28:43 broadway kernel: [  969.202904] get port status, actual port 0 status  = 0x400fe3
Jan  7 10:28:43 broadway kernel: [  969.216014] usb 1-1: usb wakeup-resume
Jan  7 10:28:43 broadway kernel: [  969.216019] get port status, actual port 0 status  = 0xfe3
Jan  7 10:28:43 broadway kernel: [  969.216022] usb 1-1: finish resume
Jan  7 10:28:43 broadway kernel: [  969.216380] xhci_hcd 0000:01:00.0: WARN: transfer error on endpoint
Jan  7 10:28:43 broadway kernel: [  969.216389] usb 1-1: retry with reset-resume
Jan  7 10:28:43 broadway kernel: [  969.266718] Port Status Change Event for port 3
Jan  7 10:28:43 broadway kernel: [  969.272012] get port status, actual port 0 status  = 0x200e03
Jan  7 10:28:43 broadway kernel: [  969.328299] usb 1-1: reset high speed USB device using xhci_hcd and address 4
...
Jan  7 10:28:47 broadway kernel: [  973.844013] hub 1-1:1.0: hub_suspend
Jan  7 10:28:47 broadway kernel: [  973.860015] usb 1-1: usb auto-suspend
Jan  7 10:28:49 broadway kernel: [  975.876023] hub 1-0:1.0: hub_suspend
Jan  7 10:28:49 broadway kernel: [  975.876030] usb usb1: bus auto-suspend
Jan  7 10:28:49 broadway kernel: [  975.876033] suspend failed because port 1 is resuming
Jan  7 10:28:49 broadway kernel: [  975.876035] usb usb1: bus suspend fail, err -16
Jan  7 10:28:49 broadway kernel: [  975.876037] hub 1-0:1.0: hub_resume

The fix is to unconditionally clear the time in resume_done (and the other
associated bus state) if the USB core wants to set the port reset bit, and
the high speed device is still suspended.  USB 3.0 devices do not suffer
from this issue, since their resume signaling is cleared automatically,
and the xHCI driver does not have to time the resume.

Signed-off-by: Sarah Sharp <sarah.a.sharp@xxxxxxxxxxxxxxx>
---
 drivers/usb/host/xhci-hub.c |   15 +++++++++++++++
 1 files changed, 15 insertions(+), 0 deletions(-)

diff --git a/drivers/usb/host/xhci-hub.c b/drivers/usb/host/xhci-hub.c
index dec0b97..6edc84d 100644
--- a/drivers/usb/host/xhci-hub.c
+++ b/drivers/usb/host/xhci-hub.c
@@ -555,6 +555,21 @@ int xhci_hub_control(struct usb_hcd *hcd, u16 typeReq, u16 wValue,
 			xhci_writel(xhci, temp, port_array[wIndex]);
 
 			temp = xhci_readl(xhci, port_array[wIndex]);
+
+			/* HS reset-resume after a failed get status */
+			if (!DEV_SUPERSPEED(temp) &&
+					bus_state->resume_done[wIndex] != 0) {
+				bus_state->resume_done[wIndex] = 0;
+				slot_id = xhci_find_slot_id_by_port(hcd, xhci,
+								 wIndex + 1);
+				if (!slot_id) {
+					xhci_dbg(xhci, "slot_id is zero\n");
+					goto error;
+				}
+				xhci_ring_device(xhci, slot_id);
+				bus_state->port_c_suspend |= 1 << wIndex;
+				bus_state->suspended_ports &= ~(1 << wIndex);
+			}
 			xhci_dbg(xhci, "set port reset, actual port %d status  = 0x%x\n", wIndex, temp);
 			break;
 		default:
-- 
1.7.1


[Index of Archives]     [Linux Media]     [Linux Input]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Old Linux USB Devel Archive]

  Powered by Linux