Re: xhci_hcd can't detect new devices after enabling runtime PM and disabling S3 wake (bug #85701)

"Lu, Baolu" <baolu.lu@xxxxxxxxxxxxxxx> · Sat, 11 Oct 2014 14:54:33 +0800

On 10/8/2014 11:52 PM, Alan Stern wrote:
On Wed, 8 Oct 2014, Mathias Nyman wrote:

Index: usb-3.17/drivers/usb/host/xhci-hub.c
===================================================================

--- usb-3.17.orig/drivers/usb/host/xhci-hub.c
+++ usb-3.17/drivers/usb/host/xhci-hub.c
@@ -1136,13 +1136,11 @@ int xhci_bus_suspend(struct usb_hcd *hcd
  			t2 |= PORT_LINK_STROBE | XDEV_U3;
  			set_bit(port_index, &bus_state->bus_suspended);
  		}
-		/* USB core sets remote wake mask for USB 3.0 hubs,
-		 * including the USB 3.0 roothub, but only if CONFIG_PM_RUNTIME
-		 * is enabled, so also enable remote wake here.
+		/* USB core sets remote wake mask for USB 3.0 hubs but
+		 * not for USB 2.0 hubs (and only if CONFIG_PM_RUNTIME
+		 * is enabled), so enable remote wake here.
  		 */
-		if (hcd->self.root_hub->do_remote_wakeup
-				&& device_may_wakeup(hcd->self.controller)) {
-
+		if (hcd->self.root_hub->do_remote_wakeup) {
  			if (t1 & PORT_CONNECT) {
  				t2 |= PORT_WKOC_E | PORT_WKDISC_E;
  				t2 &= ~PORT_WKCONN_E;
Hi

The device_may_wakeup() check was recently added in patch:

commit ff8cbf250b448aac35589f6075082c3fcad8a8fe
     xhci: clear root port wake on bits if controller isn't wake-up capable
The description of that commit doesn't agree with the title.  The
description says that some platforms get spurious wakeups, so the
wake-on bits should be cleared if do_wakeup is false.  It says nothing
about the controller not being wakeup-capable.

Also, the commit description says:

     When xHCI PCI host is suspended, if do_wakeup is false in xhci_pci_suspend,
     xhci_bus_suspend needs to clear all root port wake on bits...

This makes no sense, because xhci_bus_suspend() is not called from
xhci_pci_suspend().  They are two completely separate pathways.

http://marc.info/?l=linux-usb&m=140261803101225&w=2

Seems it caused some unexpected issues.
Evidently the commit was meant to fix a problem with system suspend,
and the author totally failed to realize that the code it touched also
gets used for runtime suspend.

I apologize for the wrong fix. I will follow up with a right fix.


Is there some way to fix both cases before ending up reverting that patch?
The commit altered the wrong routine.  If the controller isn't allowed
to wake up then the port wake-on bits should be cleared in
xhci_suspend(), not in xhci_bus_suspend().  In fact, it would be a good
idea to check in xhci_suspend() for a wakeup-in-progress as well.

Unfortunately, the curent code doesn't tell xhci_suspend() whether
wakeup should be enabled.  This can easily be fixed;
xhci_pci_suspend() should pass its do_wakeup argument on to
xhci_suspend().

Any idea why hcd_bus_suspend() doesn't see the suspend race when calling hcd->driver->hub_status_data() in any case?
I don't really understand what went wrong on Dmitry's system, or why
enabling the wake-on bits should make a difference.  I merely noticed
that xhci_bus_suspend() was doing the wrong thing, and it seemed like
it might be related to the bug.

Are the wake-on [dis]connect bits somehow tied to connect status change bits?
I assume they are.  Only the hardware designers know for sure.  :-)

When hcd_bus_suspend() checks for the suspend race condition it checks all xhci status change bits.
Shouldn't those indicate a connect change and prevent the second suspend?

Or maybe we should check for those bits resume as well?
Suspect something rather subtle must be going on.

xhci_hub_status_data() tests for resume-in-progress by checking
bus_state->resuming_ports.  That variable gets set by
handle_port_status() in response to a Port Status Change Event TRB.
Maybe under some circumstances the controller doesn't generate those
events unless the corresponding wake-on bit is set; I don't know.

What actually happened on Dmitry's system is a little strange.  When he
plugged in the device there was a wakeup request, and the
port-connect-status-change bit was set.  But the port-connect-status
bit was _not_ set, even after a 100 ms delay, so the system tried to
suspend the root hub again.  Without my patch, resuming_ports wasn't
set, and so the suspend succeeded and the connection ended up getting
lost.  With my patch, resuming_ports was set, and so the suspend was
aborted and the connection was eventually detected.

If I understand it correctly, Dmitry's issue should also happen if 
"disabled" is written to root hub's power/wakeup attribute in sysfs.

Thanks,
-baolu

Alan Stern


--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html