Re: NEC uPD720200 xHCI Controller dies when Runtime PM enabled

Mathias Nyman <mathias.nyman@xxxxxxxxxxxxxxx> · Mon, 08 Feb 2016 16:31:03 +0200

Hi

On 06.02.2016 19:08, Mike Murdoch wrote:
> Bug ID: 111251
>
> Hello,
>
> I have a NEC uPD720200 USB 3.0 controller in a ThinkPad W520 laptop on
> kernel 4.4.1-gentoo.
>
> 0e:00.0 USB controller: NEC Corporation uPD720200 USB 3.0 Host Controller (rev 04) (prog-if 30 [XHCI])
>      Subsystem: Lenovo uPD720200 USB 3.0 Host Controller
>
> When runtime power control for this controller is disabled
> (/sys/bus/pci/devices/0000:0e:00.0/power/control = on), the controller
> works fine and reaches transfer rates of over 120MB/s.
>
> When runtime power control for this controller is enabled
> (/sys/bus/pci/devices/0000:0e:00.0/power/control = auto), two effects
> can be observed:
>
> - Transfer rates are much lower, at around 30MB/s
> - During transfers, the controller dies after a couple of seconds:
>
> xhci_hcd 0000:0e:00.0: xHCI host not responding to stop endpoint command.
> xhci_hcd 0000:0e:00.0: Assuming host is dying, halting host.
> xhci_hcd 0000:0e:00.0: Host not halted after 16000 microseconds.
> xhci_hcd 0000:0e:00.0: Non-responsive xHCI host is not halting.
> xhci_hcd 0000:0e:00.0: Completing active URBs anyway.
> xhci_hcd 0000:0e:00.0: HC died; cleaning up
> sd 9:0:0:0: [sdc] tag#0 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
> sd 9:0:0:0: [sdc] tag#0 CDB: Read(10) 28 00 00 19 a9 00 00 00 f0 00
> blk_update_request: I/O error, dev sdc, sector 1681664
> xhci_hcd 0000:0e:00.0: Stopped the command ring failed, maybe the host is dead
> xhci_hcd 0000:0e:00.0: Host not halted after 16000 microseconds.
> xhci_hcd 0000:0e:00.0: Abort command ring failed
> xhci_hcd 0000:0e:00.0: HC died; cleaning up
>
> At this point, a reboot is required to reactivate the controller;
> unloading and reloading the xhci_* modules does not work.
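
To switch between the two configurations described in the report, the power/control attribute is written with "on" (runtime PM disabled) or "auto" (runtime PM enabled). The following is a minimal C sketch of that step, not part of any existing tool: the helper names set_runtime_pm() and read_attr() are hypothetical, the PCI address 0000:0e:00.0 is taken from the report, reading back power/runtime_status is an optional extra, and writing the attribute requires root.

```c
#include <stdio.h>
#include <string.h>

/* Write "on" (runtime PM off, the working case in the report) or "auto"
 * (runtime PM on, the failing case) to a power/control attribute.
 * Returns 0 on success, -1 on an invalid mode or a failed write. */
static int set_runtime_pm(const char *control_path, const char *mode)
{
    if (strcmp(mode, "on") != 0 && strcmp(mode, "auto") != 0)
        return -1;

    FILE *f = fopen(control_path, "w");
    if (!f) {
        perror(control_path);
        return -1;
    }
    int ok = fputs(mode, f) != EOF;
    /* fclose() flushes the buffer, so a rejected sysfs write may only
     * be reported here. */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Read the first line of an attribute such as power/control or
 * power/runtime_status into buf, without the trailing newline. */
static int read_attr(const char *path, char *buf, size_t len)
{
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    char *line = fgets(buf, (int)len, f);
    fclose(f);
    if (!line)
        return -1;
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}
```

For example, set_runtime_pm("/sys/bus/pci/devices/0000:0e:00.0/power/control", "on") corresponds to the working setup, and "auto" to the one where the controller dies.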


With 120MB/s, I assume it was a USB 3 device.
Was any USB 2 device connected as well?
Does this also occur with only a USB 2 device connected to xhci?

xhci handles suspend/resume a bit differently for USB 2 and USB 3 root hubs.

Does this happen on older kernels as well, e.g. 4.3- or 4.2-based ones?

For more xhci debug output, run (as root, with debugfs mounted at /sys/kernel/debug):
echo -n 'module xhci_hcd =p' > /sys/kernel/debug/dynamic_debug/control
and check dmesg for the additional xhci messages.
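
The echo command writes a single query string to the dynamic debug control file. As a minimal C sketch of the same step (the helper names are hypothetical; root and a mounted debugfs are assumed), the query can either set the flags to 'p' ("=p", as in the command) or remove 'p' again ("-p") once the logs have been captured:

```c
#include <stdio.h>
#include <string.h>

/* Build a dynamic debug query for all debug sites of one module.
 * enable != 0 gives "module <name> =p" (flags set to exactly 'p', i.e.
 * print), enable == 0 gives "module <name> -p" (remove 'p').
 * Returns 0 on success, -1 if buf is too small. */
static int build_ddebug_query(char *buf, size_t len, const char *module,
                              int enable)
{
    int n = snprintf(buf, len, "module %s %s", module, enable ? "=p" : "-p");
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write one query to the control file, like the echo -n command. */
static int write_ddebug_query(const char *control_path, const char *query)
{
    FILE *f = fopen(control_path, "w");
    if (!f) {
        perror(control_path);
        return -1;
    }
    size_t n = strlen(query);
    int ok = fwrite(query, 1, n, f) == n;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

Usage would be build_ddebug_query(q, sizeof q, "xhci_hcd", 1) followed by write_ddebug_query("/sys/kernel/debug/dynamic_debug/control", q).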

If reloading the module did not help, it is more likely that the controller is in some
unexpected state.
If, however, this is just bad timeout timer handling, we could return immediately
from the timeout handler and check whether the USB device(s) continue to work normally.

This could be done by editing drivers/usb/host/xhci-ring.c:

--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -831,6 +831,7 @@ void xhci_stop_endpoint_command_watchdog(unsigned long arg)
        struct xhci_virt_ep *ep;
        int ret, i, j;
        unsigned long flags;
+       return;
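
To check whether the USB device(s) keep working with such a change, and to compare against the roughly 120MB/s (control = on) and 30MB/s (control = auto) figures in the report, a timed sequential read is one simple option. This is a hypothetical C sketch, not an established benchmark: read_throughput_mbs() is an illustrative name, MB/s here means 10^6 bytes per second, and pointing it at a device node such as /dev/sdc needs read permission on that node.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

/* Sequentially read up to max_bytes from path in 1 MiB chunks and return
 * the throughput in MB/s (10^6 bytes per second), or -1.0 if the file
 * cannot be opened or a read fails (e.g. an I/O error like the one in the
 * report's log). A repeated run may be served from the page cache. */
static double read_throughput_mbs(const char *path, size_t max_bytes)
{
    const size_t chunk = 1u << 20;
    char *buf = malloc(chunk);
    if (!buf)
        return -1.0;
    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        perror(path);
        free(buf);
        return -1.0;
    }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    size_t total = 0;
    int failed = 0;
    while (total < max_bytes) {
        size_t left = max_bytes - total;
        ssize_t n = read(fd, buf, left < chunk ? left : chunk);
        if (n < 0) {
            failed = 1;
            break;
        }
        if (n == 0) /* end of file or device */
            break;
        total += (size_t)n;
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    close(fd);
    free(buf);
    if (failed)
        return -1.0;

    double secs = (double)(t1.tv_sec - t0.tv_sec)
                + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    if (secs <= 0.0)
        secs = 1e-9; /* avoid dividing by zero on a very fast read */
    return (double)total / 1e6 / secs;
}
```

Running it once with power/control set to on and once with auto gives two numbers to compare; a -1.0 result during the auto run would match the failure described in the report.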

-Mathias

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html