Re: Since Linux 4.13 tlp or powertop usage cause "xHCI host controller not responding, assume dead" on Dell 5855

On 22.04.2018 09:29, russianneuromancer@xxxxx wrote:
Hello!

So far I have tested the attached patch but have not tried reverting the commit yet;
I will do that next week.

Result of running patched kernel with recommended debug options:
https://paste.fedoraproject.org/paste/UpezexD~tDmQthoxV2BFbg


The logs show a race: the controller is suspended, then resumed,
but no interrupt is pending in xhci_resume(), so the roothubs are not resumed
and the host starts to suspend again.

We get the interrupt only after we have already started suspending the xhci
controller again.

My guess is that when we handle the interrupt we queue work to resume the roothub,
but by then the controller has probably been put into the D3 suspended state,
so some register reads return 0xffffffff, which the driver interprets as a dead host.

I need to look into this a bit more.
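
The ordering guessed at above can be replayed in a small user-space toy model.
This is only a sketch of the hypothesis, not driver code: every name in it
(toy_host, toy_replay and so on) is made up for illustration, and the
assumption that a read from a host in D3 returns all ones comes from the
guess above, not from a confirmed trace.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: every name here is hypothetical, not taken from the xhci driver. */
enum toy_power { TOY_D0, TOY_D3 };

struct toy_host {
    enum toy_power power;  /* power state of the controller */
    bool irq_pending;      /* a port event is waiting to be handled */
    bool roothub_active;   /* a resumed roothub keeps the host awake */
    bool dead;             /* set once a register read returned all ones */
};

/* Assumption: a register read from a host in D3 comes back as all ones. */
static uint32_t toy_read_reg(const struct toy_host *h)
{
    return h->power == TOY_D3 ? UINT32_MAX : 0u;
}

static void toy_suspend(struct toy_host *h)
{
    h->power = TOY_D3;
}

static void toy_resume(struct toy_host *h)
{
    assert(h->power == TOY_D3);  /* resume only makes sense from suspend */
    h->power = TOY_D0;
    /* The roothub is resumed here only if an interrupt is already pending. */
    if (h->irq_pending) {
        h->irq_pending = false;
        h->roothub_active = true;
    }
}

/* Deferred work queued by the interrupt handler to resume the roothub. */
static void toy_roothub_resume_work(struct toy_host *h)
{
    if (toy_read_reg(h) == UINT32_MAX) {
        h->dead = true;  /* all ones is taken to mean the host is gone */
        return;
    }
    h->roothub_active = true;
}

/* Replay one suspend/resume cycle; returns true if the host ends up "dead". */
static bool toy_replay(bool irq_arrives_late)
{
    struct toy_host h = { .power = TOY_D0 };

    toy_suspend(&h);
    h.irq_pending = !irq_arrives_late;  /* on-time event is visible to resume */
    toy_resume(&h);
    if (!h.roothub_active)
        toy_suspend(&h);                /* nothing keeps the host awake */
    if (irq_arrives_late)
        toy_roothub_resume_work(&h);    /* event handled after second suspend */
    return h.dead;
}
```

In this model toy_replay(true) ends with the host marked dead, while
toy_replay(false) does not, which is the same ordering dependence the log
below suggests.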

[  268.144527] xhci_hcd 0000:00:14.0: xhci_suspend: stopping port polling.
[  268.144543] xhci_hcd 0000:00:14.0: // Setting command ring address to 0x349bd001
[  268.520802] xhci_hcd 0000:00:14.0: // Setting command ring address to 0x349bd001
[  268.520969] xhci_hcd 0000:00:14.0: xhci_resume: starting port polling.
[  268.520985] xhci_hcd 0000:00:14.0: xhci_hub_status_data: stopping port polling.
[  268.521030] xhci_hcd 0000:00:14.0: xhci_suspend: stopping port polling.
[  268.521040] xhci_hcd 0000:00:14.0: // Setting command ring address to 0x349bd001
[  268.521139] xhci_hcd 0000:00:14.0: Port Status Change Event for port 3
[  268.521149] xhci_hcd 0000:00:14.0: resume root hub
[  268.521163] xhci_hcd 0000:00:14.0: port resume event for port 3
[  268.521168] xhci_hcd 0000:00:14.0: xHC is not running.
[  268.521174] xhci_hcd 0000:00:14.0: handle_port_status: starting port polling.
[  268.596322] xhci_hcd 0000:00:14.0: xhci_hc_died: xHCI host controller not responding, assume dead
[  268.596340] xhci_hcd 0000:00:14.0: Killing URBs for slot ID 1, ep index 0

-Mathias

16/04/2018 14:55 +0300, Mathias Nyman:
On 10.04.2018 12:15, russianneuromancer@xxxxx wrote:
Hello!

On the Dell Venue 8 Pro 5855 tablet, installing tlp or running
"powertop --auto-tune" causes the error "xHCI host controller not
responding, assume dead". When the error happens, two integrated USB
devices (the Bluetooth adapter and the LTE modem) disappear until reboot.
This issue was first observed in Linux 4.13 and is still present in
Linux 4.16. Blacklisting both "Linux Foundation 3.0 root hub" entries
from autosuspend in the tlp configuration works around the issue.
However, on other devices tlp works fine without blacklisting USB hub
autosuspend, and this tablet had no such issue before (at least in the
Linux ~4.8-4.12 range), so I assume there is a regression somewhere.

Are there any related commits between 4.12 and 4.13 that I could try
to revert?


4.12 made the driver more sensitive to hotplug-removed xHC controllers,
i.e. if we read 0xffffffff from an xhci register we assume the host
has been removed and start cleaning up.

commit d9f11ba9f107aa335091ab8d7ba5eea714e46e8b
      xhci: Rework how we handle unresponsive or hoptlug removed hosts
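
As a rough illustration of that kind of check, here is a minimal user-space
sketch. It is not the code from the commit: toy_classify_status() and the
TOY_* names are hypothetical, and the only facts it relies on are that an
all-ones read is treated as "host gone" and that bit 0 of the xHCI USBSTS
register is HCHalted.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_STS_HALTED 0x1u  /* bit 0, HCHalted in the xHCI USBSTS register */

enum toy_verdict { TOY_HOST_OK, TOY_HOST_HALTED, TOY_HOST_GONE };

/* An all-ones value has every flag bit set, including the halted bit. */
static_assert((UINT32_MAX & TOY_STS_HALTED) != 0,
              "all ones also looks like a halted host");

/* Classify a raw 32-bit status value read from the controller. */
static enum toy_verdict toy_classify_status(uint32_t raw)
{
    /* Check all ones first, otherwise it would be mistaken for "halted". */
    if (raw == UINT32_MAX)
        return TOY_HOST_GONE;
    if (raw & TOY_STS_HALTED)
        return TOY_HOST_HALTED;
    return TOY_HOST_OK;
}
```

The order of the two checks is the point of the sketch: a read that
returns 0xffffffff cannot be trusted for any individual bit, so it has to be
filtered out before the bits are interpreted.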

You can try to revert that, but as a final solution we should
find the real root cause.

How the issue looks in the logs:

[  227.258385] xhci_hcd 0000:00:14.0: xHC is not running.
[  329.671544] xhci_hcd 0000:00:14.0: xHC is not running.
[  416.695796] xhci_hcd 0000:00:14.0: xHC is not running.

The "xHC is not running" message is the xhci driver handling a port event
interrupt for a resuming port while the whole host controller is not
running.
We stop the host controller in xhci_suspend() and start it in
xhci_resume().

Attaching a patch that does a better job of preventing xhci host suspend during
USB2 resume signaling.
Could help, worth a shot.

[  416.695862] xhci_hcd 0000:00:14.0: xHCI host controller not responding, assume dead

This means xhci_hc_died() was called, which can happen from many places.
Adding the code below could give a hint:

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index daa94c3..51fb3d0 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -900,7 +900,8 @@ void xhci_hc_died(struct xhci_hcd *xhci)
        if (xhci->xhc_state & XHCI_STATE_DYING)
                return;
 
-       xhci_err(xhci, "xHCI host controller not responding, assume dead\n");
+       xhci_err(xhci, "%ps: xHCI host controller not responding, assume dead\n",
+                __builtin_return_address(0));
        xhci->xhc_state |= XHCI_STATE_DYING;
 
        xhci_cleanup_command_queue(xhci);
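
For anyone unfamiliar with the trick in that patch: __builtin_return_address(0)
is a GCC/Clang builtin that yields the address the current function will
return to, and the kernel's %ps format prints such an address as a symbol
name. The user-space sketch below only shows the builtin part; the toy_*
names are hypothetical, and plain C has no %ps, so the addresses are just
recorded.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_REPORTS 8

static void *toy_callers[TOY_MAX_REPORTS];  /* recorded return addresses */
static size_t toy_ncallers;

/* Hypothetical stand-in for a function with many call sites. */
__attribute__((noinline))
static void toy_report_death(void)
{
    assert(toy_ncallers < TOY_MAX_REPORTS);
    /* Record where this call will return to, i.e. who called us. */
    toy_callers[toy_ncallers++] = __builtin_return_address(0);
}

/* Two separate call sites, so two different return addresses get recorded. */
static void toy_trigger_twice(void)
{
    toy_report_death();  /* call site 1 */
    toy_report_death();  /* call site 2 */
}
```

After toy_trigger_twice() runs, toy_callers[0] and toy_callers[1] differ,
which is what lets the patched message tell the possible callers of
xhci_hc_died() apart. In user space the raw addresses can be printed with
%p and resolved afterwards with a tool such as addr2line.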

[  416.695900] xhci_hcd 0000:00:14.0: HC died; cleaning up
[  416.696052] usb 1-3: USB disconnect, device number 2
[  416.815610] cdc_mbim 1-3:1.12 wwp0s20u3i12: unregister 'cdc_mbim' usb-0000:00:14.0-3, CDC MBIM
[  416.847934] usb 1-4: USB disconnect, device number 3

After that, the Bluetooth adapter and LTE modem disappear from the lsusb
output, while the xHCI controller itself remains visible.

We stop the host activity in xhci_hc_died(), so no USB devices under
this host will work.

Complete dmesg: https://paste.fedoraproject.org/paste/7aMpVGLfZ82zppdGs56Oqg
lsusb -v: https://paste.fedoraproject.org/paste/c7y8GisC13YdzcYE9B-JIw
dsdt.dsl: https://paste.fedoraproject.org/paste/8g6mp2dafypUkFT4sa43iA

xhci traces and dynamic debug could help:

mount -t debugfs none /sys/kernel/debug
echo 81920 > /sys/kernel/debug/tracing/buffer_size_kb
echo 1 > /sys/kernel/debug/tracing/events/xhci-hcd/enable

echo -n 'module xhci_hcd =p' > /sys/kernel/debug/dynamic_debug/control

-Mathias
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

