Re: XHCI regression on v4.10-rc1

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 20.01.2017 16:49, Patrice CHOTARD wrote:
On 01/13/2017 05:12 PM, Patrice Chotard wrote:
On 01/12/2017 06:17 PM, Mathias Nyman wrote:
On 09.01.2017 18:42, Mathias Nyman wrote:
On 09.01.2017 16:23, Patrice Chotard wrote:
On 01/09/2017 01:30 PM, Mathias Nyman wrote:

On 09.01.2017 11:51, Patrice Chotard wrote:
Hi Mathias, Greg

Hi


I am working on ARM STi platform, since v4.10-rc1, when booting B2260 or B2120 STi boards platform
with nothing plugged on USB3 connector, i observed the following kernel logs :

....
[  801.953836] usb usb6-port1: Cannot enable. Maybe the USB cable is bad?

That's odd, why does it try to reset the port if there's nothing there?

[  801.960455] xhci-hcd xhci-hcd.0.auto: Cannot set link state.

This makes sense, nothing is connected, and we try to set link to U3 (PORT_PE == 0)

[  801.966611] usb usb6-port1: cannot disable (err = -32)
xhci reutns -EPIPE as we try to set link state to U3 while port is not enabled (PORT_PE == 0)

[  806.083772] usb usb6-port1: Cannot enable. Maybe the USB cable is bad?

Again, about 5 seconds later the same port is reset, while nothing is connected.
Odd

[  806.090370] xhci-hcd xhci-hcd.0.auto: Cannot set link state.
[  806.096494] usb usb6-port1: cannot disable (err = -32)
[  810.208766] usb usb6-port1: Cannot enable. Maybe the USB cable is bad?
[  810.215374] xhci-hcd xhci-hcd.0.auto: Cannot set link state.
[  810.221478] usb usb6-port1: cannot disable (err = -32)
[  814.333767] usb usb6-port1: Cannot enable. Maybe the USB cable is bad?
[  814.340364] xhci-hcd xhci-hcd.0.auto: Cannot set link state.
.....

Another interesting thing is even if i plugged a device (a mass storage device in my case) on the USB3 connector,
the above logs continue to appear.

This is due to commit 37be66767e3ca "usb: hub: Fix auto-remount of safely removed or ejected USB-3 devices".

I don't know if STi platforms are the only impacted by this issue.

On v4.9 everything was ok.


Was there not a single "Cannot enable. Maybe USB cable is bad" message with 4.9?

There is no "Cannot enable. Maybe USB cable is bad" message with 4.9


Before commit 37be66767e3ca we forced the port to RxDetect state via Disabled state,
and cleared port change flags when a usb3_port_disable() was called.

after we just set the port to U3. Maybe those STi platforms depend on that cycle somehow?

I really don't know :-(


Could you take logs (dmesg) of both 4.9 and 4.10-rc1 with both usb core and xhci debugging enabled:

echo -n 'module xhci_hcd =p' > /sys/kernel/debug/dynamic_debug/control
echo 'module usbcore =p' >/sys/kernel/debug/dynamic_debug/control


You will find below the 2 requested logs, in both case , no USB cable are plugged.

v4.9 dmesg logs :

.....
[   31.368022] hub 6-0:1.0: hub_resume
[   31.368064] xhci-hcd xhci-hcd.0.auto: get port status, actual port 0 status  = 0x400340
[   31.368071] xhci-hcd xhci-hcd.0.auto: Get port status returned 0x400340
[   31.368224] hub 6-0:1.0: state 7 ports 1 chg 0000 evt 0000

Looks like port is stuck in Compliance mode, this should only happen if a connect was detected,
moving the port from RxDetect to Polling, and then timeout on polling to get to compliance.

4.9 is then stuck in a hub_suspend/resume loop with a port in compliance mode.


v4.10-rc1 dmesg logs :

....
[  269.436617] usb usb6-port1: cannot disable (err = -32)
[  269.464728] hub 6-0:1.0: state 7 ports 1 chg 0000 evt 0002
[  269.464756] xhci-hcd xhci-hcd.0.auto: get port status, actual port 0 status  = 0x340

I 4.10-rc1  we are again stuck in Compliance mode, but this time there was a event (00002) visible, maybe because
the flags were not cleared or some other reason. Now are stuck in a port reset loop with the
port in compliance mode.
So the real question is why is the port in compliance mode when there are no devices connected.?

I'll continue looking at this more tomorrow

So with both 4.9 and 4.10-rc the ports are in Compliance mode state, even if nothing is connected
To me this looks like the real underlying issue. For some reason it didn't try to reset the port in 4.9,
and never got to the loop of port reset.

In 4.9 the hub->events_bits[0] are not set, so the hub thread won't even ask for port status
on any ports.

In 4.10-rc the hub->events_bits[0] is set for port 1, It will start handling port events, see the
compliance state and try to warm reset the port, and fail as it never sees port

Any chance you could add extra debugging to xhci to show why the change bit is set?
Something like this:

diff --git a/drivers/usb/host/xhci-hub.c b/drivers/usb/host/xhci-hub.c
index 0ef1690..e425d82 100644
--- a/drivers/usb/host/xhci-hub.c
+++ b/drivers/usb/host/xhci-hub.c
@@ -1271,6 +1271,10 @@ int xhci_hub_status_data(struct usb_hcd *hcd, char *buf)
                             jiffies, bus_state->resume_done[i]))) {
                         buf[(i + 1) / 8] |= 1 << (i + 1) % 8;
                         status = 1;
+                       xhci_err(xhci, "portsc 0x%x, port_c_suspend:%d, resume_done:%d\n",
+                                temp,
+                                bus_state->port_c_suspend,
+                                bus_state->resume_done[i]);
                 }
                 if ((temp & PORT_RC))
                         reset_change = true;


and run with 4.9 and 4.10-rc

-Mathias


Hi Mathias

Thanks for your time

For information i replaced xhci_err() by xhci_dbg() to avoid flooding kernel log which
  avoid to complete the kernel boot.

Please find attached both kernel log files for v4.9 and v4.10-rc1.

In v4.9_xhci-hub_additionnal_debug.txt, at the beginning of the trace
portsc value is 0xa400340, then at [   18.366392] it becomes 0x400340 (link state change) and then 0x2802a0 and back again to 0xa400340

In v4.10-rc1_xhci-hub_additionnal_debug.txt, at the beginning of the trace
portsc value is 0xa400340, then at [    5.557976] it becomes 0x400340 (link state change), and then 0x2802a0 or 0x680340 for ever ....

Patrice


Hi Mathias

Did new logs give you some clue about the XHCi problem ?


Only that both cases, 4.9 and 4.10, start with port in compliance state in a
auto suspend/resume. At some point the hub thread will see an event for that
port and try to reset it. fail (as there is no device) and disable the port.


log snippet of busylooping between autosuspend and resume, similar for both 4.9 and
4.10:

[   18.329121] hub 6-0:1.0: hub_suspend
[   18.329175] usb usb6: bus auto-suspend, wakeup 1
[   18.329193] xhci-hcd xhci-hcd.0.auto: portsc 0xa400340, port_c_suspend:0, resume_done:0
[   18.329202] usb usb6: suspend raced with wakeup event
[   18.329209] usb usb6: usb auto-resume
[   18.332758] xhci-hcd xhci-hcd.0.auto: portsc 0xa400340, port_c_suspend:0, resume_done:0
[   18.344140] hub 6-0:1.0: hub_resume
[   18.344179] xhci-hcd xhci-hcd.0.auto: get port status, actual port 0 status  = 0x400340
[   18.344191] xhci-hcd xhci-hcd.0.auto: Get port status returned 0x400340
[   18.344318] hub 6-0:1.0: state 7 ports 1 chg 0000 evt 0000
[   18.344368] xhci-hcd xhci-hcd.0.auto: set port remote wake mask, actual port 0 status  = 0xe400340
[   18.344451] hub 6-0:1.0: hub_suspend


In 4.9 we go back to the beginning, busylooping between auto suspend and resume,
until after some random time when hub sees an event again, and continues all over again.

log snippet of event, and resetting the port for 4.9:

[   18.367378] hub 6-0:1.0: state 7 ports 1 chg 0000 evt 0002
[   18.367417] xhci-hcd xhci-hcd.0.auto: get port status, actual port 0 status  = 0x400340
[   18.367426] xhci-hcd xhci-hcd.0.auto: Get port status returned 0x400340
[   18.367805] usb usb6-port1: link state change
[   18.367846] xhci-hcd xhci-hcd.0.auto: clear port link state change, actual port 0 status  = 0x340
[   18.367943] usb usb6-port1: do warm reset
[   18.368043] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[   18.428313] xhci-hcd xhci-hcd.0.auto: get port status, actual port 0 status  = 0x2b0
[   18.428330] xhci-hcd xhci-hcd.0.auto: Get port status returned 0x2b0
[   18.428475] usb usb6-port1: not warm reset yet, waiting 50ms
[   18.468135] xhci-hcd xhci-hcd.0.auto: Port Status Change Event for port 2
[   18.468155] xhci-hcd xhci-hcd.0.auto: handle_port_status: starting port polling.
[   18.468167] xhci-hcd xhci-hcd.0.auto: portsc 0x2802a0, port_c_suspend:0, resume_done:0
[   18.487840] xhci-hcd xhci-hcd.0.auto: get port status, actual port 0 status  = 0x2802e0
[   18.487855] xhci-hcd xhci-hcd.0.auto: Get port status returned 0x3002e0
[   18.488636] xhci-hcd xhci-hcd.0.auto: clear port reset change, actual port 0 status  = 0x802e0
[   18.488736] xhci-hcd xhci-hcd.0.auto: clear port warm(BH) reset change, actual port 0 status  = 0x2e0
[   18.488803] xhci-hcd xhci-hcd.0.auto: clear port link state change, actual port 0 status  = 0x2e0
[   18.488869] xhci-hcd xhci-hcd.0.auto: clear port connect change, actual port 0 status  = 0x2e0
[   18.489012] xhci-hcd xhci-hcd.0.auto: get port status, actual port 0 status  = 0x2e0
[   18.489026] xhci-hcd xhci-hcd.0.auto: Get port status returned 0x2e0
[   18.489112] xhci-hcd xhci-hcd.0.auto: get port status, actual port 0 status  = 0x2e0
[   18.489122] xhci-hcd xhci-hcd.0.auto: Get port status returned 0x2e0
[   18.489182] xhci-hcd xhci-hcd.0.auto: Disable port 0
[   18.489239] xhci-hcd xhci-hcd.0.auto: get port status, actual port 0 status  = 0x280
[   18.489247] xhci-hcd xhci-hcd.0.auto: Get port status returned 0x280
[   18.489296] xhci-hcd xhci-hcd.0.auto: Enable port 0
[   18.489357] hub 6-0:1.0: state 7 ports 1 chg 0000 evt 0002
[   18.489386] xhci-hcd xhci-hcd.0.auto: get port status, actual port 0 status  = 0x2a0
[   18.489395] xhci-hcd xhci-hcd.0.auto: Get port status returned 0x2a0
[   18.489489] xhci-hcd xhci-hcd.0.auto: set port remote wake mask, actual port 0 status  = 0xe0002a0
[   18.489547] hub 6-0:1.0: hub_suspend
[   18.489606] usb usb6: bus auto-suspend, wakeup 1
[   18.489624] xhci-hcd xhci-hcd.0.auto: xhci_hub_status_data: stopping port polling.
[   18.522772] xhci-hcd xhci-hcd.0.auto: xhci_hub_status_data: stopping port polling.
[   18.543748] Link is Up - 1000/Full
[   18.863014] xhci-hcd xhci-hcd.0.auto: Port Status Change Event for port 2
[   18.863033] xhci-hcd xhci-hcd.0.auto: resume root hub
[   18.863073] xhci-hcd xhci-hcd.0.auto: handle_port_status: starting port polling.
[   18.863089] xhci-hcd xhci-hcd.0.auto: portsc 0xa400340, port_c_suspend:0, resume_done:0
[   18.863333] usb usb6: usb wakeup-resume
[   18.863355] usb usb6: usb auto-resume
[   18.863379] hub 6-0:1.0: hub_resume
[   18.863410] xhci-hcd xhci-hcd.0.auto: get port status, actual port 0 status  = 0x400340
[   18.863419] xhci-hcd xhci-hcd.0.auto: Get port status returned 0x400340

In 4.10 the port disable is a bit different, so after hub sees the first event (evt 0002) and tries to
reset the port, it will notice the reset fails, and try to reset port several timer, then disable device,
which will end up with another  event (evt 0002) and we start the whole reset loop again.

[    5.558058] hub 6-0:1.0: state 7 ports 1 chg 0000 evt 0002
[    5.558093] xhci-hcd xhci-hcd.0.auto: get port status, actual port 0 status  = 0x400340
[    5.558103] xhci-hcd xhci-hcd.0.auto: Get port status returned 0x400340
[    5.558187] usb usb6-port1: link state change
[    5.558221] xhci-hcd xhci-hcd.0.auto: clear port link state change, actual port 0 status  = 0x340
[    5.558278] usb usb6-port1: do warm reset
[    5.558437] systemd-journald[108]: /dev/kmsg buffer overrun, some messages lost.
[    5.623028] xhci-hcd xhci-hcd.0.auto: get port status, actual port 0 status  = 0x2b0
[    5.623046] xhci-hcd xhci-hcd.0.auto: Get port status returned 0x2b0
[    5.623134] usb usb6-port1: not warm reset yet, waiting 50ms
[    5.658454] xhci-hcd xhci-hcd.0.auto: Port Status Change Event for port 2
[    5.658471] xhci-hcd xhci-hcd.0.auto: handle_port_status: starting port polling.
[    5.658487] xhci-hcd xhci-hcd.0.auto: portsc 0x2802a0, port_c_suspend:0, resume_done:0
[    5.683071] xhci-hcd xhci-hcd.0.auto: get port status, actual port 0 status  = 0x2802e0
[    5.683088] xhci-hcd xhci-hcd.0.auto: Get port status returned 0x3002e0
[    5.683187] usb usb6-port1: not warm reset yet, waiting 200ms
[    5.777987] xhci-hcd xhci-hcd.0.auto: portsc 0x2802e0, port_c_suspend:0, resume_done:0
[    5.894043] xhci-hcd xhci-hcd.0.auto: get port status, actual port 0 status  = 0x2802e0
[    5.894061] xhci-hcd xhci-hcd.0.auto: Get port status returned 0x3002e0
[    5.894158] usb usb6-port1: not warm reset yet, waiting 200ms
[    6.027992] xhci-hcd xhci-hcd.0.auto: portsc 0x2802e0, port_c_suspend:0, resume_done:0
[    6.103092] xhci-hcd xhci-hcd.0.auto: get port status, actual port 0 status  = 0x680340
[    6.103110] xhci-hcd xhci-hcd.0.auto: Get port status returned 0x700340
[    6.103238] usb usb6-port1: not warm reset yet, waiting 200ms
[    6.277993] xhci-hcd xhci-hcd.0.auto: portsc 0x680340, port_c_suspend:0, resume_done:0
[    6.313066] xhci-hcd xhci-hcd.0.auto: get port status, actual port 0 status  = 0x680340
[    6.313085] xhci-hcd xhci-hcd.0.auto: Get port status returned 0x700340
[    6.313349] usb usb6-port1: not warm reset yet, waiting 200ms
[    6.313385] xhci-hcd xhci-hcd.0.auto: clear port reset change, actual port 0 status  = 0x480340
[    6.313464] xhci-hcd xhci-hcd.0.auto: clear port warm(BH) reset change, actual port 0 status  = 0x400340
[    6.313523] xhci-hcd xhci-hcd.0.auto: clear port link state change, actual port 0 status  = 0x340
[    6.313587] xhci-hcd xhci-hcd.0.auto: clear port connect change, actual port 0 status  = 0x340
[    6.313651] xhci-hcd xhci-hcd.0.auto: get port status, actual port 0 status  = 0x340
[    6.313665] xhci-hcd xhci-hcd.0.auto: Get port status returned 0x340
..loop a few times, until
[    9.673761] usb usb6-port1: not enabled, trying warm reset again...
[    9.673771] usb usb6-port1: Cannot enable. Maybe the USB cable is bad?
[    9.680418] xhci-hcd xhci-hcd.0.auto: Cannot set link state.
[    9.722276] usb usb6-port1: cannot disable (err = -32)
[    9.754240] hub 6-0:1.0: state 7 ports 1 chg 0000 evt 0002
[    9.754291] xhci-hcd xhci-hcd.0.auto: get port status, actual port 0 status  = 0x340
[    9.754304] xhci-hcd xhci-hcd.0.auto: Get port status returned 0x340
[    9.754419] usb usb6-port1: do warm reset
[    9.782987] xhci-hcd xhci-hcd.0.auto: xhci_hub_status_data: stopping port polling.
[    9.813087] xhci-hcd xhci-hcd.0.auto: get port status, actual port 0 status  = 0x2b0


We could try things like clearing all port change flags in ports disable, and/or only move ports
in xhci enabled state to U3 at port disable, from other states we would do like in 4.9, and
move to RxDetect via disable.

But this would not really solve the STi case, it would at best go back to 4.9 where the issue is
better hidden and silently stay in the autosuspend/resume busyloop with port in Compliance mode.

You really should focus on why the port link state is in compliance in the first place.

-Mathias
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Media]     [Linux Input]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Old Linux USB Devel Archive]

  Powered by Linux