On 29.01.2018 16:24, Mathias Nyman wrote:
On 25.01.2018 17:51, Alan Stern wrote:
On Thu, 25 Jan 2018, Samuel Sadok wrote:
2018-01-23 17:31 GMT+01:00 Alan Stern <stern@xxxxxxxxxxxxxxxxxxx>:
On Tue, 23 Jan 2018, Samuel Sadok wrote:
Thanks Alan,
While I was at it I also enabled debug logs for xhci_hcd and xhci_pci.
$ echo module usbcore =p > /sys/kernel/debug/dynamic_debug/control
$ echo module xhci_hcd =p > /sys/kernel/debug/dynamic_debug/control
$ echo module xhci_pci =p > /sys/kernel/debug/dynamic_debug/control
$ modprobe usbmon
$ cat /sys/kernel/debug/usb/usbmon/0u > /tmp/usbmon.log
$ # press power button
dmesg: https://gist.github.com/anonymous/29b81574abf40605f999cfeefe98e341
usbmon: https://gist.github.com/anonymous/55b6d9bbf8b8c8627230b10d2b09dcb6
Both logs were collected at the same time so the timestamps should match.
In fact they do, quite closely. But there is a noticeable gap in the
usbmon trace, between lines 197 and 198 (the timestamp jumps from
35923531 to 38925126) and there obviously was a lot of activity on bus
1 in between.
Yes you're right, that's inconvenient because the "failed to disable
LTM" message occurs exactly in this gap. Are gaps like this expected /
known? Any ideas on how to mitigate them? (I kinda want to avoid
soldering an external bus monitor to the mainboard :) )
The text interface to usbmon uses a fixed-size buffer, and the size is
a little on the low side if you're dealing with large latencies (which
of course are unavoidable during a suspend/resume scenario). The size
is determined by EVENT_MAX, defined around line 40 in
drivers/usb/mon/mon_text.c:
#define EVENT_MAX (4*PAGE_SIZE / sizeof(struct mon_event_text))
Just change the 4 to something considerably larger and you should be in
good shape.
Resume starts at dmesg line 1255.
As far as I can judge, the earliest indication of something going
wrong is line 1514:
[ 36.087176] xhci_hcd 0000:00:14.0: xHCI xhci_urb_enqueue called with
unaddressed device
[ 36.087180] usb 2-4: Disable of device-initiated U1 failed.
[ 36.087212] xhci_hcd 0000:00:14.0: xHCI xhci_urb_enqueue called with
unaddressed device
[ 36.087224] usb 2-4: Disable of device-initiated U2 failed.
[ 36.087226] xhci_hcd 0000:00:14.0: xHCI xhci_urb_enqueue called with
unaddressed device
[ 36.087227] usb 2-4: usb_reset_and_verify_device Failed to disable LTM
Yes, that's where the problem first seems to show up. Apparently
usb_disable_ltm() fails because it can't communicate with the device,
and usb_reset_and_verify_device() then aborts because of that failure.
It shouldn't do this; after all, one very good reason for resetting the
device is because we're unable to communicate with it.
You could try editing the source code for usb_reset_and_verify_device()
in drivers/usb/core/hub.c, to see what happens if it doesn't give up
after usb_disable_ltm() fails.
For the following logs I modified the "Failed to disable LTM" message
and commented out the goto in the line below. Also I used kernel
version 4.15-rc9 this time. I omitted upgrading the out-of-tree
modules broadcom-wl and nouveau-fw, so don't mind errors related to
that.
The SD card reader is still missing after resume.
dmesg & usbmon:
https://gist.github.com/anonymous/fc597612d5ebbac40d7bef9f8f0b579a
`usb_reset_and_verify_device()` is executed around dmesg line 1524.
Again, the usbmon log has a major gap around that time.
It looks like the system is trying to carry out multiple warm resets,
or a single extremely long warm reset, when it really should give up
and do a cold reset instead.
It would be best to get Mathias's input on this. I don't know what is
the right way to fix this problem.
Alan Stern
Just back form a week long sickleave, sorry about the delay.
First glance it looks like we end up in compliance mode a lot during
those warm resets.
I need to look at this in more detail
Does reverting
37be6676 usb: hub: Fix auto-remount of safely removed or ejected USB-3 devices
help?
Reverting it sets the USB3 port to RxDetect via ss.disabled after failing
to reset a few times. kind of disabling/re-enabling the USB3 port.
Doing a USB2 type "cold" bus reset (SE0) instead of a USB3 inband Hot or Warm
reset could solve this as well, but I can't find a way to force a USB2 style cold
reset on USB3 ports with xHCI.
Setting the xhci port PORTSC register PR bit will issue a USB2 bus reset
for USB2 protocol ports and Hot reset for USB3 protocol ports.
The warm reset is a separate bit (WPR) in portsc register.
Other things that needs attention (todo list):
* hub_port_wait_reset() in hub.c checks for USB_PORT_STAT_RESET, returning -EBUSY
before checking USB_PORT_STAT_CONNECTION, returning -ENOTCONN. This causes extra
delays when doing several 200ms waits for warm reset to finish when the port should
just be logically disconnected.
* Fix the failing disable_lpm and disable_ltm issues in usb_reset_and_verify_device()
that Alan pointed out.
* make sure xhci sets udev->slot_id to zero when xhci slots are disabled.
Slots are disabled and freed when xHC is reset at resume in this case.
* check that root cause for failing USB3 device reset after resume is not that several
xhci slots are in Default state at the same time. xHC can't handle this.
In normal device enumeration usb core has a mutex protecting it, not sure it works here,
maybe usb core and xhci are out of sync after xHC reset?
-Mathias
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html