On 25.01.2018 17:51, Alan Stern wrote:
On Thu, 25 Jan 2018, Samuel Sadok wrote:
2018-01-23 17:31 GMT+01:00 Alan Stern <stern@xxxxxxxxxxxxxxxxxxx>:
On Tue, 23 Jan 2018, Samuel Sadok wrote:
Thanks Alan,
While I was at it I also enabled debug logs for xhci_hcd and xhci_pci.
$ echo module usbcore =p > /sys/kernel/debug/dynamic_debug/control
$ echo module xhci_hcd =p > /sys/kernel/debug/dynamic_debug/control
$ echo module xhci_pci =p > /sys/kernel/debug/dynamic_debug/control
$ modprobe usbmon
$ cat /sys/kernel/debug/usb/usbmon/0u > /tmp/usbmon.log
$ # press power button
dmesg: https://gist.github.com/anonymous/29b81574abf40605f999cfeefe98e341
usbmon: https://gist.github.com/anonymous/55b6d9bbf8b8c8627230b10d2b09dcb6
Both logs were collected at the same time so the timestamps should match.
In fact they do, quite closely. But there is a noticeable gap in the
usbmon trace, between lines 197 and 198 (the timestamp jumps from
35923531 to 38925126) and there obviously was a lot of activity on bus
1 in between.
Yes you're right, that's inconvenient because the "failed to disable
LTM" message occurs exactly in this gap. Are gaps like this expected /
known? Any ideas on how to mitigate them? (I kinda want to avoid
soldering an external bus monitor to the mainboard :) )
The text interface to usbmon uses a fixed-size buffer, and the size is
a little on the low side if you're dealing with large latencies (which
of course are unavoidable during a suspend/resume scenario). The size
is determined by EVENT_MAX, defined around line 40 in
drivers/usb/mon/mon_text.c:
#define EVENT_MAX (4*PAGE_SIZE / sizeof(struct mon_event_text))
Just change the 4 to something considerably larger and you should be in
good shape.
Resume starts at dmesg line 1255.
As far as I can judge, the earliest indication of something going
wrong is line 1514:
[ 36.087176] xhci_hcd 0000:00:14.0: xHCI xhci_urb_enqueue called with
unaddressed device
[ 36.087180] usb 2-4: Disable of device-initiated U1 failed.
[ 36.087212] xhci_hcd 0000:00:14.0: xHCI xhci_urb_enqueue called with
unaddressed device
[ 36.087224] usb 2-4: Disable of device-initiated U2 failed.
[ 36.087226] xhci_hcd 0000:00:14.0: xHCI xhci_urb_enqueue called with
unaddressed device
[ 36.087227] usb 2-4: usb_reset_and_verify_device Failed to disable LTM
Yes, that's where the problem first seems to show up. Apparently
usb_disable_ltm() fails because it can't communicate with the device,
and usb_reset_and_verify_device() then aborts because of that failure.
It shouldn't do this; after all, one very good reason for resetting the
device is because we're unable to communicate with it.
You could try editing the source code for usb_reset_and_verify_device()
in drivers/usb/core/hub.c, to see what happens if it doesn't give up
after usb_disable_ltm() fails.
For the following logs I modified the "Failed to disable LTM" message
and commented out the goto in the line below. Also I used kernel
version 4.15-rc9 this time. I omitted upgrading the out-of-tree
modules broadcom-wl and nouveau-fw, so don't mind errors related to
that.
The SD card reader is still missing after resume.
dmesg & usbmon:
https://gist.github.com/anonymous/fc597612d5ebbac40d7bef9f8f0b579a
`usb_reset_and_verify_device()` is executed around dmesg line 1524.
Again, the usbmon log has a major gap around that time.
It looks like the system is trying to carry out multiple warm resets,
or a single extremely long warm reset, when it really should give up
and do a cold reset instead.
It would be best to get Mathias's input on this. I don't know what is
the right way to fix this problem.
Alan Stern
Just back form a week long sickleave, sorry about the delay.
First glance it looks like we end up in compliance mode a lot during
those warm resets.
I need to look at this in more detail
-Mathias
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html