Re: XHCI is slow during boot (bios/efi) and leaves many dmesg messages

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 16.05.2017 13:00, Maël Lavault wrote:
On Wed, 2017-05-10 at 16:04 +0300, Mathias Nyman wrote:
On 09.05.2017 16:50, Maël Lavault wrote:
On Tue, 2017-05-09 at 11:34 +0300, Felipe Balbi wrote:
Hi,

Maël Lavault <mael.lavault@xxxxxxxxxxxxxxxxxxx> writes:
Le 28 avr. 2017 15:57, "Maël Lavault" <mael.lavault@universalav
enue
.com
a écrit :

On Fri, 2017-04-28 at 16:58 +0300, Mathias Nyman wrote:
On 21.04.2017 11:08, Maël Lavault wrote:
On Tue, 2017-04-18 at 16:58 +0200, Maël Lavault wrote:
Hi,

I can't find how to reply to an old thread with
majordomo,
sorry
for
the inconvenience it might cause.

I'm reposting an issue [0] that has been inactive for a
few
month
but
still present in kernel 4.10.10 on a Macbook pro 12,1.

I can provide more informations if needed but the issue
is
explained
in
details in the bugzilla issue.

Thanks.

---------------
Hi list,

I have a Apple Inc. MacBookPro11,1 (with the most recent
'bios':
BIOS
MBP111.88Z.0138.B16.1509081438 09/08/2015).

At the beginning, USB worked normally. After a while (and
after
newer
kernel versions released by debian?) things started to
act
strangely.
For
one, the bios/efi boot takes a very long time (probably
due
to
the
same
reason I describe later) just to get to the
bootloader/grub.
Likley
resetting and probing for USB ports/mass storage. When
grub
finally
pops
up, I can use the (internal USB based keyboard) normally
to
select a
grub
entry etc.

Booting the kernel then works reasonably fine, until it
loads
the
xhci
module.
It spews some messages in dmesg (taking some 15 seconds)
and
only
then,
the
keyboard starts to work again.

The log is filled with messages like:
[    7.248479] xhci_hcd 0000:00:14.0: Command completion
event
does
not
match command
[    7.248495] xhci_hcd 0000:00:14.0: Timeout while
waiting
for
setup
device command
[   12.256347] xhci_hcd 0000:00:14.0: Timeout while
waiting
for
setup
device command
[   12.256363] usb 1-2: hub failed to enable device,
error
-62
[   17.264166] xhci_hcd 0000:00:14.0: Timeout while
waiting
for
setup
device command
(followed by USB hub/device enumeration)

I've tried several combinations and quirks, updating to
the
latest rc
kernels since 3.16 (am on 4.5.0 right now) and it only
seems
to
get
worse.

Last year, on the 3.x series of kernels occasionally
after a
reboot
the
'bios' would go through quickly and fine and also no
problems
loading
the
module and logging in. But now it always fails.

Additionally it (may or may not) seems to cause the
internal
usb
card
reader to not even show up almost all of the time, though
under
OSX
it
works fine. There is/was a known issue with this
cardreader
where
it
would
disappear after a suspend.

Adding various seemingly related intel usb3 quirks I had
no
change,
as
I
think all of them are already applied to this chipset.

I'm guessing that somehow the usb chipset has some
configuration
option
miss-set (which persists over reboots/power down) and the
driver
doesn't
quite understand it.

Unfortunately it seems that this chipset does not work in
pure
USB2.0
(ehci) mode and needs the xhci module to work at all, so
even
falling
to
USB2 is no option. Also disconnecting all USB perhipials
is
nearly
impossible as the touchpad, bluetooth cardreader and
keyboard
are
internally all wired to USB.

I'm attaching 3 dmesg logs with various kernels and
levels of
debugging
information. I tried to google for errors from these
logs,
but to
no
avail.

I have attached some log files on the bugzilla issue
tracker
[1]
(they
are
to big for the ML I think).

[1] https://bugzilla.kernel.org/show_bug.cgi?id=115741


Olliver



[0] http://thread.gmane.org/gmane.linux.usb.general/13969
7

I updated to fedora 26 with kernel 4.11.0-rc7 and the issue
is
still
present.

4.11-rc7 should have xhci tracepoints installed.
Could you take some traces at boot with most recent kernel
you
got.

xhci traces for boot can be enabled by adding
"trace_event=xhci-
hcd"
to
your kernel cmdline

Traces should then appear in /sys/kernel/debug/tracing/trace

-Mathias

Thanks ! I'll try to do that, but it's utterly difficult since
each
keypress takes between 5 and 10 seconds due to this bug ...

So I tried it and I cannot access the trace. Are the trace
persisted
across reboot ? Since all 4.11 kernel freeze on the login page
I
have
no other choice but rebooting on a old 4.10 kernel to look at
the
trace.

You could SSH from a different machine, no? tracepoints don't
persist
across reboots, AFAICT.


Thanks, it did work. Here is the trace:


       kworker/3:0-35    [003] d...     1.234029: xhci_queue_trb:
CMD: Enable Slot Command: flags C
            <idle>-0     [002] d.h.     1.234042: xhci_handle_event:
EVENT: TRB 000000025c47c000 status 'Success' len 0 slot 1 ep 0 type
'Command Completion Event' flags e:C
            <idle>-0     [002] d.h.     1.234043:
xhci_handle_command: CMD: Enable Slot Command: flags C
       kworker/3:0-35    [003] ....     1.234058:
xhci_alloc_virt_device: vdev ffff9107a1ee4000 ctx 25c5cd000 |
25c5cc000 num 0 state 0 speed 0 port 0 level 0 slot 0
            <idle>-0     [002] d.h.     1.239014: xhci_handle_event:
EVENT: TRB 000000000e000000 status 'Success' len 0 slot 0 ep 0 type
'Port Status Change Event' flags e:C
       kworker/3:0-35    [003] ....     1.337988:
xhci_setup_addressable_virt_device: vdev ffff9107a1ee4000 ctx
25c5cd000 | 25c5cc000 num 0 state 5 speed 5 port 3 level 1 slot 1
       kworker/3:0-35    [003] ....     1.337990: xhci_address_ctx:
ctx_64=0, ctx_type=2, ctx_dma=@25c5cd000, ctx_va=@ffff91079c5cd000
       kworker/3:0-35    [003] d...     1.337990: xhci_setup_device:
vdev ffff9107a1ee4000 ctx 25c5cd000 | 25c5cc000 num 0 state 5 speed
5 port 3 level 1 slot 1
       kworker/3:0-35    [003] d...     1.337992: xhci_queue_trb:
CMD: Address Device Command: ctx 000000025c5cd000 slot 1 flags b:C
            <idle>-0     [001] d.h.     6.633956: xhci_handle_event:
EVENT: TRB 000000025c47c010 status 'Command Aborted' len 0 slot 1
ep 0 type 'Command Completion Event' flags e:C
            <idle>-0     [001] d.h.     6.633959:
xhci_handle_command: CMD: Address Device Command: ctx
000000025c5cd000 slot 1 flags b:C
            <idle>-0     [001] d.h.     6.633964: xhci_handle_event:
EVENT: TRB 000000025c47c020 status 'Command Ring Stopped' len 0
slot 1 ep 0 type 'Command Completion Event' flags e:C
            <idle>-0     [001] d.h.     6.633964:
xhci_handle_command: CMD: type 'UNKNOWN' -> raw 00000000 00000000
00000000 00000000
       kworker/3:2-163   [003] d...     6.634066: xhci_queue_trb:
CMD: Enable Slot Command: flags C
       kworker/3:0-35    [003] ....     8.682073: xhci_address_ctx:
ctx_64=0, ctx_type=2, ctx_dma=@25c5cd000, ctx_va=@ffff91079c5cd000
       kworker/3:0-35    [003] d...     8.682075: xhci_setup_device:
vdev ffff9107a1ee4000 ctx 25c5cd000 | 25c5cc000 num 0 state 5 speed
5 port 3 level 1 slot 1
       kworker/3:0-35    [003] d...     8.682079: xhci_queue_trb:
CMD: Address Device Command: ctx 000000025c5cd000 slot 1 flags b:C
            <idle>-0     [001] d.h.    13.801929: xhci_handle_event:
EVENT: TRB 000000025c47c020 status 'Success' len 0 slot 2 ep 0 type
'Command Completion Event' flags e:C
            <idle>-0     [001] d.h.    13.801932:
xhci_handle_command: CMD: Enable Slot Command: flags C
            <idle>-0     [001] dNh.    13.802010: xhci_handle_event:
EVENT: TRB 000000025c47c030 status 'Success' len 0 slot 0 ep 0 type
'Command Completion Event' flags e:C
            <idle>-0     [001] dNh.    13.802010:
xhci_handle_command: CMD: Enable Slot Command: flags C


Ok, I think I got something that could help, any chance you could
test the following change:

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-
ring.c
index d9936c7..5676b43 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -1342,7 +1342,7 @@ static void handle_cmd_completion(struct
xhci_hcd *xhci,
          cmd_comp_code = GET_COMP_CODE(le32_to_cpu(event->status));

          /* If CMD ring stopped we own the trbs between enqueue and
dequeue */
-       if (cmd_comp_code == COMP_STOPPED) {
+       if (cmd_comp_code == COMP_COMMAND_RING_STOPPED) {
                  complete_all(&xhci->cmd_ring_stop_completion);
                  return;
          }



Details: (mostly for myself to parse the trace)

The connected usb device fails to respond to the Address device (set
address) command.
xhci driver times out after 5 seconds, and abort the Address device
command.
we get a 'Command aborted' command completion event followed by a
'Command ring stopped' command completion event.
The Abort command completion event increases the command dequeue
pointer past the
'Address device' command to the next command,
which is now yet queued, hence 'UNKNOWN' -> raw 0 type of command for
the stopped command event.

Dmesg shows "Command completion event does not match command" even if
the ring pointers match,
except for the 'Command Ring Stopped' event which we get the
"UNKNOWN" command for. but for that
event code should complete pending work and bail out before the
message is written.. except it doesn't,
because it checks for the wrong type of stop event, it checks for
transfer ring stop instead of
command ring stop.

-Mathias





Ok, so I managed to compile the xhci-hcd kernel module with the patch,
then I gzipped it. But the problem is my kernel has the xhci-hcd as
builtin driver and I don't know how to override it with my module.
Adding it to the kernel module directory seems to have no effect.

Any tips ?

Thanks


I set up a branch for this issues
Probably easier to pull that whole branch and compile and install it.

git://git.kernel.org/pub/scm/linux/kernel/git/mnyman/xhci.git xhci-macbook

Alex Henrie pointed out in another thread there are several people suffering from this.
Alex tried that branch, which contains the fix, but it didn't help.

There are a few more things that could be tried that are not implemented. The spec
have have additional notes about disabling and re-enabling the xhci slot before retrying
to address a device that timed out on address device command.

I'll let you both know when something new is ready and can be tested

Thanks
-Mathias


Xhci spec have additional notes about ut.

I will implement
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Media]     [Linux Input]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Old Linux USB Devel Archive]

  Powered by Linux