Re: XHCI is slow during boot (bios/efi) and leaves many dmesg messages

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 09.05.2017 16:50, Maël Lavault wrote:
On Tue, 2017-05-09 at 11:34 +0300, Felipe Balbi wrote:
Hi,

Maël Lavault <mael.lavault@xxxxxxxxxxxxxxxxxxx> writes:
Le 28 avr. 2017 15:57, "Maël Lavault" <mael.lavault@universalavenue
.com
a écrit :

On Fri, 2017-04-28 at 16:58 +0300, Mathias Nyman wrote:
On 21.04.2017 11:08, Maël Lavault wrote:
On Tue, 2017-04-18 at 16:58 +0200, Maël Lavault wrote:
Hi,

I can't find how to reply to an old thread with majordomo,
sorry
for
the inconvenience it might cause.

I'm reposting an issue [0] that has been inactive for a few
month
but
still present in kernel 4.10.10 on a Macbook pro 12,1.

I can provide more informations if needed but the issue is
explained
in
details in the bugzilla issue.

Thanks.

---------------
Hi list,

I have a Apple Inc. MacBookPro11,1 (with the most recent
'bios':
BIOS
MBP111.88Z.0138.B16.1509081438 09/08/2015).

At the beginning, USB worked normally. After a while (and
after
newer
kernel versions released by debian?) things started to act
strangely.
For
one, the bios/efi boot takes a very long time (probably due
to
the
same
reason I describe later) just to get to the bootloader/grub.
Likley
resetting and probing for USB ports/mass storage. When grub
finally
pops
up, I can use the (internal USB based keyboard) normally to
select a
grub
entry etc.

Booting the kernel then works reasonably fine, until it loads
the
xhci
module.
It spews some messages in dmesg (taking some 15 seconds) and
only
then,
the
keyboard starts to work again.

The log is filled with messages like:
[    7.248479] xhci_hcd 0000:00:14.0: Command completion
event
does
not
match command
[    7.248495] xhci_hcd 0000:00:14.0: Timeout while waiting
for
setup
device command
[   12.256347] xhci_hcd 0000:00:14.0: Timeout while waiting
for
setup
device command
[   12.256363] usb 1-2: hub failed to enable device, error
-62
[   17.264166] xhci_hcd 0000:00:14.0: Timeout while waiting
for
setup
device command
(followed by USB hub/device enumeration)

I've tried several combinations and quirks, updating to the
latest rc
kernels since 3.16 (am on 4.5.0 right now) and it only seems
to
get
worse.

Last year, on the 3.x series of kernels occasionally after a
reboot
the
'bios' would go through quickly and fine and also no problems
loading
the
module and logging in. But now it always fails.

Additionally it (may or may not) seems to cause the internal
usb
card
reader to not even show up almost all of the time, though
under
OSX
it
works fine. There is/was a known issue with this cardreader
where
it
would
disappear after a suspend.

Adding various seemingly related intel usb3 quirks I had no
change,
as
I
think all of them are already applied to this chipset.

I'm guessing that somehow the usb chipset has some
configuration
option
miss-set (which persists over reboots/power down) and the
driver
doesn't
quite understand it.

Unfortunately it seems that this chipset does not work in
pure
USB2.0
(ehci) mode and needs the xhci module to work at all, so even
falling
to
USB2 is no option. Also disconnecting all USB perhipials is
nearly
impossible as the touchpad, bluetooth cardreader and keyboard
are
internally all wired to USB.

I'm attaching 3 dmesg logs with various kernels and levels of
debugging
information. I tried to google for errors from these logs,
but to
no
avail.

I have attached some log files on the bugzilla issue tracker
[1]
(they
are
to big for the ML I think).

[1] https://bugzilla.kernel.org/show_bug.cgi?id=115741


Olliver



[0] http://thread.gmane.org/gmane.linux.usb.general/139697

I updated to fedora 26 with kernel 4.11.0-rc7 and the issue is
still
present.

4.11-rc7 should have xhci tracepoints installed.
Could you take some traces at boot with most recent kernel you
got.

xhci traces for boot can be enabled by adding "trace_event=xhci-
hcd"
to
your kernel cmdline

Traces should then appear in /sys/kernel/debug/tracing/trace

-Mathias

Thanks ! I'll try to do that, but it's utterly difficult since each
keypress takes between 5 and 10 seconds due to this bug ...

So I tried it and I cannot access the trace. Are the trace
persisted
across reboot ? Since all 4.11 kernel freeze on the login page I
have
no other choice but rebooting on a old 4.10 kernel to look at the
trace.

You could SSH from a different machine, no? tracepoints don't persist
across reboots, AFAICT.


Thanks, it did work. Here is the trace:


      kworker/3:0-35    [003] d...     1.234029: xhci_queue_trb: CMD: Enable Slot Command: flags C
           <idle>-0     [002] d.h.     1.234042: xhci_handle_event: EVENT: TRB 000000025c47c000 status 'Success' len 0 slot 1 ep 0 type 'Command Completion Event' flags e:C
           <idle>-0     [002] d.h.     1.234043: xhci_handle_command: CMD: Enable Slot Command: flags C
      kworker/3:0-35    [003] ....     1.234058: xhci_alloc_virt_device: vdev ffff9107a1ee4000 ctx 25c5cd000 | 25c5cc000 num 0 state 0 speed 0 port 0 level 0 slot 0
           <idle>-0     [002] d.h.     1.239014: xhci_handle_event: EVENT: TRB 000000000e000000 status 'Success' len 0 slot 0 ep 0 type 'Port Status Change Event' flags e:C
      kworker/3:0-35    [003] ....     1.337988: xhci_setup_addressable_virt_device: vdev ffff9107a1ee4000 ctx 25c5cd000 | 25c5cc000 num 0 state 5 speed 5 port 3 level 1 slot 1
      kworker/3:0-35    [003] ....     1.337990: xhci_address_ctx: ctx_64=0, ctx_type=2, ctx_dma=@25c5cd000, ctx_va=@ffff91079c5cd000
      kworker/3:0-35    [003] d...     1.337990: xhci_setup_device: vdev ffff9107a1ee4000 ctx 25c5cd000 | 25c5cc000 num 0 state 5 speed 5 port 3 level 1 slot 1
      kworker/3:0-35    [003] d...     1.337992: xhci_queue_trb: CMD: Address Device Command: ctx 000000025c5cd000 slot 1 flags b:C
           <idle>-0     [001] d.h.     6.633956: xhci_handle_event: EVENT: TRB 000000025c47c010 status 'Command Aborted' len 0 slot 1 ep 0 type 'Command Completion Event' flags e:C
           <idle>-0     [001] d.h.     6.633959: xhci_handle_command: CMD: Address Device Command: ctx 000000025c5cd000 slot 1 flags b:C
           <idle>-0     [001] d.h.     6.633964: xhci_handle_event: EVENT: TRB 000000025c47c020 status 'Command Ring Stopped' len 0 slot 1 ep 0 type 'Command Completion Event' flags e:C
           <idle>-0     [001] d.h.     6.633964: xhci_handle_command: CMD: type 'UNKNOWN' -> raw 00000000 00000000 00000000 00000000
      kworker/3:2-163   [003] d...     6.634066: xhci_queue_trb: CMD: Enable Slot Command: flags C
      kworker/3:0-35    [003] ....     8.682073: xhci_address_ctx: ctx_64=0, ctx_type=2, ctx_dma=@25c5cd000, ctx_va=@ffff91079c5cd000
      kworker/3:0-35    [003] d...     8.682075: xhci_setup_device: vdev ffff9107a1ee4000 ctx 25c5cd000 | 25c5cc000 num 0 state 5 speed 5 port 3 level 1 slot 1
      kworker/3:0-35    [003] d...     8.682079: xhci_queue_trb: CMD: Address Device Command: ctx 000000025c5cd000 slot 1 flags b:C
           <idle>-0     [001] d.h.    13.801929: xhci_handle_event: EVENT: TRB 000000025c47c020 status 'Success' len 0 slot 2 ep 0 type 'Command Completion Event' flags e:C
           <idle>-0     [001] d.h.    13.801932: xhci_handle_command: CMD: Enable Slot Command: flags C
           <idle>-0     [001] dNh.    13.802010: xhci_handle_event: EVENT: TRB 000000025c47c030 status 'Success' len 0 slot 0 ep 0 type 'Command Completion Event' flags e:C
           <idle>-0     [001] dNh.    13.802010: xhci_handle_command: CMD: Enable Slot Command: flags C


Ok, I think I got something that could help, any chance you could test the following change:

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index d9936c7..5676b43 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -1342,7 +1342,7 @@ static void handle_cmd_completion(struct xhci_hcd *xhci,
        cmd_comp_code = GET_COMP_CODE(le32_to_cpu(event->status));
/* If CMD ring stopped we own the trbs between enqueue and dequeue */
-       if (cmd_comp_code == COMP_STOPPED) {
+       if (cmd_comp_code == COMP_COMMAND_RING_STOPPED) {
                complete_all(&xhci->cmd_ring_stop_completion);
                return;
        }



Details: (mostly for myself to parse the trace)

The connected usb device fails to respond to the Address device (set address) command.
xhci driver times out after 5 seconds, and abort the Address device command.
we get a 'Command aborted' command completion event followed by a
'Command ring stopped' command completion event.
The Abort command completion event increases the command dequeue pointer past the
'Address device' command to the next command,
which is now yet queued, hence 'UNKNOWN' -> raw 0 type of command for the stopped command event.

Dmesg shows "Command completion event does not match command" even if the ring pointers match,
except for the 'Command Ring Stopped' event which we get the "UNKNOWN" command for. but for that
event code should complete pending work and bail out before the message is written.. except it doesn't,
because it checks for the wrong type of stop event, it checks for transfer ring stop instead of
command ring stop.

-Mathias

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Media]     [Linux Input]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Old Linux USB Devel Archive]

  Powered by Linux