Re: XHCI is slow during boot (bios/efi) and leaves many dmesg messages

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, 2017-05-10 at 16:04 +0300, Mathias Nyman wrote:
> On 09.05.2017 16:50, Maël Lavault wrote:
> > On Tue, 2017-05-09 at 11:34 +0300, Felipe Balbi wrote:
> > > Hi,
> > > 
> > > Maël Lavault <mael.lavault@xxxxxxxxxxxxxxxxxxx> writes:
> > > > Le 28 avr. 2017 15:57, "Maël Lavault" <mael.lavault@universalav
> > > > enue
> > > > .com
> > > > > a écrit :
> > > > 
> > > > On Fri, 2017-04-28 at 16:58 +0300, Mathias Nyman wrote:
> > > > > On 21.04.2017 11:08, Maël Lavault wrote:
> > > > > > On Tue, 2017-04-18 at 16:58 +0200, Maël Lavault wrote:
> > > > > > > Hi,
> > > > > > > 
> > > > > > > I can't find how to reply to an old thread with
> > > > > > > majordomo,
> > > > > > > sorry
> > > > > > > for
> > > > > > > the inconvenience it might cause.
> > > > > > > 
> > > > > > > I'm reposting an issue [0] that has been inactive for a
> > > > > > > few
> > > > > > > month
> > > > > > > but
> > > > > > > still present in kernel 4.10.10 on a Macbook pro 12,1.
> > > > > > > 
> > > > > > > I can provide more informations if needed but the issue
> > > > > > > is
> > > > > > > explained
> > > > > > > in
> > > > > > > details in the bugzilla issue.
> > > > > > > 
> > > > > > > Thanks.
> > > > > > > 
> > > > > > > ---------------
> > > > > > > Hi list,
> > > > > > > 
> > > > > > > I have a Apple Inc. MacBookPro11,1 (with the most recent
> > > > > > > 'bios':
> > > > > > > BIOS
> > > > > > > MBP111.88Z.0138.B16.1509081438 09/08/2015).
> > > > > > > 
> > > > > > > At the beginning, USB worked normally. After a while (and
> > > > > > > after
> > > > > > > newer
> > > > > > > kernel versions released by debian?) things started to
> > > > > > > act
> > > > > > > strangely.
> > > > > > > For
> > > > > > > one, the bios/efi boot takes a very long time (probably
> > > > > > > due
> > > > > > > to
> > > > > > > the
> > > > > > > same
> > > > > > > reason I describe later) just to get to the
> > > > > > > bootloader/grub.
> > > > > > > Likley
> > > > > > > resetting and probing for USB ports/mass storage. When
> > > > > > > grub
> > > > > > > finally
> > > > > > > pops
> > > > > > > up, I can use the (internal USB based keyboard) normally
> > > > > > > to
> > > > > > > select a
> > > > > > > grub
> > > > > > > entry etc.
> > > > > > > 
> > > > > > > Booting the kernel then works reasonably fine, until it
> > > > > > > loads
> > > > > > > the
> > > > > > > xhci
> > > > > > > module.
> > > > > > > It spews some messages in dmesg (taking some 15 seconds)
> > > > > > > and
> > > > > > > only
> > > > > > > then,
> > > > > > > the
> > > > > > > keyboard starts to work again.
> > > > > > > 
> > > > > > > The log is filled with messages like:
> > > > > > > [    7.248479] xhci_hcd 0000:00:14.0: Command completion
> > > > > > > event
> > > > > > > does
> > > > > > > not
> > > > > > > match command
> > > > > > > [    7.248495] xhci_hcd 0000:00:14.0: Timeout while
> > > > > > > waiting
> > > > > > > for
> > > > > > > setup
> > > > > > > device command
> > > > > > > [   12.256347] xhci_hcd 0000:00:14.0: Timeout while
> > > > > > > waiting
> > > > > > > for
> > > > > > > setup
> > > > > > > device command
> > > > > > > [   12.256363] usb 1-2: hub failed to enable device,
> > > > > > > error
> > > > > > > -62
> > > > > > > [   17.264166] xhci_hcd 0000:00:14.0: Timeout while
> > > > > > > waiting
> > > > > > > for
> > > > > > > setup
> > > > > > > device command
> > > > > > > (followed by USB hub/device enumeration)
> > > > > > > 
> > > > > > > I've tried several combinations and quirks, updating to
> > > > > > > the
> > > > > > > latest rc
> > > > > > > kernels since 3.16 (am on 4.5.0 right now) and it only
> > > > > > > seems
> > > > > > > to
> > > > > > > get
> > > > > > > worse.
> > > > > > > 
> > > > > > > Last year, on the 3.x series of kernels occasionally
> > > > > > > after a
> > > > > > > reboot
> > > > > > > the
> > > > > > > 'bios' would go through quickly and fine and also no
> > > > > > > problems
> > > > > > > loading
> > > > > > > the
> > > > > > > module and logging in. But now it always fails.
> > > > > > > 
> > > > > > > Additionally it (may or may not) seems to cause the
> > > > > > > internal
> > > > > > > usb
> > > > > > > card
> > > > > > > reader to not even show up almost all of the time, though
> > > > > > > under
> > > > > > > OSX
> > > > > > > it
> > > > > > > works fine. There is/was a known issue with this
> > > > > > > cardreader
> > > > > > > where
> > > > > > > it
> > > > > > > would
> > > > > > > disappear after a suspend.
> > > > > > > 
> > > > > > > Adding various seemingly related intel usb3 quirks I had
> > > > > > > no
> > > > > > > change,
> > > > > > > as
> > > > > > > I
> > > > > > > think all of them are already applied to this chipset.
> > > > > > > 
> > > > > > > I'm guessing that somehow the usb chipset has some
> > > > > > > configuration
> > > > > > > option
> > > > > > > miss-set (which persists over reboots/power down) and the
> > > > > > > driver
> > > > > > > doesn't
> > > > > > > quite understand it.
> > > > > > > 
> > > > > > > Unfortunately it seems that this chipset does not work in
> > > > > > > pure
> > > > > > > USB2.0
> > > > > > > (ehci) mode and needs the xhci module to work at all, so
> > > > > > > even
> > > > > > > falling
> > > > > > > to
> > > > > > > USB2 is no option. Also disconnecting all USB perhipials
> > > > > > > is
> > > > > > > nearly
> > > > > > > impossible as the touchpad, bluetooth cardreader and
> > > > > > > keyboard
> > > > > > > are
> > > > > > > internally all wired to USB.
> > > > > > > 
> > > > > > > I'm attaching 3 dmesg logs with various kernels and
> > > > > > > levels of
> > > > > > > debugging
> > > > > > > information. I tried to google for errors from these
> > > > > > > logs,
> > > > > > > but to
> > > > > > > no
> > > > > > > avail.
> > > > > > > 
> > > > > > > I have attached some log files on the bugzilla issue
> > > > > > > tracker
> > > > > > > [1]
> > > > > > > (they
> > > > > > > are
> > > > > > > to big for the ML I think).
> > > > > > > 
> > > > > > > [1] https://bugzilla.kernel.org/show_bug.cgi?id=115741
> > > > > > > 
> > > > > > > 
> > > > > > > Olliver
> > > > > > > 
> > > > > > > 
> > > > > > > 
> > > > > > > [0] http://thread.gmane.org/gmane.linux.usb.general/13969
> > > > > > > 7
> > > > > > 
> > > > > > I updated to fedora 26 with kernel 4.11.0-rc7 and the issue
> > > > > > is
> > > > > > still
> > > > > > present.
> > > > > 
> > > > > 4.11-rc7 should have xhci tracepoints installed.
> > > > > Could you take some traces at boot with most recent kernel
> > > > > you
> > > > > got.
> > > > > 
> > > > > xhci traces for boot can be enabled by adding
> > > > > "trace_event=xhci-
> > > > > hcd"
> > > > > to
> > > > > your kernel cmdline
> > > > > 
> > > > > Traces should then appear in /sys/kernel/debug/tracing/trace
> > > > > 
> > > > > -Mathias
> > > > 
> > > > Thanks ! I'll try to do that, but it's utterly difficult since
> > > > each
> > > > keypress takes between 5 and 10 seconds due to this bug ...
> > > > 
> > > > So I tried it and I cannot access the trace. Are the trace
> > > > persisted
> > > > across reboot ? Since all 4.11 kernel freeze on the login page
> > > > I
> > > > have
> > > > no other choice but rebooting on a old 4.10 kernel to look at
> > > > the
> > > > trace.
> > > 
> > > You could SSH from a different machine, no? tracepoints don't
> > > persist
> > > across reboots, AFAICT.
> > > 
> > 
> > Thanks, it did work. Here is the trace:
> > 
> > 
> >       kworker/3:0-35    [003] d...     1.234029: xhci_queue_trb:
> > CMD: Enable Slot Command: flags C
> >            <idle>-0     [002] d.h.     1.234042: xhci_handle_event:
> > EVENT: TRB 000000025c47c000 status 'Success' len 0 slot 1 ep 0 type
> > 'Command Completion Event' flags e:C
> >            <idle>-0     [002] d.h.     1.234043:
> > xhci_handle_command: CMD: Enable Slot Command: flags C
> >       kworker/3:0-35    [003] ....     1.234058:
> > xhci_alloc_virt_device: vdev ffff9107a1ee4000 ctx 25c5cd000 |
> > 25c5cc000 num 0 state 0 speed 0 port 0 level 0 slot 0
> >            <idle>-0     [002] d.h.     1.239014: xhci_handle_event:
> > EVENT: TRB 000000000e000000 status 'Success' len 0 slot 0 ep 0 type
> > 'Port Status Change Event' flags e:C
> >       kworker/3:0-35    [003] ....     1.337988:
> > xhci_setup_addressable_virt_device: vdev ffff9107a1ee4000 ctx
> > 25c5cd000 | 25c5cc000 num 0 state 5 speed 5 port 3 level 1 slot 1
> >       kworker/3:0-35    [003] ....     1.337990: xhci_address_ctx:
> > ctx_64=0, ctx_type=2, ctx_dma=@25c5cd000, ctx_va=@ffff91079c5cd000
> >       kworker/3:0-35    [003] d...     1.337990: xhci_setup_device:
> > vdev ffff9107a1ee4000 ctx 25c5cd000 | 25c5cc000 num 0 state 5 speed
> > 5 port 3 level 1 slot 1
> >       kworker/3:0-35    [003] d...     1.337992: xhci_queue_trb:
> > CMD: Address Device Command: ctx 000000025c5cd000 slot 1 flags b:C
> >            <idle>-0     [001] d.h.     6.633956: xhci_handle_event:
> > EVENT: TRB 000000025c47c010 status 'Command Aborted' len 0 slot 1
> > ep 0 type 'Command Completion Event' flags e:C
> >            <idle>-0     [001] d.h.     6.633959:
> > xhci_handle_command: CMD: Address Device Command: ctx
> > 000000025c5cd000 slot 1 flags b:C
> >            <idle>-0     [001] d.h.     6.633964: xhci_handle_event:
> > EVENT: TRB 000000025c47c020 status 'Command Ring Stopped' len 0
> > slot 1 ep 0 type 'Command Completion Event' flags e:C
> >            <idle>-0     [001] d.h.     6.633964:
> > xhci_handle_command: CMD: type 'UNKNOWN' -> raw 00000000 00000000
> > 00000000 00000000
> >       kworker/3:2-163   [003] d...     6.634066: xhci_queue_trb:
> > CMD: Enable Slot Command: flags C
> >       kworker/3:0-35    [003] ....     8.682073: xhci_address_ctx:
> > ctx_64=0, ctx_type=2, ctx_dma=@25c5cd000, ctx_va=@ffff91079c5cd000
> >       kworker/3:0-35    [003] d...     8.682075: xhci_setup_device:
> > vdev ffff9107a1ee4000 ctx 25c5cd000 | 25c5cc000 num 0 state 5 speed
> > 5 port 3 level 1 slot 1
> >       kworker/3:0-35    [003] d...     8.682079: xhci_queue_trb:
> > CMD: Address Device Command: ctx 000000025c5cd000 slot 1 flags b:C
> >            <idle>-0     [001] d.h.    13.801929: xhci_handle_event:
> > EVENT: TRB 000000025c47c020 status 'Success' len 0 slot 2 ep 0 type
> > 'Command Completion Event' flags e:C
> >            <idle>-0     [001] d.h.    13.801932:
> > xhci_handle_command: CMD: Enable Slot Command: flags C
> >            <idle>-0     [001] dNh.    13.802010: xhci_handle_event:
> > EVENT: TRB 000000025c47c030 status 'Success' len 0 slot 0 ep 0 type
> > 'Command Completion Event' flags e:C
> >            <idle>-0     [001] dNh.    13.802010:
> > xhci_handle_command: CMD: Enable Slot Command: flags C
> > 
> 
> Ok, I think I got something that could help, any chance you could
> test the following change:
> 
> diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-
> ring.c
> index d9936c7..5676b43 100644
> --- a/drivers/usb/host/xhci-ring.c
> +++ b/drivers/usb/host/xhci-ring.c
> @@ -1342,7 +1342,7 @@ static void handle_cmd_completion(struct
> xhci_hcd *xhci,
>          cmd_comp_code = GET_COMP_CODE(le32_to_cpu(event->status));
>   
>          /* If CMD ring stopped we own the trbs between enqueue and
> dequeue */
> -       if (cmd_comp_code == COMP_STOPPED) {
> +       if (cmd_comp_code == COMP_COMMAND_RING_STOPPED) {
>                  complete_all(&xhci->cmd_ring_stop_completion);
>                  return;
>          }
> 
> 
> 
> Details: (mostly for myself to parse the trace)
> 
> The connected usb device fails to respond to the Address device (set
> address) command.
> xhci driver times out after 5 seconds, and abort the Address device
> command.
> we get a 'Command aborted' command completion event followed by a
> 'Command ring stopped' command completion event.
> The Abort command completion event increases the command dequeue
> pointer past the
> 'Address device' command to the next command,
> which is now yet queued, hence 'UNKNOWN' -> raw 0 type of command for
> the stopped command event.
> 
> Dmesg shows "Command completion event does not match command" even if
> the ring pointers match,
> except for the 'Command Ring Stopped' event which we get the
> "UNKNOWN" command for. but for that
> event code should complete pending work and bail out before the
> message is written.. except it doesn't,
> because it checks for the wrong type of stop event, it checks for
> transfer ring stop instead of
> command ring stop.
> 
> -Mathias
> 
>    
>   
> 

Ok, so I managed to compile the xhci-hcd kernel module with the patch,
then I gzipped it. But the problem is my kernel has the xhci-hcd as
builtin driver and I don't know how to override it with my module.
Adding it to the kernel module directory seems to have no effect.

Any tips ?

Thanks
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Media]     [Linux Input]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Old Linux USB Devel Archive]

  Powered by Linux