Re: XHCI is slow during boot (bios/efi) and leaves many dmesg messages

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, 2017-05-16 at 13:25 +0300, Mathias Nyman wrote:
> On 16.05.2017 13:00, Maël Lavault wrote:
> > On Wed, 2017-05-10 at 16:04 +0300, Mathias Nyman wrote:
> > > On 09.05.2017 16:50, Maël Lavault wrote:
> > > > On Tue, 2017-05-09 at 11:34 +0300, Felipe Balbi wrote:
> > > > > Hi,
> > > > > 
> > > > > Maël Lavault <mael.lavault@xxxxxxxxxxxxxxxxxxx> writes:
> > > > > > Le 28 avr. 2017 15:57, "Maël Lavault" <mael.lavault@univers
> > > > > > alav
> > > > > > enue
> > > > > > .com
> > > > > > > a écrit :
> > > > > > 
> > > > > > On Fri, 2017-04-28 at 16:58 +0300, Mathias Nyman wrote:
> > > > > > > On 21.04.2017 11:08, Maël Lavault wrote:
> > > > > > > > On Tue, 2017-04-18 at 16:58 +0200, Maël Lavault wrote:
> > > > > > > > > Hi,
> > > > > > > > > 
> > > > > > > > > I can't find how to reply to an old thread with
> > > > > > > > > majordomo,
> > > > > > > > > sorry
> > > > > > > > > for
> > > > > > > > > the inconvenience it might cause.
> > > > > > > > > 
> > > > > > > > > I'm reposting an issue [0] that has been inactive for
> > > > > > > > > a
> > > > > > > > > few
> > > > > > > > > month
> > > > > > > > > but
> > > > > > > > > still present in kernel 4.10.10 on a Macbook pro
> > > > > > > > > 12,1.
> > > > > > > > > 
> > > > > > > > > I can provide more informations if needed but the
> > > > > > > > > issue
> > > > > > > > > is
> > > > > > > > > explained
> > > > > > > > > in
> > > > > > > > > details in the bugzilla issue.
> > > > > > > > > 
> > > > > > > > > Thanks.
> > > > > > > > > 
> > > > > > > > > ---------------
> > > > > > > > > Hi list,
> > > > > > > > > 
> > > > > > > > > I have a Apple Inc. MacBookPro11,1 (with the most
> > > > > > > > > recent
> > > > > > > > > 'bios':
> > > > > > > > > BIOS
> > > > > > > > > MBP111.88Z.0138.B16.1509081438 09/08/2015).
> > > > > > > > > 
> > > > > > > > > At the beginning, USB worked normally. After a while
> > > > > > > > > (and
> > > > > > > > > after
> > > > > > > > > newer
> > > > > > > > > kernel versions released by debian?) things started
> > > > > > > > > to
> > > > > > > > > act
> > > > > > > > > strangely.
> > > > > > > > > For
> > > > > > > > > one, the bios/efi boot takes a very long time
> > > > > > > > > (probably
> > > > > > > > > due
> > > > > > > > > to
> > > > > > > > > the
> > > > > > > > > same
> > > > > > > > > reason I describe later) just to get to the
> > > > > > > > > bootloader/grub.
> > > > > > > > > Likley
> > > > > > > > > resetting and probing for USB ports/mass storage.
> > > > > > > > > When
> > > > > > > > > grub
> > > > > > > > > finally
> > > > > > > > > pops
> > > > > > > > > up, I can use the (internal USB based keyboard)
> > > > > > > > > normally
> > > > > > > > > to
> > > > > > > > > select a
> > > > > > > > > grub
> > > > > > > > > entry etc.
> > > > > > > > > 
> > > > > > > > > Booting the kernel then works reasonably fine, until
> > > > > > > > > it
> > > > > > > > > loads
> > > > > > > > > the
> > > > > > > > > xhci
> > > > > > > > > module.
> > > > > > > > > It spews some messages in dmesg (taking some 15
> > > > > > > > > seconds)
> > > > > > > > > and
> > > > > > > > > only
> > > > > > > > > then,
> > > > > > > > > the
> > > > > > > > > keyboard starts to work again.
> > > > > > > > > 
> > > > > > > > > The log is filled with messages like:
> > > > > > > > > [    7.248479] xhci_hcd 0000:00:14.0: Command
> > > > > > > > > completion
> > > > > > > > > event
> > > > > > > > > does
> > > > > > > > > not
> > > > > > > > > match command
> > > > > > > > > [    7.248495] xhci_hcd 0000:00:14.0: Timeout while
> > > > > > > > > waiting
> > > > > > > > > for
> > > > > > > > > setup
> > > > > > > > > device command
> > > > > > > > > [   12.256347] xhci_hcd 0000:00:14.0: Timeout while
> > > > > > > > > waiting
> > > > > > > > > for
> > > > > > > > > setup
> > > > > > > > > device command
> > > > > > > > > [   12.256363] usb 1-2: hub failed to enable device,
> > > > > > > > > error
> > > > > > > > > -62
> > > > > > > > > [   17.264166] xhci_hcd 0000:00:14.0: Timeout while
> > > > > > > > > waiting
> > > > > > > > > for
> > > > > > > > > setup
> > > > > > > > > device command
> > > > > > > > > (followed by USB hub/device enumeration)
> > > > > > > > > 
> > > > > > > > > I've tried several combinations and quirks, updating
> > > > > > > > > to
> > > > > > > > > the
> > > > > > > > > latest rc
> > > > > > > > > kernels since 3.16 (am on 4.5.0 right now) and it
> > > > > > > > > only
> > > > > > > > > seems
> > > > > > > > > to
> > > > > > > > > get
> > > > > > > > > worse.
> > > > > > > > > 
> > > > > > > > > Last year, on the 3.x series of kernels occasionally
> > > > > > > > > after a
> > > > > > > > > reboot
> > > > > > > > > the
> > > > > > > > > 'bios' would go through quickly and fine and also no
> > > > > > > > > problems
> > > > > > > > > loading
> > > > > > > > > the
> > > > > > > > > module and logging in. But now it always fails.
> > > > > > > > > 
> > > > > > > > > Additionally it (may or may not) seems to cause the
> > > > > > > > > internal
> > > > > > > > > usb
> > > > > > > > > card
> > > > > > > > > reader to not even show up almost all of the time,
> > > > > > > > > though
> > > > > > > > > under
> > > > > > > > > OSX
> > > > > > > > > it
> > > > > > > > > works fine. There is/was a known issue with this
> > > > > > > > > cardreader
> > > > > > > > > where
> > > > > > > > > it
> > > > > > > > > would
> > > > > > > > > disappear after a suspend.
> > > > > > > > > 
> > > > > > > > > Adding various seemingly related intel usb3 quirks I
> > > > > > > > > had
> > > > > > > > > no
> > > > > > > > > change,
> > > > > > > > > as
> > > > > > > > > I
> > > > > > > > > think all of them are already applied to this
> > > > > > > > > chipset.
> > > > > > > > > 
> > > > > > > > > I'm guessing that somehow the usb chipset has some
> > > > > > > > > configuration
> > > > > > > > > option
> > > > > > > > > miss-set (which persists over reboots/power down) and
> > > > > > > > > the
> > > > > > > > > driver
> > > > > > > > > doesn't
> > > > > > > > > quite understand it.
> > > > > > > > > 
> > > > > > > > > Unfortunately it seems that this chipset does not
> > > > > > > > > work in
> > > > > > > > > pure
> > > > > > > > > USB2.0
> > > > > > > > > (ehci) mode and needs the xhci module to work at all,
> > > > > > > > > so
> > > > > > > > > even
> > > > > > > > > falling
> > > > > > > > > to
> > > > > > > > > USB2 is no option. Also disconnecting all USB
> > > > > > > > > perhipials
> > > > > > > > > is
> > > > > > > > > nearly
> > > > > > > > > impossible as the touchpad, bluetooth cardreader and
> > > > > > > > > keyboard
> > > > > > > > > are
> > > > > > > > > internally all wired to USB.
> > > > > > > > > 
> > > > > > > > > I'm attaching 3 dmesg logs with various kernels and
> > > > > > > > > levels of
> > > > > > > > > debugging
> > > > > > > > > information. I tried to google for errors from these
> > > > > > > > > logs,
> > > > > > > > > but to
> > > > > > > > > no
> > > > > > > > > avail.
> > > > > > > > > 
> > > > > > > > > I have attached some log files on the bugzilla issue
> > > > > > > > > tracker
> > > > > > > > > [1]
> > > > > > > > > (they
> > > > > > > > > are
> > > > > > > > > to big for the ML I think).
> > > > > > > > > 
> > > > > > > > > [1] https://bugzilla.kernel.org/show_bug.cgi?id=11574
> > > > > > > > > 1
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > Olliver
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > [0] http://thread.gmane.org/gmane.linux.usb.general/1
> > > > > > > > > 3969
> > > > > > > > > 7
> > > > > > > > 
> > > > > > > > I updated to fedora 26 with kernel 4.11.0-rc7 and the
> > > > > > > > issue
> > > > > > > > is
> > > > > > > > still
> > > > > > > > present.
> > > > > > > 
> > > > > > > 4.11-rc7 should have xhci tracepoints installed.
> > > > > > > Could you take some traces at boot with most recent
> > > > > > > kernel
> > > > > > > you
> > > > > > > got.
> > > > > > > 
> > > > > > > xhci traces for boot can be enabled by adding
> > > > > > > "trace_event=xhci-
> > > > > > > hcd"
> > > > > > > to
> > > > > > > your kernel cmdline
> > > > > > > 
> > > > > > > Traces should then appear in
> > > > > > > /sys/kernel/debug/tracing/trace
> > > > > > > 
> > > > > > > -Mathias
> > > > > > 
> > > > > > Thanks ! I'll try to do that, but it's utterly difficult
> > > > > > since
> > > > > > each
> > > > > > keypress takes between 5 and 10 seconds due to this bug ...
> > > > > > 
> > > > > > So I tried it and I cannot access the trace. Are the trace
> > > > > > persisted
> > > > > > across reboot ? Since all 4.11 kernel freeze on the login
> > > > > > page
> > > > > > I
> > > > > > have
> > > > > > no other choice but rebooting on a old 4.10 kernel to look
> > > > > > at
> > > > > > the
> > > > > > trace.
> > > > > 
> > > > > You could SSH from a different machine, no? tracepoints don't
> > > > > persist
> > > > > across reboots, AFAICT.
> > > > > 
> > > > 
> > > > Thanks, it did work. Here is the trace:
> > > > 
> > > > 
> > > >        kworker/3:0-35    [003] d...     1.234029:
> > > > xhci_queue_trb:
> > > > CMD: Enable Slot Command: flags C
> > > >             <idle>-0     [002] d.h.     1.234042:
> > > > xhci_handle_event:
> > > > EVENT: TRB 000000025c47c000 status 'Success' len 0 slot 1 ep 0
> > > > type
> > > > 'Command Completion Event' flags e:C
> > > >             <idle>-0     [002] d.h.     1.234043:
> > > > xhci_handle_command: CMD: Enable Slot Command: flags C
> > > >        kworker/3:0-35    [003] ....     1.234058:
> > > > xhci_alloc_virt_device: vdev ffff9107a1ee4000 ctx 25c5cd000 |
> > > > 25c5cc000 num 0 state 0 speed 0 port 0 level 0 slot 0
> > > >             <idle>-0     [002] d.h.     1.239014:
> > > > xhci_handle_event:
> > > > EVENT: TRB 000000000e000000 status 'Success' len 0 slot 0 ep 0
> > > > type
> > > > 'Port Status Change Event' flags e:C
> > > >        kworker/3:0-35    [003] ....     1.337988:
> > > > xhci_setup_addressable_virt_device: vdev ffff9107a1ee4000 ctx
> > > > 25c5cd000 | 25c5cc000 num 0 state 5 speed 5 port 3 level 1 slot
> > > > 1
> > > >        kworker/3:0-35    [003] ....     1.337990:
> > > > xhci_address_ctx:
> > > > ctx_64=0, ctx_type=2, ctx_dma=@25c5cd000, ctx_va=@ffff91079c5cd
> > > > 000
> > > >        kworker/3:0-35    [003] d...     1.337990:
> > > > xhci_setup_device:
> > > > vdev ffff9107a1ee4000 ctx 25c5cd000 | 25c5cc000 num 0 state 5
> > > > speed
> > > > 5 port 3 level 1 slot 1
> > > >        kworker/3:0-35    [003] d...     1.337992:
> > > > xhci_queue_trb:
> > > > CMD: Address Device Command: ctx 000000025c5cd000 slot 1 flags
> > > > b:C
> > > >             <idle>-0     [001] d.h.     6.633956:
> > > > xhci_handle_event:
> > > > EVENT: TRB 000000025c47c010 status 'Command Aborted' len 0 slot
> > > > 1
> > > > ep 0 type 'Command Completion Event' flags e:C
> > > >             <idle>-0     [001] d.h.     6.633959:
> > > > xhci_handle_command: CMD: Address Device Command: ctx
> > > > 000000025c5cd000 slot 1 flags b:C
> > > >             <idle>-0     [001] d.h.     6.633964:
> > > > xhci_handle_event:
> > > > EVENT: TRB 000000025c47c020 status 'Command Ring Stopped' len 0
> > > > slot 1 ep 0 type 'Command Completion Event' flags e:C
> > > >             <idle>-0     [001] d.h.     6.633964:
> > > > xhci_handle_command: CMD: type 'UNKNOWN' -> raw 00000000
> > > > 00000000
> > > > 00000000 00000000
> > > >        kworker/3:2-163   [003] d...     6.634066:
> > > > xhci_queue_trb:
> > > > CMD: Enable Slot Command: flags C
> > > >        kworker/3:0-35    [003] ....     8.682073:
> > > > xhci_address_ctx:
> > > > ctx_64=0, ctx_type=2, ctx_dma=@25c5cd000, ctx_va=@ffff91079c5cd
> > > > 000
> > > >        kworker/3:0-35    [003] d...     8.682075:
> > > > xhci_setup_device:
> > > > vdev ffff9107a1ee4000 ctx 25c5cd000 | 25c5cc000 num 0 state 5
> > > > speed
> > > > 5 port 3 level 1 slot 1
> > > >        kworker/3:0-35    [003] d...     8.682079:
> > > > xhci_queue_trb:
> > > > CMD: Address Device Command: ctx 000000025c5cd000 slot 1 flags
> > > > b:C
> > > >             <idle>-0     [001] d.h.    13.801929:
> > > > xhci_handle_event:
> > > > EVENT: TRB 000000025c47c020 status 'Success' len 0 slot 2 ep 0
> > > > type
> > > > 'Command Completion Event' flags e:C
> > > >             <idle>-0     [001] d.h.    13.801932:
> > > > xhci_handle_command: CMD: Enable Slot Command: flags C
> > > >             <idle>-0     [001] dNh.    13.802010:
> > > > xhci_handle_event:
> > > > EVENT: TRB 000000025c47c030 status 'Success' len 0 slot 0 ep 0
> > > > type
> > > > 'Command Completion Event' flags e:C
> > > >             <idle>-0     [001] dNh.    13.802010:
> > > > xhci_handle_command: CMD: Enable Slot Command: flags C
> > > > 
> > > 
> > > Ok, I think I got something that could help, any chance you could
> > > test the following change:
> > > 
> > > diff --git a/drivers/usb/host/xhci-ring.c
> > > b/drivers/usb/host/xhci-
> > > ring.c
> > > index d9936c7..5676b43 100644
> > > --- a/drivers/usb/host/xhci-ring.c
> > > +++ b/drivers/usb/host/xhci-ring.c
> > > @@ -1342,7 +1342,7 @@ static void handle_cmd_completion(struct
> > > xhci_hcd *xhci,
> > >           cmd_comp_code = GET_COMP_CODE(le32_to_cpu(event-
> > > >status));
> > > 
> > >           /* If CMD ring stopped we own the trbs between enqueue
> > > and
> > > dequeue */
> > > -       if (cmd_comp_code == COMP_STOPPED) {
> > > +       if (cmd_comp_code == COMP_COMMAND_RING_STOPPED) {
> > >                   complete_all(&xhci->cmd_ring_stop_completion);
> > >                   return;
> > >           }
> > > 
> > > 
> > > 
> > > Details: (mostly for myself to parse the trace)
> > > 
> > > The connected usb device fails to respond to the Address device
> > > (set
> > > address) command.
> > > xhci driver times out after 5 seconds, and abort the Address
> > > device
> > > command.
> > > we get a 'Command aborted' command completion event followed by a
> > > 'Command ring stopped' command completion event.
> > > The Abort command completion event increases the command dequeue
> > > pointer past the
> > > 'Address device' command to the next command,
> > > which is now yet queued, hence 'UNKNOWN' -> raw 0 type of command
> > > for
> > > the stopped command event.
> > > 
> > > Dmesg shows "Command completion event does not match command"
> > > even if
> > > the ring pointers match,
> > > except for the 'Command Ring Stopped' event which we get the
> > > "UNKNOWN" command for. but for that
> > > event code should complete pending work and bail out before the
> > > message is written.. except it doesn't,
> > > because it checks for the wrong type of stop event, it checks for
> > > transfer ring stop instead of
> > > command ring stop.
> > > 
> > > -Mathias
> > > 
> > > 
> > > 
> > > 
> > 
> > Ok, so I managed to compile the xhci-hcd kernel module with the
> > patch,
> > then I gzipped it. But the problem is my kernel has the xhci-hcd as
> > builtin driver and I don't know how to override it with my module.
> > Adding it to the kernel module directory seems to have no effect.
> > 
> > Any tips ?
> > 
> > Thanks
> > 
> 
> I set up a branch for this issues
> Probably easier to pull that whole branch and compile and install it.
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/mnyman/xhci.git xhci-
> macbook
> 
> Alex Henrie pointed out in another thread there are several people
> suffering from this.
> Alex tried that branch, which contains the fix, but it didn't help.
> 
> There are a few more things that could be tried that are not
> implemented. The spec
> have have additional notes about disabling and re-enabling the xhci
> slot before retrying
> to address a device that timed out on address device command.
> 
> I'll let you both know when something new is ready and can be tested
> 
> Thanks
> -Mathias
> 
> 
> Xhci spec have additional notes about ut.
> 
> I will implement

Thanks ! 

Anything I can do to help ? It is a very annoying issue for me, to
boot, I have to select an old 4.10 kernel (which takes ages since usb
is extremely slow on first boot), then reboot into a 4.11 kernel (usb
works again as it should after a reboot) with a special drm fix
(otherwise my computer freeze).


--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Media]     [Linux Input]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Old Linux USB Devel Archive]

  Powered by Linux