Re: XHCI is slow during boot (bios/efi) and leaves many dmesg messages

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, May 10, 2017 at 3:04 PM, Mathias Nyman
<mathias.nyman@xxxxxxxxxxxxxxx> wrote:
> On 09.05.2017 16:50, Maël Lavault wrote:
>>
>> On Tue, 2017-05-09 at 11:34 +0300, Felipe Balbi wrote:
>>>
>>> Hi,
>>>
>>> Maël Lavault <mael.lavault@xxxxxxxxxxxxxxxxxxx> writes:
>>>>
>>>> Le 28 avr. 2017 15:57, "Maël Lavault" <mael.lavault@universalavenue
>>>> .com
>>>>>
>>>>> a écrit :
>>>>
>>>>
>>>> On Fri, 2017-04-28 at 16:58 +0300, Mathias Nyman wrote:
>>>>>
>>>>> On 21.04.2017 11:08, Maël Lavault wrote:
>>>>>>
>>>>>> On Tue, 2017-04-18 at 16:58 +0200, Maël Lavault wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I can't find how to reply to an old thread with majordomo,
>>>>>>> sorry
>>>>>>> for
>>>>>>> the inconvenience it might cause.
>>>>>>>
>>>>>>> I'm reposting an issue [0] that has been inactive for a few
>>>>>>> month
>>>>>>> but
>>>>>>> still present in kernel 4.10.10 on a Macbook pro 12,1.
>>>>>>>
>>>>>>> I can provide more informations if needed but the issue is
>>>>>>> explained
>>>>>>> in
>>>>>>> details in the bugzilla issue.
>>>>>>>
>>>>>>> Thanks.
>>>>>>>
>>>>>>> ---------------
>>>>>>> Hi list,
>>>>>>>
>>>>>>> I have a Apple Inc. MacBookPro11,1 (with the most recent
>>>>>>> 'bios':
>>>>>>> BIOS
>>>>>>> MBP111.88Z.0138.B16.1509081438 09/08/2015).
>>>>>>>
>>>>>>> At the beginning, USB worked normally. After a while (and
>>>>>>> after
>>>>>>> newer
>>>>>>> kernel versions released by debian?) things started to act
>>>>>>> strangely.
>>>>>>> For
>>>>>>> one, the bios/efi boot takes a very long time (probably due
>>>>>>> to
>>>>>>> the
>>>>>>> same
>>>>>>> reason I describe later) just to get to the bootloader/grub.
>>>>>>> Likley
>>>>>>> resetting and probing for USB ports/mass storage. When grub
>>>>>>> finally
>>>>>>> pops
>>>>>>> up, I can use the (internal USB based keyboard) normally to
>>>>>>> select a
>>>>>>> grub
>>>>>>> entry etc.
>>>>>>>
>>>>>>> Booting the kernel then works reasonably fine, until it loads
>>>>>>> the
>>>>>>> xhci
>>>>>>> module.
>>>>>>> It spews some messages in dmesg (taking some 15 seconds) and
>>>>>>> only
>>>>>>> then,
>>>>>>> the
>>>>>>> keyboard starts to work again.
>>>>>>>
>>>>>>> The log is filled with messages like:
>>>>>>> [    7.248479] xhci_hcd 0000:00:14.0: Command completion
>>>>>>> event
>>>>>>> does
>>>>>>> not
>>>>>>> match command
>>>>>>> [    7.248495] xhci_hcd 0000:00:14.0: Timeout while waiting
>>>>>>> for
>>>>>>> setup
>>>>>>> device command
>>>>>>> [   12.256347] xhci_hcd 0000:00:14.0: Timeout while waiting
>>>>>>> for
>>>>>>> setup
>>>>>>> device command
>>>>>>> [   12.256363] usb 1-2: hub failed to enable device, error
>>>>>>> -62
>>>>>>> [   17.264166] xhci_hcd 0000:00:14.0: Timeout while waiting
>>>>>>> for
>>>>>>> setup
>>>>>>> device command
>>>>>>> (followed by USB hub/device enumeration)
>>>>>>>
>>>>>>> I've tried several combinations and quirks, updating to the
>>>>>>> latest rc
>>>>>>> kernels since 3.16 (am on 4.5.0 right now) and it only seems
>>>>>>> to
>>>>>>> get
>>>>>>> worse.
>>>>>>>
>>>>>>> Last year, on the 3.x series of kernels occasionally after a
>>>>>>> reboot
>>>>>>> the
>>>>>>> 'bios' would go through quickly and fine and also no problems
>>>>>>> loading
>>>>>>> the
>>>>>>> module and logging in. But now it always fails.
>>>>>>>
>>>>>>> Additionally it (may or may not) seems to cause the internal
>>>>>>> usb
>>>>>>> card
>>>>>>> reader to not even show up almost all of the time, though
>>>>>>> under
>>>>>>> OSX
>>>>>>> it
>>>>>>> works fine. There is/was a known issue with this cardreader
>>>>>>> where
>>>>>>> it
>>>>>>> would
>>>>>>> disappear after a suspend.
>>>>>>>
>>>>>>> Adding various seemingly related intel usb3 quirks I had no
>>>>>>> change,
>>>>>>> as
>>>>>>> I
>>>>>>> think all of them are already applied to this chipset.
>>>>>>>
>>>>>>> I'm guessing that somehow the usb chipset has some
>>>>>>> configuration
>>>>>>> option
>>>>>>> miss-set (which persists over reboots/power down) and the
>>>>>>> driver
>>>>>>> doesn't
>>>>>>> quite understand it.
>>>>>>>
>>>>>>> Unfortunately it seems that this chipset does not work in
>>>>>>> pure
>>>>>>> USB2.0
>>>>>>> (ehci) mode and needs the xhci module to work at all, so even
>>>>>>> falling
>>>>>>> to
>>>>>>> USB2 is no option. Also disconnecting all USB perhipials is
>>>>>>> nearly
>>>>>>> impossible as the touchpad, bluetooth cardreader and keyboard
>>>>>>> are
>>>>>>> internally all wired to USB.
>>>>>>>
>>>>>>> I'm attaching 3 dmesg logs with various kernels and levels of
>>>>>>> debugging
>>>>>>> information. I tried to google for errors from these logs,
>>>>>>> but to
>>>>>>> no
>>>>>>> avail.
>>>>>>>
>>>>>>> I have attached some log files on the bugzilla issue tracker
>>>>>>> [1]
>>>>>>> (they
>>>>>>> are
>>>>>>> to big for the ML I think).
>>>>>>>
>>>>>>> [1] https://bugzilla.kernel.org/show_bug.cgi?id=115741
>>>>>>>
>>>>>>>
>>>>>>> Olliver
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> [0] http://thread.gmane.org/gmane.linux.usb.general/139697
>>>>>>
>>>>>>
>>>>>> I updated to fedora 26 with kernel 4.11.0-rc7 and the issue is
>>>>>> still
>>>>>> present.
>>>>>
>>>>>
>>>>> 4.11-rc7 should have xhci tracepoints installed.
>>>>> Could you take some traces at boot with most recent kernel you
>>>>> got.
>>>>>
>>>>> xhci traces for boot can be enabled by adding "trace_event=xhci-
>>>>> hcd"
>>>>> to
>>>>> your kernel cmdline
>>>>>
>>>>> Traces should then appear in /sys/kernel/debug/tracing/trace
>>>>>
>>>>> -Mathias
>>>>
>>>>
>>>> Thanks ! I'll try to do that, but it's utterly difficult since each
>>>> keypress takes between 5 and 10 seconds due to this bug ...
>>>>
>>>> So I tried it and I cannot access the trace. Are the trace
>>>> persisted
>>>> across reboot ? Since all 4.11 kernel freeze on the login page I
>>>> have
>>>> no other choice but rebooting on a old 4.10 kernel to look at the
>>>> trace.
>>>
>>>
>>> You could SSH from a different machine, no? tracepoints don't persist
>>> across reboots, AFAICT.
>>>
>>
>> Thanks, it did work. Here is the trace:
>>
>>
>>       kworker/3:0-35    [003] d...     1.234029: xhci_queue_trb: CMD:
>> Enable Slot Command: flags C
>>            <idle>-0     [002] d.h.     1.234042: xhci_handle_event: EVENT:
>> TRB 000000025c47c000 status 'Success' len 0 slot 1 ep 0 type 'Command
>> Completion Event' flags e:C
>>            <idle>-0     [002] d.h.     1.234043: xhci_handle_command: CMD:
>> Enable Slot Command: flags C
>>       kworker/3:0-35    [003] ....     1.234058: xhci_alloc_virt_device:
>> vdev ffff9107a1ee4000 ctx 25c5cd000 | 25c5cc000 num 0 state 0 speed 0 port 0
>> level 0 slot 0
>>            <idle>-0     [002] d.h.     1.239014: xhci_handle_event: EVENT:
>> TRB 000000000e000000 status 'Success' len 0 slot 0 ep 0 type 'Port Status
>> Change Event' flags e:C
>>       kworker/3:0-35    [003] ....     1.337988:
>> xhci_setup_addressable_virt_device: vdev ffff9107a1ee4000 ctx 25c5cd000 |
>> 25c5cc000 num 0 state 5 speed 5 port 3 level 1 slot 1
>>       kworker/3:0-35    [003] ....     1.337990: xhci_address_ctx:
>> ctx_64=0, ctx_type=2, ctx_dma=@25c5cd000, ctx_va=@ffff91079c5cd000
>>       kworker/3:0-35    [003] d...     1.337990: xhci_setup_device: vdev
>> ffff9107a1ee4000 ctx 25c5cd000 | 25c5cc000 num 0 state 5 speed 5 port 3
>> level 1 slot 1
>>       kworker/3:0-35    [003] d...     1.337992: xhci_queue_trb: CMD:
>> Address Device Command: ctx 000000025c5cd000 slot 1 flags b:C
>>            <idle>-0     [001] d.h.     6.633956: xhci_handle_event: EVENT:
>> TRB 000000025c47c010 status 'Command Aborted' len 0 slot 1 ep 0 type
>> 'Command Completion Event' flags e:C
>>            <idle>-0     [001] d.h.     6.633959: xhci_handle_command: CMD:
>> Address Device Command: ctx 000000025c5cd000 slot 1 flags b:C
>>            <idle>-0     [001] d.h.     6.633964: xhci_handle_event: EVENT:
>> TRB 000000025c47c020 status 'Command Ring Stopped' len 0 slot 1 ep 0 type
>> 'Command Completion Event' flags e:C
>>            <idle>-0     [001] d.h.     6.633964: xhci_handle_command: CMD:
>> type 'UNKNOWN' -> raw 00000000 00000000 00000000 00000000
>>       kworker/3:2-163   [003] d...     6.634066: xhci_queue_trb: CMD:
>> Enable Slot Command: flags C
>>       kworker/3:0-35    [003] ....     8.682073: xhci_address_ctx:
>> ctx_64=0, ctx_type=2, ctx_dma=@25c5cd000, ctx_va=@ffff91079c5cd000
>>       kworker/3:0-35    [003] d...     8.682075: xhci_setup_device: vdev
>> ffff9107a1ee4000 ctx 25c5cd000 | 25c5cc000 num 0 state 5 speed 5 port 3
>> level 1 slot 1
>>       kworker/3:0-35    [003] d...     8.682079: xhci_queue_trb: CMD:
>> Address Device Command: ctx 000000025c5cd000 slot 1 flags b:C
>>            <idle>-0     [001] d.h.    13.801929: xhci_handle_event: EVENT:
>> TRB 000000025c47c020 status 'Success' len 0 slot 2 ep 0 type 'Command
>> Completion Event' flags e:C
>>            <idle>-0     [001] d.h.    13.801932: xhci_handle_command: CMD:
>> Enable Slot Command: flags C
>>            <idle>-0     [001] dNh.    13.802010: xhci_handle_event: EVENT:
>> TRB 000000025c47c030 status 'Success' len 0 slot 0 ep 0 type 'Command
>> Completion Event' flags e:C
>>            <idle>-0     [001] dNh.    13.802010: xhci_handle_command: CMD:
>> Enable Slot Command: flags C
>>
>
> Ok, I think I got something that could help, any chance you could test the
> following change:
>
> diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
> index d9936c7..5676b43 100644
> --- a/drivers/usb/host/xhci-ring.c
> +++ b/drivers/usb/host/xhci-ring.c
> @@ -1342,7 +1342,7 @@ static void handle_cmd_completion(struct xhci_hcd
> *xhci,
>         cmd_comp_code = GET_COMP_CODE(le32_to_cpu(event->status));
>          /* If CMD ring stopped we own the trbs between enqueue and dequeue
> */
> -       if (cmd_comp_code == COMP_STOPPED) {
> +       if (cmd_comp_code == COMP_COMMAND_RING_STOPPED) {
>                 complete_all(&xhci->cmd_ring_stop_completion);
>                 return;
>         }
>
>
>
> Details: (mostly for myself to parse the trace)
>
> The connected usb device fails to respond to the Address device (set
> address) command.
> xhci driver times out after 5 seconds, and abort the Address device command.
> we get a 'Command aborted' command completion event followed by a
> 'Command ring stopped' command completion event.
> The Abort command completion event increases the command dequeue pointer
> past the
> 'Address device' command to the next command,
> which is now yet queued, hence 'UNKNOWN' -> raw 0 type of command for the
> stopped command event.
>
> Dmesg shows "Command completion event does not match command" even if the
> ring pointers match,
> except for the 'Command Ring Stopped' event which we get the "UNKNOWN"
> command for. but for that
> event code should complete pending work and bail out before the message is
> written.. except it doesn't,
> because it checks for the wrong type of stop event, it checks for transfer
> ring stop instead of
> command ring stop.
>
> -Mathias
>
>

Hi,

I'd love to test that but I must say I'm not sure where to start (I'm
a web developer, I rarely dig that deep in kernel internals).

>From my understanding I'd have to compile the xhci_hcd kernel module,
apply the patch and load it.
Where can I get the source ? And also can I clone only the module and
not the whole kernel ?

Can you point me to some documentation ?

Thanks a lot for your help and your quick fix attempt !
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Media]     [Linux Input]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Old Linux USB Devel Archive]

  Powered by Linux