On Wed, May 10, 2017 at 3:04 PM, Mathias Nyman <mathias.nyman@xxxxxxxxxxxxxxx> wrote: > On 09.05.2017 16:50, Maël Lavault wrote: >> >> On Tue, 2017-05-09 at 11:34 +0300, Felipe Balbi wrote: >>> >>> Hi, >>> >>> Maël Lavault <mael.lavault@xxxxxxxxxxxxxxxxxxx> writes: >>>> >>>> Le 28 avr. 2017 15:57, "Maël Lavault" <mael.lavault@universalavenue >>>> .com >>>>> >>>>> a écrit : >>>> >>>> >>>> On Fri, 2017-04-28 at 16:58 +0300, Mathias Nyman wrote: >>>>> >>>>> On 21.04.2017 11:08, Maël Lavault wrote: >>>>>> >>>>>> On Tue, 2017-04-18 at 16:58 +0200, Maël Lavault wrote: >>>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> I can't find how to reply to an old thread with majordomo, >>>>>>> sorry >>>>>>> for >>>>>>> the inconvenience it might cause. >>>>>>> >>>>>>> I'm reposting an issue [0] that has been inactive for a few >>>>>>> month >>>>>>> but >>>>>>> still present in kernel 4.10.10 on a Macbook pro 12,1. >>>>>>> >>>>>>> I can provide more informations if needed but the issue is >>>>>>> explained >>>>>>> in >>>>>>> details in the bugzilla issue. >>>>>>> >>>>>>> Thanks. >>>>>>> >>>>>>> --------------- >>>>>>> Hi list, >>>>>>> >>>>>>> I have a Apple Inc. MacBookPro11,1 (with the most recent >>>>>>> 'bios': >>>>>>> BIOS >>>>>>> MBP111.88Z.0138.B16.1509081438 09/08/2015). >>>>>>> >>>>>>> At the beginning, USB worked normally. After a while (and >>>>>>> after >>>>>>> newer >>>>>>> kernel versions released by debian?) things started to act >>>>>>> strangely. >>>>>>> For >>>>>>> one, the bios/efi boot takes a very long time (probably due >>>>>>> to >>>>>>> the >>>>>>> same >>>>>>> reason I describe later) just to get to the bootloader/grub. >>>>>>> Likley >>>>>>> resetting and probing for USB ports/mass storage. When grub >>>>>>> finally >>>>>>> pops >>>>>>> up, I can use the (internal USB based keyboard) normally to >>>>>>> select a >>>>>>> grub >>>>>>> entry etc. >>>>>>> >>>>>>> Booting the kernel then works reasonably fine, until it loads >>>>>>> the >>>>>>> xhci >>>>>>> module. >>>>>>> It spews some messages in dmesg (taking some 15 seconds) and >>>>>>> only >>>>>>> then, >>>>>>> the >>>>>>> keyboard starts to work again. >>>>>>> >>>>>>> The log is filled with messages like: >>>>>>> [ 7.248479] xhci_hcd 0000:00:14.0: Command completion >>>>>>> event >>>>>>> does >>>>>>> not >>>>>>> match command >>>>>>> [ 7.248495] xhci_hcd 0000:00:14.0: Timeout while waiting >>>>>>> for >>>>>>> setup >>>>>>> device command >>>>>>> [ 12.256347] xhci_hcd 0000:00:14.0: Timeout while waiting >>>>>>> for >>>>>>> setup >>>>>>> device command >>>>>>> [ 12.256363] usb 1-2: hub failed to enable device, error >>>>>>> -62 >>>>>>> [ 17.264166] xhci_hcd 0000:00:14.0: Timeout while waiting >>>>>>> for >>>>>>> setup >>>>>>> device command >>>>>>> (followed by USB hub/device enumeration) >>>>>>> >>>>>>> I've tried several combinations and quirks, updating to the >>>>>>> latest rc >>>>>>> kernels since 3.16 (am on 4.5.0 right now) and it only seems >>>>>>> to >>>>>>> get >>>>>>> worse. >>>>>>> >>>>>>> Last year, on the 3.x series of kernels occasionally after a >>>>>>> reboot >>>>>>> the >>>>>>> 'bios' would go through quickly and fine and also no problems >>>>>>> loading >>>>>>> the >>>>>>> module and logging in. But now it always fails. >>>>>>> >>>>>>> Additionally it (may or may not) seems to cause the internal >>>>>>> usb >>>>>>> card >>>>>>> reader to not even show up almost all of the time, though >>>>>>> under >>>>>>> OSX >>>>>>> it >>>>>>> works fine. There is/was a known issue with this cardreader >>>>>>> where >>>>>>> it >>>>>>> would >>>>>>> disappear after a suspend. >>>>>>> >>>>>>> Adding various seemingly related intel usb3 quirks I had no >>>>>>> change, >>>>>>> as >>>>>>> I >>>>>>> think all of them are already applied to this chipset. >>>>>>> >>>>>>> I'm guessing that somehow the usb chipset has some >>>>>>> configuration >>>>>>> option >>>>>>> miss-set (which persists over reboots/power down) and the >>>>>>> driver >>>>>>> doesn't >>>>>>> quite understand it. >>>>>>> >>>>>>> Unfortunately it seems that this chipset does not work in >>>>>>> pure >>>>>>> USB2.0 >>>>>>> (ehci) mode and needs the xhci module to work at all, so even >>>>>>> falling >>>>>>> to >>>>>>> USB2 is no option. Also disconnecting all USB perhipials is >>>>>>> nearly >>>>>>> impossible as the touchpad, bluetooth cardreader and keyboard >>>>>>> are >>>>>>> internally all wired to USB. >>>>>>> >>>>>>> I'm attaching 3 dmesg logs with various kernels and levels of >>>>>>> debugging >>>>>>> information. I tried to google for errors from these logs, >>>>>>> but to >>>>>>> no >>>>>>> avail. >>>>>>> >>>>>>> I have attached some log files on the bugzilla issue tracker >>>>>>> [1] >>>>>>> (they >>>>>>> are >>>>>>> to big for the ML I think). >>>>>>> >>>>>>> [1] https://bugzilla.kernel.org/show_bug.cgi?id=115741 >>>>>>> >>>>>>> >>>>>>> Olliver >>>>>>> >>>>>>> >>>>>>> >>>>>>> [0] http://thread.gmane.org/gmane.linux.usb.general/139697 >>>>>> >>>>>> >>>>>> I updated to fedora 26 with kernel 4.11.0-rc7 and the issue is >>>>>> still >>>>>> present. >>>>> >>>>> >>>>> 4.11-rc7 should have xhci tracepoints installed. >>>>> Could you take some traces at boot with most recent kernel you >>>>> got. >>>>> >>>>> xhci traces for boot can be enabled by adding "trace_event=xhci- >>>>> hcd" >>>>> to >>>>> your kernel cmdline >>>>> >>>>> Traces should then appear in /sys/kernel/debug/tracing/trace >>>>> >>>>> -Mathias >>>> >>>> >>>> Thanks ! I'll try to do that, but it's utterly difficult since each >>>> keypress takes between 5 and 10 seconds due to this bug ... >>>> >>>> So I tried it and I cannot access the trace. Are the trace >>>> persisted >>>> across reboot ? Since all 4.11 kernel freeze on the login page I >>>> have >>>> no other choice but rebooting on a old 4.10 kernel to look at the >>>> trace. >>> >>> >>> You could SSH from a different machine, no? tracepoints don't persist >>> across reboots, AFAICT. >>> >> >> Thanks, it did work. Here is the trace: >> >> >> kworker/3:0-35 [003] d... 1.234029: xhci_queue_trb: CMD: >> Enable Slot Command: flags C >> <idle>-0 [002] d.h. 1.234042: xhci_handle_event: EVENT: >> TRB 000000025c47c000 status 'Success' len 0 slot 1 ep 0 type 'Command >> Completion Event' flags e:C >> <idle>-0 [002] d.h. 1.234043: xhci_handle_command: CMD: >> Enable Slot Command: flags C >> kworker/3:0-35 [003] .... 1.234058: xhci_alloc_virt_device: >> vdev ffff9107a1ee4000 ctx 25c5cd000 | 25c5cc000 num 0 state 0 speed 0 port 0 >> level 0 slot 0 >> <idle>-0 [002] d.h. 1.239014: xhci_handle_event: EVENT: >> TRB 000000000e000000 status 'Success' len 0 slot 0 ep 0 type 'Port Status >> Change Event' flags e:C >> kworker/3:0-35 [003] .... 1.337988: >> xhci_setup_addressable_virt_device: vdev ffff9107a1ee4000 ctx 25c5cd000 | >> 25c5cc000 num 0 state 5 speed 5 port 3 level 1 slot 1 >> kworker/3:0-35 [003] .... 1.337990: xhci_address_ctx: >> ctx_64=0, ctx_type=2, ctx_dma=@25c5cd000, ctx_va=@ffff91079c5cd000 >> kworker/3:0-35 [003] d... 1.337990: xhci_setup_device: vdev >> ffff9107a1ee4000 ctx 25c5cd000 | 25c5cc000 num 0 state 5 speed 5 port 3 >> level 1 slot 1 >> kworker/3:0-35 [003] d... 1.337992: xhci_queue_trb: CMD: >> Address Device Command: ctx 000000025c5cd000 slot 1 flags b:C >> <idle>-0 [001] d.h. 6.633956: xhci_handle_event: EVENT: >> TRB 000000025c47c010 status 'Command Aborted' len 0 slot 1 ep 0 type >> 'Command Completion Event' flags e:C >> <idle>-0 [001] d.h. 6.633959: xhci_handle_command: CMD: >> Address Device Command: ctx 000000025c5cd000 slot 1 flags b:C >> <idle>-0 [001] d.h. 6.633964: xhci_handle_event: EVENT: >> TRB 000000025c47c020 status 'Command Ring Stopped' len 0 slot 1 ep 0 type >> 'Command Completion Event' flags e:C >> <idle>-0 [001] d.h. 6.633964: xhci_handle_command: CMD: >> type 'UNKNOWN' -> raw 00000000 00000000 00000000 00000000 >> kworker/3:2-163 [003] d... 6.634066: xhci_queue_trb: CMD: >> Enable Slot Command: flags C >> kworker/3:0-35 [003] .... 8.682073: xhci_address_ctx: >> ctx_64=0, ctx_type=2, ctx_dma=@25c5cd000, ctx_va=@ffff91079c5cd000 >> kworker/3:0-35 [003] d... 8.682075: xhci_setup_device: vdev >> ffff9107a1ee4000 ctx 25c5cd000 | 25c5cc000 num 0 state 5 speed 5 port 3 >> level 1 slot 1 >> kworker/3:0-35 [003] d... 8.682079: xhci_queue_trb: CMD: >> Address Device Command: ctx 000000025c5cd000 slot 1 flags b:C >> <idle>-0 [001] d.h. 13.801929: xhci_handle_event: EVENT: >> TRB 000000025c47c020 status 'Success' len 0 slot 2 ep 0 type 'Command >> Completion Event' flags e:C >> <idle>-0 [001] d.h. 13.801932: xhci_handle_command: CMD: >> Enable Slot Command: flags C >> <idle>-0 [001] dNh. 13.802010: xhci_handle_event: EVENT: >> TRB 000000025c47c030 status 'Success' len 0 slot 0 ep 0 type 'Command >> Completion Event' flags e:C >> <idle>-0 [001] dNh. 13.802010: xhci_handle_command: CMD: >> Enable Slot Command: flags C >> > > Ok, I think I got something that could help, any chance you could test the > following change: > > diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c > index d9936c7..5676b43 100644 > --- a/drivers/usb/host/xhci-ring.c > +++ b/drivers/usb/host/xhci-ring.c > @@ -1342,7 +1342,7 @@ static void handle_cmd_completion(struct xhci_hcd > *xhci, > cmd_comp_code = GET_COMP_CODE(le32_to_cpu(event->status)); > /* If CMD ring stopped we own the trbs between enqueue and dequeue > */ > - if (cmd_comp_code == COMP_STOPPED) { > + if (cmd_comp_code == COMP_COMMAND_RING_STOPPED) { > complete_all(&xhci->cmd_ring_stop_completion); > return; > } > > > > Details: (mostly for myself to parse the trace) > > The connected usb device fails to respond to the Address device (set > address) command. > xhci driver times out after 5 seconds, and abort the Address device command. > we get a 'Command aborted' command completion event followed by a > 'Command ring stopped' command completion event. > The Abort command completion event increases the command dequeue pointer > past the > 'Address device' command to the next command, > which is now yet queued, hence 'UNKNOWN' -> raw 0 type of command for the > stopped command event. > > Dmesg shows "Command completion event does not match command" even if the > ring pointers match, > except for the 'Command Ring Stopped' event which we get the "UNKNOWN" > command for. but for that > event code should complete pending work and bail out before the message is > written.. except it doesn't, > because it checks for the wrong type of stop event, it checks for transfer > ring stop instead of > command ring stop. > > -Mathias > > Hi, I'd love to test that but I must say I'm not sure where to start (I'm a web developer, I rarely dig that deep in kernel internals). >From my understanding I'd have to compile the xhci_hcd kernel module, apply the patch and load it. Where can I get the source ? And also can I clone only the module and not the whole kernel ? Can you point me to some documentation ? Thanks a lot for your help and your quick fix attempt ! -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html