Re: XHCI regression on v4.10-rc1

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 01/27/2017 01:31 PM, Felipe Balbi wrote:
>
> Hi,
>
> Patrice CHOTARD <patrice.chotard@xxxxxx> writes:
>>>>>>>>>> I am working on ARM STi platform, since v4.10-rc1, when booting
>>>>>>>>>> B2260 or B2120 STi boards platform
>>>>>>>>>> with nothing plugged on USB3 connector, i observed the following
>>>>>>>>>> kernel logs :
>>>>>>>>>>
>>>>>>>>>> ....
>>>>>>>>>> [  801.953836] usb usb6-port1: Cannot enable. Maybe the USB
>>>>>>>>>> cable is bad?
>>>>>>>>>
>>>>>>>>> That's odd, why does it try to reset the port if there's nothing
>>>>>>>>> there?
>>>>>>>>>
>>>>>>>>>> [  801.960455] xhci-hcd xhci-hcd.0.auto: Cannot set link state.
>>>>>>>>>
>>>>>>>>> This makes sense, nothing is connected, and we try to set link to
>>>>>>>>> U3 (PORT_PE == 0)
>>>>>>>>>
>>>>>>>>>> [  801.966611] usb usb6-port1: cannot disable (err = -32)
>>>>>>>>> xhci reutns -EPIPE as we try to set link state to U3 while port
>>>>>>>>> is not enabled (PORT_PE == 0)
>>>>>>>>>
>>>>>>>>>> [  806.083772] usb usb6-port1: Cannot enable. Maybe the USB
>>>>>>>>>> cable is bad?
>>>>>>>>>
>>>>>>>>> Again, about 5 seconds later the same port is reset, while
>>>>>>>>> nothing is connected.
>>>>>>>>> Odd
>>>>>>>>>
>>>>>>>>>> [  806.090370] xhci-hcd xhci-hcd.0.auto: Cannot set link state.
>>>>>>>>>> [  806.096494] usb usb6-port1: cannot disable (err = -32)
>>>>>>>>>> [  810.208766] usb usb6-port1: Cannot enable. Maybe the USB
>>>>>>>>>> cable is bad?
>>>>>>>>>> [  810.215374] xhci-hcd xhci-hcd.0.auto: Cannot set link state.
>>>>>>>>>> [  810.221478] usb usb6-port1: cannot disable (err = -32)
>>>>>>>>>> [  814.333767] usb usb6-port1: Cannot enable. Maybe the USB
>>>>>>>>>> cable is bad?
>>>>>>>>>> [  814.340364] xhci-hcd xhci-hcd.0.auto: Cannot set link state.
>>>>>>>>>> .....
>>>>>>>>>>
>>>>>>>>>> Another interesting thing is even if i plugged a device (a mass
>>>>>>>>>> storage device in my case) on the USB3 connector,
>>>>>>>>>> the above logs continue to appear.
>>>>>>>>>>
>>>>>>>>>> This is due to commit 37be66767e3ca "usb: hub: Fix auto-remount
>>>>>>>>>> of safely removed or ejected USB-3 devices".
>>>>>>>>>>
>>>>>>>>>> I don't know if STi platforms are the only impacted by this issue.
>>>>>>>>>>
>>>>>>>>>> On v4.9 everything was ok.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Was there not a single "Cannot enable. Maybe USB cable is bad"
>>>>>>>>> message with 4.9?
>>>>>>>>
>>>>>>>> There is no "Cannot enable. Maybe USB cable is bad" message with 4.9
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Before commit 37be66767e3ca we forced the port to RxDetect state
>>>>>>>>> via Disabled state,
>>>>>>>>> and cleared port change flags when a usb3_port_disable() was called.
>>>>>>>>>
>>>>>>>>> after we just set the port to U3. Maybe those STi platforms
>>>>>>>>> depend on that cycle somehow?
>>>>>>>>
>>>>>>>> I really don't know :-(
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Could you take logs (dmesg) of both 4.9 and 4.10-rc1 with both
>>>>>>>>> usb core and xhci debugging enabled:
>>>>>>>>>
>>>>>>>>> echo -n 'module xhci_hcd =p' >
>>>>>>>>> /sys/kernel/debug/dynamic_debug/control
>>>>>>>>> echo 'module usbcore =p' >/sys/kernel/debug/dynamic_debug/control
>>>>>>>>
>>>>>>>>
>>>>>>>> You will find below the 2 requested logs, in both case , no USB
>>>>>>>> cable are plugged.
>>>>>>>>
>>>>>>>> v4.9 dmesg logs :
>>>>>>>>
>>>>>>>> .....
>>>>>>>> [   31.368022] hub 6-0:1.0: hub_resume
>>>>>>>> [   31.368064] xhci-hcd xhci-hcd.0.auto: get port status, actual
>>>>>>>> port 0 status  = 0x400340
>>>>>>>> [   31.368071] xhci-hcd xhci-hcd.0.auto: Get port status returned
>>>>>>>> 0x400340
>>>>>>>> [   31.368224] hub 6-0:1.0: state 7 ports 1 chg 0000 evt 0000
>>>>>>>
>>>>>>> Looks like port is stuck in Compliance mode, this should only
>>>>>>> happen if a connect was detected,
>>>>>>> moving the port from RxDetect to Polling, and then timeout on
>>>>>>> polling to get to compliance.
>>>>>>>
>>>>>>> 4.9 is then stuck in a hub_suspend/resume loop with a port in
>>>>>>> compliance mode.
>>>>>>>>
>>>>>>>>
>>>>>>>> v4.10-rc1 dmesg logs :
>>>>>>>>
>>>>>>>> ....
>>>>>>>> [  269.436617] usb usb6-port1: cannot disable (err = -32)
>>>>>>>> [  269.464728] hub 6-0:1.0: state 7 ports 1 chg 0000 evt 0002
>>>>>>>> [  269.464756] xhci-hcd xhci-hcd.0.auto: get port status, actual
>>>>>>>> port 0 status  = 0x340
>>>>>>>
>>>>>>> I 4.10-rc1  we are again stuck in Compliance mode, but this time
>>>>>>> there was a event (00002) visible, maybe because
>>>>>>> the flags were not cleared or some other reason. Now are stuck in a
>>>>>>> port reset loop with the
>>>>>>> port in compliance mode.
>>>>>>> So the real question is why is the port in compliance mode when
>>>>>>> there are no devices connected.?
>>>>>>>
>>>>>>> I'll continue looking at this more tomorrow
>>>>>>
>>>>>> So with both 4.9 and 4.10-rc the ports are in Compliance mode state,
>>>>>> even if nothing is connected
>>>>>> To me this looks like the real underlying issue. For some reason it
>>>>>> didn't try to reset the port in 4.9,
>>>>>> and never got to the loop of port reset.
>>>>>>
>>>>>> In 4.9 the hub->events_bits[0] are not set, so the hub thread won't
>>>>>> even ask for port status
>>>>>> on any ports.
>>>>>>
>>>>>> In 4.10-rc the hub->events_bits[0] is set for port 1, It will start
>>>>>> handling port events, see the
>>>>>> compliance state and try to warm reset the port, and fail as it
>>>>>> never sees port
>>>>>>
>>>>>> Any chance you could add extra debugging to xhci to show why the
>>>>>> change bit is set?
>>>>>> Something like this:
>>>>>>
>>>>>> diff --git a/drivers/usb/host/xhci-hub.c b/drivers/usb/host/xhci-hub.c
>>>>>> index 0ef1690..e425d82 100644
>>>>>> --- a/drivers/usb/host/xhci-hub.c
>>>>>> +++ b/drivers/usb/host/xhci-hub.c
>>>>>> @@ -1271,6 +1271,10 @@ int xhci_hub_status_data(struct usb_hcd *hcd,
>>>>>> char *buf)
>>>>>>                              jiffies, bus_state->resume_done[i]))) {
>>>>>>                          buf[(i + 1) / 8] |= 1 << (i + 1) % 8;
>>>>>>                          status = 1;
>>>>>> +                       xhci_err(xhci, "portsc 0x%x,
>>>>>> port_c_suspend:%d, resume_done:%d\n",
>>>>>> +                                temp,
>>>>>> +                                bus_state->port_c_suspend,
>>>>>> +                                bus_state->resume_done[i]);
>>>>>>                  }
>>>>>>                  if ((temp & PORT_RC))
>>>>>>                          reset_change = true;
>>>>>>
>>>>>>
>>>>>> and run with 4.9 and 4.10-rc
>>>>>>
>>>>>> -Mathias
>>>>>
>>>>>
>>>>> Hi Mathias
>>>>>
>>>>> Thanks for your time
>>>>>
>>>>> For information i replaced xhci_err() by xhci_dbg() to avoid flooding
>>>>> kernel log which
>>>>>   avoid to complete the kernel boot.
>>>>>
>>>>> Please find attached both kernel log files for v4.9 and v4.10-rc1.
>>>>>
>>>>> In v4.9_xhci-hub_additionnal_debug.txt, at the beginning of the trace
>>>>> portsc value is 0xa400340, then at [   18.366392] it becomes 0x400340
>>>>> (link state change) and then 0x2802a0 and back again to 0xa400340
>>>>>
>>>>> In v4.10-rc1_xhci-hub_additionnal_debug.txt, at the beginning of the
>>>>> trace
>>>>> portsc value is 0xa400340, then at [    5.557976] it becomes 0x400340
>>>>> (link state change), and then 0x2802a0 or 0x680340 for ever ....
>>>>>
>>>>> Patrice
>>>>>
>>>>
>>>> Hi Mathias
>>>>
>>>> Did new logs give you some clue about the XHCi problem ?
>>>>
>>>
>>> Only that both cases, 4.9 and 4.10, start with port in compliance state
>>> in a
>>> auto suspend/resume. At some point the hub thread will see an event for
>>> that
>>> port and try to reset it. fail (as there is no device) and disable the
>>> port.
>>>
>>>
>>> log snippet of busylooping between autosuspend and resume, similar for
>>> both 4.9 and
>>> 4.10:
>>>
>>> [   18.329121] hub 6-0:1.0: hub_suspend
>>> [   18.329175] usb usb6: bus auto-suspend, wakeup 1
>>> [   18.329193] xhci-hcd xhci-hcd.0.auto: portsc 0xa400340,
>>> port_c_suspend:0, resume_done:0
>>> [   18.329202] usb usb6: suspend raced with wakeup event
>>> [   18.329209] usb usb6: usb auto-resume
>>> [   18.332758] xhci-hcd xhci-hcd.0.auto: portsc 0xa400340,
>>> port_c_suspend:0, resume_done:0
>>> [   18.344140] hub 6-0:1.0: hub_resume
>>> [   18.344179] xhci-hcd xhci-hcd.0.auto: get port status, actual port 0
>>> status  = 0x400340
>>> [   18.344191] xhci-hcd xhci-hcd.0.auto: Get port status returned 0x400340
>>> [   18.344318] hub 6-0:1.0: state 7 ports 1 chg 0000 evt 0000
>>> [   18.344368] xhci-hcd xhci-hcd.0.auto: set port remote wake mask,
>>> actual port 0 status  = 0xe400340
>>> [   18.344451] hub 6-0:1.0: hub_suspend
>>>
>>>
>>> In 4.9 we go back to the beginning, busylooping between auto suspend and
>>> resume,
>>> until after some random time when hub sees an event again, and continues
>>> all over again.
>>>
>>> log snippet of event, and resetting the port for 4.9:
>>>
>>> [   18.367378] hub 6-0:1.0: state 7 ports 1 chg 0000 evt 0002
>>> [   18.367417] xhci-hcd xhci-hcd.0.auto: get port status, actual port 0
>>> status  = 0x400340
>>> [   18.367426] xhci-hcd xhci-hcd.0.auto: Get port status returned 0x400340
>>> [   18.367805] usb usb6-port1: link state change
>>> [   18.367846] xhci-hcd xhci-hcd.0.auto: clear port link state change,
>>> actual port 0 status  = 0x340
>>> [   18.367943] usb usb6-port1: do warm reset
>>> [   18.368043] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
>>> [   18.428313] xhci-hcd xhci-hcd.0.auto: get port status, actual port 0
>>> status  = 0x2b0
>>> [   18.428330] xhci-hcd xhci-hcd.0.auto: Get port status returned 0x2b0
>>> [   18.428475] usb usb6-port1: not warm reset yet, waiting 50ms
>>> [   18.468135] xhci-hcd xhci-hcd.0.auto: Port Status Change Event for
>>> port 2
>>> [   18.468155] xhci-hcd xhci-hcd.0.auto: handle_port_status: starting
>>> port polling.
>>> [   18.468167] xhci-hcd xhci-hcd.0.auto: portsc 0x2802a0,
>>> port_c_suspend:0, resume_done:0
>>> [   18.487840] xhci-hcd xhci-hcd.0.auto: get port status, actual port 0
>>> status  = 0x2802e0
>>> [   18.487855] xhci-hcd xhci-hcd.0.auto: Get port status returned 0x3002e0
>>> [   18.488636] xhci-hcd xhci-hcd.0.auto: clear port reset change, actual
>>> port 0 status  = 0x802e0
>>> [   18.488736] xhci-hcd xhci-hcd.0.auto: clear port warm(BH) reset
>>> change, actual port 0 status  = 0x2e0
>>> [   18.488803] xhci-hcd xhci-hcd.0.auto: clear port link state change,
>>> actual port 0 status  = 0x2e0
>>> [   18.488869] xhci-hcd xhci-hcd.0.auto: clear port connect change,
>>> actual port 0 status  = 0x2e0
>>> [   18.489012] xhci-hcd xhci-hcd.0.auto: get port status, actual port 0
>>> status  = 0x2e0
>>> [   18.489026] xhci-hcd xhci-hcd.0.auto: Get port status returned 0x2e0
>>> [   18.489112] xhci-hcd xhci-hcd.0.auto: get port status, actual port 0
>>> status  = 0x2e0
>>> [   18.489122] xhci-hcd xhci-hcd.0.auto: Get port status returned 0x2e0
>>> [   18.489182] xhci-hcd xhci-hcd.0.auto: Disable port 0
>>> [   18.489239] xhci-hcd xhci-hcd.0.auto: get port status, actual port 0
>>> status  = 0x280
>>> [   18.489247] xhci-hcd xhci-hcd.0.auto: Get port status returned 0x280
>>> [   18.489296] xhci-hcd xhci-hcd.0.auto: Enable port 0
>>> [   18.489357] hub 6-0:1.0: state 7 ports 1 chg 0000 evt 0002
>>> [   18.489386] xhci-hcd xhci-hcd.0.auto: get port status, actual port 0
>>> status  = 0x2a0
>>> [   18.489395] xhci-hcd xhci-hcd.0.auto: Get port status returned 0x2a0
>>> [   18.489489] xhci-hcd xhci-hcd.0.auto: set port remote wake mask,
>>> actual port 0 status  = 0xe0002a0
>>> [   18.489547] hub 6-0:1.0: hub_suspend
>>> [   18.489606] usb usb6: bus auto-suspend, wakeup 1
>>> [   18.489624] xhci-hcd xhci-hcd.0.auto: xhci_hub_status_data: stopping
>>> port polling.
>>> [   18.522772] xhci-hcd xhci-hcd.0.auto: xhci_hub_status_data: stopping
>>> port polling.
>>> [   18.543748] Link is Up - 1000/Full
>>> [   18.863014] xhci-hcd xhci-hcd.0.auto: Port Status Change Event for
>>> port 2
>>> [   18.863033] xhci-hcd xhci-hcd.0.auto: resume root hub
>>> [   18.863073] xhci-hcd xhci-hcd.0.auto: handle_port_status: starting
>>> port polling.
>>> [   18.863089] xhci-hcd xhci-hcd.0.auto: portsc 0xa400340,
>>> port_c_suspend:0, resume_done:0
>>> [   18.863333] usb usb6: usb wakeup-resume
>>> [   18.863355] usb usb6: usb auto-resume
>>> [   18.863379] hub 6-0:1.0: hub_resume
>>> [   18.863410] xhci-hcd xhci-hcd.0.auto: get port status, actual port 0
>>> status  = 0x400340
>>> [   18.863419] xhci-hcd xhci-hcd.0.auto: Get port status returned 0x400340
>>>
>>> In 4.10 the port disable is a bit different, so after hub sees the first
>>> event (evt 0002) and tries to
>>> reset the port, it will notice the reset fails, and try to reset port
>>> several timer, then disable device,
>>> which will end up with another  event (evt 0002) and we start the whole
>>> reset loop again.
>>>
>>> [    5.558058] hub 6-0:1.0: state 7 ports 1 chg 0000 evt 0002
>>> [    5.558093] xhci-hcd xhci-hcd.0.auto: get port status, actual port 0
>>> status  = 0x400340
>>> [    5.558103] xhci-hcd xhci-hcd.0.auto: Get port status returned 0x400340
>>> [    5.558187] usb usb6-port1: link state change
>>> [    5.558221] xhci-hcd xhci-hcd.0.auto: clear port link state change,
>>> actual port 0 status  = 0x340
>>> [    5.558278] usb usb6-port1: do warm reset
>>> [    5.558437] systemd-journald[108]: /dev/kmsg buffer overrun, some
>>> messages lost.
>>> [    5.623028] xhci-hcd xhci-hcd.0.auto: get port status, actual port 0
>>> status  = 0x2b0
>>> [    5.623046] xhci-hcd xhci-hcd.0.auto: Get port status returned 0x2b0
>>> [    5.623134] usb usb6-port1: not warm reset yet, waiting 50ms
>>> [    5.658454] xhci-hcd xhci-hcd.0.auto: Port Status Change Event for
>>> port 2
>>> [    5.658471] xhci-hcd xhci-hcd.0.auto: handle_port_status: starting
>>> port polling.
>>> [    5.658487] xhci-hcd xhci-hcd.0.auto: portsc 0x2802a0,
>>> port_c_suspend:0, resume_done:0
>>> [    5.683071] xhci-hcd xhci-hcd.0.auto: get port status, actual port 0
>>> status  = 0x2802e0
>>> [    5.683088] xhci-hcd xhci-hcd.0.auto: Get port status returned 0x3002e0
>>> [    5.683187] usb usb6-port1: not warm reset yet, waiting 200ms
>>> [    5.777987] xhci-hcd xhci-hcd.0.auto: portsc 0x2802e0,
>>> port_c_suspend:0, resume_done:0
>>> [    5.894043] xhci-hcd xhci-hcd.0.auto: get port status, actual port 0
>>> status  = 0x2802e0
>>> [    5.894061] xhci-hcd xhci-hcd.0.auto: Get port status returned 0x3002e0
>>> [    5.894158] usb usb6-port1: not warm reset yet, waiting 200ms
>>> [    6.027992] xhci-hcd xhci-hcd.0.auto: portsc 0x2802e0,
>>> port_c_suspend:0, resume_done:0
>>> [    6.103092] xhci-hcd xhci-hcd.0.auto: get port status, actual port 0
>>> status  = 0x680340
>>> [    6.103110] xhci-hcd xhci-hcd.0.auto: Get port status returned 0x700340
>>> [    6.103238] usb usb6-port1: not warm reset yet, waiting 200ms
>>> [    6.277993] xhci-hcd xhci-hcd.0.auto: portsc 0x680340,
>>> port_c_suspend:0, resume_done:0
>>> [    6.313066] xhci-hcd xhci-hcd.0.auto: get port status, actual port 0
>>> status  = 0x680340
>>> [    6.313085] xhci-hcd xhci-hcd.0.auto: Get port status returned 0x700340
>>> [    6.313349] usb usb6-port1: not warm reset yet, waiting 200ms
>>> [    6.313385] xhci-hcd xhci-hcd.0.auto: clear port reset change, actual
>>> port 0 status  = 0x480340
>>> [    6.313464] xhci-hcd xhci-hcd.0.auto: clear port warm(BH) reset
>>> change, actual port 0 status  = 0x400340
>>> [    6.313523] xhci-hcd xhci-hcd.0.auto: clear port link state change,
>>> actual port 0 status  = 0x340
>>> [    6.313587] xhci-hcd xhci-hcd.0.auto: clear port connect change,
>>> actual port 0 status  = 0x340
>>> [    6.313651] xhci-hcd xhci-hcd.0.auto: get port status, actual port 0
>>> status  = 0x340
>>> [    6.313665] xhci-hcd xhci-hcd.0.auto: Get port status returned 0x340
>>> ..loop a few times, until
>>> [    9.673761] usb usb6-port1: not enabled, trying warm reset again...
>>> [    9.673771] usb usb6-port1: Cannot enable. Maybe the USB cable is bad?
>>> [    9.680418] xhci-hcd xhci-hcd.0.auto: Cannot set link state.
>>> [    9.722276] usb usb6-port1: cannot disable (err = -32)
>>> [    9.754240] hub 6-0:1.0: state 7 ports 1 chg 0000 evt 0002
>>> [    9.754291] xhci-hcd xhci-hcd.0.auto: get port status, actual port 0
>>> status  = 0x340
>>> [    9.754304] xhci-hcd xhci-hcd.0.auto: Get port status returned 0x340
>>> [    9.754419] usb usb6-port1: do warm reset
>>> [    9.782987] xhci-hcd xhci-hcd.0.auto: xhci_hub_status_data: stopping
>>> port polling.
>>> [    9.813087] xhci-hcd xhci-hcd.0.auto: get port status, actual port 0
>>> status  = 0x2b0
>>>
>>>
>>> We could try things like clearing all port change flags in ports
>>> disable, and/or only move ports
>>> in xhci enabled state to U3 at port disable, from other states we would
>>> do like in 4.9, and
>>> move to RxDetect via disable.
>>>
>>> But this would not really solve the STi case, it would at best go back
>>> to 4.9 where the issue is
>>> better hidden and silently stay in the autosuspend/resume busyloop with
>>> port in Compliance mode.
>>>
>>> You really should focus on why the port link state is in compliance in
>>> the first place.
>>>
>>> -Mathias
>>
>> Hi Mathias
>>
>> +Felipe Balbi
>>
>> I did further test on previous kernel release.
>> On v4.7, everything was ok , first issue appeared with v4.8-rc1 kernel.
>>
>> After bisection, the issue was introduced by
>> c499ff71ff usb: dwc3: core: re-factor init and exit paths
>>
>> If dwc3_phy_setup() call is moved back in dwc3_probe() instead of
>> dwc3_core_init(), everything is ok on STMicroelectronics platforms.
>>
>> I don't know if this issue has been seen on another platform.
>>
>> I can send a fix for this, but i wanted to get Felipe opinion on this.
>
> That really shouldn't cause any issues for xHCI. We're initializing the
> PHYs a little later, but still way before xHCI has a chance of
> probing. Are you sure your bisection is sound?

Hi Felipe

I confirm the bisection was correct

> Which PHY driver are you

The phy used is drivers/phy/phy-miphy28lp.c.

> using? Maybe you have some quirk flag missing?
>

Checking the available quirk flags for snps dwc3 and
re-reading what Mathias indicated me, he noticed that
xhci try to set link in U3 and returns an error,
i saw that "snps,dis_u3_susphy_quirk" could help.

I then set "snps,dis_u3_susphy_quirk" in our board DT which fix
the issue.

Felipe, thanks for the tips
Mathias, thanks for your time ;-)

Patrice

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Media]     [Linux Input]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Old Linux USB Devel Archive]

  Powered by Linux