Hi, Patrice CHOTARD <patrice.chotard@xxxxxx> writes: >>>>>>>>>>> I am working on ARM STi platform, since v4.10-rc1, when booting >>>>>>>>>>> B2260 or B2120 STi boards platform >>>>>>>>>>> with nothing plugged on USB3 connector, i observed the following >>>>>>>>>>> kernel logs : >>>>>>>>>>> >>>>>>>>>>> .... >>>>>>>>>>> [ 801.953836] usb usb6-port1: Cannot enable. Maybe the USB >>>>>>>>>>> cable is bad? >>>>>>>>>> >>>>>>>>>> That's odd, why does it try to reset the port if there's nothing >>>>>>>>>> there? >>>>>>>>>> >>>>>>>>>>> [ 801.960455] xhci-hcd xhci-hcd.0.auto: Cannot set link state. >>>>>>>>>> >>>>>>>>>> This makes sense, nothing is connected, and we try to set link to >>>>>>>>>> U3 (PORT_PE == 0) >>>>>>>>>> >>>>>>>>>>> [ 801.966611] usb usb6-port1: cannot disable (err = -32) >>>>>>>>>> xhci reutns -EPIPE as we try to set link state to U3 while port >>>>>>>>>> is not enabled (PORT_PE == 0) >>>>>>>>>> >>>>>>>>>>> [ 806.083772] usb usb6-port1: Cannot enable. Maybe the USB >>>>>>>>>>> cable is bad? >>>>>>>>>> >>>>>>>>>> Again, about 5 seconds later the same port is reset, while >>>>>>>>>> nothing is connected. >>>>>>>>>> Odd >>>>>>>>>> >>>>>>>>>>> [ 806.090370] xhci-hcd xhci-hcd.0.auto: Cannot set link state. >>>>>>>>>>> [ 806.096494] usb usb6-port1: cannot disable (err = -32) >>>>>>>>>>> [ 810.208766] usb usb6-port1: Cannot enable. Maybe the USB >>>>>>>>>>> cable is bad? >>>>>>>>>>> [ 810.215374] xhci-hcd xhci-hcd.0.auto: Cannot set link state. >>>>>>>>>>> [ 810.221478] usb usb6-port1: cannot disable (err = -32) >>>>>>>>>>> [ 814.333767] usb usb6-port1: Cannot enable. Maybe the USB >>>>>>>>>>> cable is bad? >>>>>>>>>>> [ 814.340364] xhci-hcd xhci-hcd.0.auto: Cannot set link state. >>>>>>>>>>> ..... >>>>>>>>>>> >>>>>>>>>>> Another interesting thing is even if i plugged a device (a mass >>>>>>>>>>> storage device in my case) on the USB3 connector, >>>>>>>>>>> the above logs continue to appear. >>>>>>>>>>> >>>>>>>>>>> This is due to commit 37be66767e3ca "usb: hub: Fix auto-remount >>>>>>>>>>> of safely removed or ejected USB-3 devices". >>>>>>>>>>> >>>>>>>>>>> I don't know if STi platforms are the only impacted by this issue. >>>>>>>>>>> >>>>>>>>>>> On v4.9 everything was ok. >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Was there not a single "Cannot enable. Maybe USB cable is bad" >>>>>>>>>> message with 4.9? >>>>>>>>> >>>>>>>>> There is no "Cannot enable. Maybe USB cable is bad" message with 4.9 >>>>>>>>> >>>>>>>>>> >>>>>>>>>> Before commit 37be66767e3ca we forced the port to RxDetect state >>>>>>>>>> via Disabled state, >>>>>>>>>> and cleared port change flags when a usb3_port_disable() was called. >>>>>>>>>> >>>>>>>>>> after we just set the port to U3. Maybe those STi platforms >>>>>>>>>> depend on that cycle somehow? >>>>>>>>> >>>>>>>>> I really don't know :-( >>>>>>>>> >>>>>>>>>> >>>>>>>>>> Could you take logs (dmesg) of both 4.9 and 4.10-rc1 with both >>>>>>>>>> usb core and xhci debugging enabled: >>>>>>>>>> >>>>>>>>>> echo -n 'module xhci_hcd =p' > >>>>>>>>>> /sys/kernel/debug/dynamic_debug/control >>>>>>>>>> echo 'module usbcore =p' >/sys/kernel/debug/dynamic_debug/control >>>>>>>>> >>>>>>>>> >>>>>>>>> You will find below the 2 requested logs, in both case , no USB >>>>>>>>> cable are plugged. >>>>>>>>> >>>>>>>>> v4.9 dmesg logs : >>>>>>>>> >>>>>>>>> ..... >>>>>>>>> [ 31.368022] hub 6-0:1.0: hub_resume >>>>>>>>> [ 31.368064] xhci-hcd xhci-hcd.0.auto: get port status, actual >>>>>>>>> port 0 status = 0x400340 >>>>>>>>> [ 31.368071] xhci-hcd xhci-hcd.0.auto: Get port status returned >>>>>>>>> 0x400340 >>>>>>>>> [ 31.368224] hub 6-0:1.0: state 7 ports 1 chg 0000 evt 0000 >>>>>>>> >>>>>>>> Looks like port is stuck in Compliance mode, this should only >>>>>>>> happen if a connect was detected, >>>>>>>> moving the port from RxDetect to Polling, and then timeout on >>>>>>>> polling to get to compliance. >>>>>>>> >>>>>>>> 4.9 is then stuck in a hub_suspend/resume loop with a port in >>>>>>>> compliance mode. >>>>>>>>> >>>>>>>>> >>>>>>>>> v4.10-rc1 dmesg logs : >>>>>>>>> >>>>>>>>> .... >>>>>>>>> [ 269.436617] usb usb6-port1: cannot disable (err = -32) >>>>>>>>> [ 269.464728] hub 6-0:1.0: state 7 ports 1 chg 0000 evt 0002 >>>>>>>>> [ 269.464756] xhci-hcd xhci-hcd.0.auto: get port status, actual >>>>>>>>> port 0 status = 0x340 >>>>>>>> >>>>>>>> I 4.10-rc1 we are again stuck in Compliance mode, but this time >>>>>>>> there was a event (00002) visible, maybe because >>>>>>>> the flags were not cleared or some other reason. Now are stuck in a >>>>>>>> port reset loop with the >>>>>>>> port in compliance mode. >>>>>>>> So the real question is why is the port in compliance mode when >>>>>>>> there are no devices connected.? >>>>>>>> >>>>>>>> I'll continue looking at this more tomorrow >>>>>>> >>>>>>> So with both 4.9 and 4.10-rc the ports are in Compliance mode state, >>>>>>> even if nothing is connected >>>>>>> To me this looks like the real underlying issue. For some reason it >>>>>>> didn't try to reset the port in 4.9, >>>>>>> and never got to the loop of port reset. >>>>>>> >>>>>>> In 4.9 the hub->events_bits[0] are not set, so the hub thread won't >>>>>>> even ask for port status >>>>>>> on any ports. >>>>>>> >>>>>>> In 4.10-rc the hub->events_bits[0] is set for port 1, It will start >>>>>>> handling port events, see the >>>>>>> compliance state and try to warm reset the port, and fail as it >>>>>>> never sees port >>>>>>> >>>>>>> Any chance you could add extra debugging to xhci to show why the >>>>>>> change bit is set? >>>>>>> Something like this: >>>>>>> >>>>>>> diff --git a/drivers/usb/host/xhci-hub.c b/drivers/usb/host/xhci-hub.c >>>>>>> index 0ef1690..e425d82 100644 >>>>>>> --- a/drivers/usb/host/xhci-hub.c >>>>>>> +++ b/drivers/usb/host/xhci-hub.c >>>>>>> @@ -1271,6 +1271,10 @@ int xhci_hub_status_data(struct usb_hcd *hcd, >>>>>>> char *buf) >>>>>>> jiffies, bus_state->resume_done[i]))) { >>>>>>> buf[(i + 1) / 8] |= 1 << (i + 1) % 8; >>>>>>> status = 1; >>>>>>> + xhci_err(xhci, "portsc 0x%x, >>>>>>> port_c_suspend:%d, resume_done:%d\n", >>>>>>> + temp, >>>>>>> + bus_state->port_c_suspend, >>>>>>> + bus_state->resume_done[i]); >>>>>>> } >>>>>>> if ((temp & PORT_RC)) >>>>>>> reset_change = true; >>>>>>> >>>>>>> >>>>>>> and run with 4.9 and 4.10-rc >>>>>>> >>>>>>> -Mathias >>>>>> >>>>>> >>>>>> Hi Mathias >>>>>> >>>>>> Thanks for your time >>>>>> >>>>>> For information i replaced xhci_err() by xhci_dbg() to avoid flooding >>>>>> kernel log which >>>>>> avoid to complete the kernel boot. >>>>>> >>>>>> Please find attached both kernel log files for v4.9 and v4.10-rc1. >>>>>> >>>>>> In v4.9_xhci-hub_additionnal_debug.txt, at the beginning of the trace >>>>>> portsc value is 0xa400340, then at [ 18.366392] it becomes 0x400340 >>>>>> (link state change) and then 0x2802a0 and back again to 0xa400340 >>>>>> >>>>>> In v4.10-rc1_xhci-hub_additionnal_debug.txt, at the beginning of the >>>>>> trace >>>>>> portsc value is 0xa400340, then at [ 5.557976] it becomes 0x400340 >>>>>> (link state change), and then 0x2802a0 or 0x680340 for ever .... >>>>>> >>>>>> Patrice >>>>>> >>>>> >>>>> Hi Mathias >>>>> >>>>> Did new logs give you some clue about the XHCi problem ? >>>>> >>>> >>>> Only that both cases, 4.9 and 4.10, start with port in compliance state >>>> in a >>>> auto suspend/resume. At some point the hub thread will see an event for >>>> that >>>> port and try to reset it. fail (as there is no device) and disable the >>>> port. >>>> >>>> >>>> log snippet of busylooping between autosuspend and resume, similar for >>>> both 4.9 and >>>> 4.10: >>>> >>>> [ 18.329121] hub 6-0:1.0: hub_suspend >>>> [ 18.329175] usb usb6: bus auto-suspend, wakeup 1 >>>> [ 18.329193] xhci-hcd xhci-hcd.0.auto: portsc 0xa400340, >>>> port_c_suspend:0, resume_done:0 >>>> [ 18.329202] usb usb6: suspend raced with wakeup event >>>> [ 18.329209] usb usb6: usb auto-resume >>>> [ 18.332758] xhci-hcd xhci-hcd.0.auto: portsc 0xa400340, >>>> port_c_suspend:0, resume_done:0 >>>> [ 18.344140] hub 6-0:1.0: hub_resume >>>> [ 18.344179] xhci-hcd xhci-hcd.0.auto: get port status, actual port 0 >>>> status = 0x400340 >>>> [ 18.344191] xhci-hcd xhci-hcd.0.auto: Get port status returned 0x400340 >>>> [ 18.344318] hub 6-0:1.0: state 7 ports 1 chg 0000 evt 0000 >>>> [ 18.344368] xhci-hcd xhci-hcd.0.auto: set port remote wake mask, >>>> actual port 0 status = 0xe400340 >>>> [ 18.344451] hub 6-0:1.0: hub_suspend >>>> >>>> >>>> In 4.9 we go back to the beginning, busylooping between auto suspend and >>>> resume, >>>> until after some random time when hub sees an event again, and continues >>>> all over again. >>>> >>>> log snippet of event, and resetting the port for 4.9: >>>> >>>> [ 18.367378] hub 6-0:1.0: state 7 ports 1 chg 0000 evt 0002 >>>> [ 18.367417] xhci-hcd xhci-hcd.0.auto: get port status, actual port 0 >>>> status = 0x400340 >>>> [ 18.367426] xhci-hcd xhci-hcd.0.auto: Get port status returned 0x400340 >>>> [ 18.367805] usb usb6-port1: link state change >>>> [ 18.367846] xhci-hcd xhci-hcd.0.auto: clear port link state change, >>>> actual port 0 status = 0x340 >>>> [ 18.367943] usb usb6-port1: do warm reset >>>> [ 18.368043] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready >>>> [ 18.428313] xhci-hcd xhci-hcd.0.auto: get port status, actual port 0 >>>> status = 0x2b0 >>>> [ 18.428330] xhci-hcd xhci-hcd.0.auto: Get port status returned 0x2b0 >>>> [ 18.428475] usb usb6-port1: not warm reset yet, waiting 50ms >>>> [ 18.468135] xhci-hcd xhci-hcd.0.auto: Port Status Change Event for >>>> port 2 >>>> [ 18.468155] xhci-hcd xhci-hcd.0.auto: handle_port_status: starting >>>> port polling. >>>> [ 18.468167] xhci-hcd xhci-hcd.0.auto: portsc 0x2802a0, >>>> port_c_suspend:0, resume_done:0 >>>> [ 18.487840] xhci-hcd xhci-hcd.0.auto: get port status, actual port 0 >>>> status = 0x2802e0 >>>> [ 18.487855] xhci-hcd xhci-hcd.0.auto: Get port status returned 0x3002e0 >>>> [ 18.488636] xhci-hcd xhci-hcd.0.auto: clear port reset change, actual >>>> port 0 status = 0x802e0 >>>> [ 18.488736] xhci-hcd xhci-hcd.0.auto: clear port warm(BH) reset >>>> change, actual port 0 status = 0x2e0 >>>> [ 18.488803] xhci-hcd xhci-hcd.0.auto: clear port link state change, >>>> actual port 0 status = 0x2e0 >>>> [ 18.488869] xhci-hcd xhci-hcd.0.auto: clear port connect change, >>>> actual port 0 status = 0x2e0 >>>> [ 18.489012] xhci-hcd xhci-hcd.0.auto: get port status, actual port 0 >>>> status = 0x2e0 >>>> [ 18.489026] xhci-hcd xhci-hcd.0.auto: Get port status returned 0x2e0 >>>> [ 18.489112] xhci-hcd xhci-hcd.0.auto: get port status, actual port 0 >>>> status = 0x2e0 >>>> [ 18.489122] xhci-hcd xhci-hcd.0.auto: Get port status returned 0x2e0 >>>> [ 18.489182] xhci-hcd xhci-hcd.0.auto: Disable port 0 >>>> [ 18.489239] xhci-hcd xhci-hcd.0.auto: get port status, actual port 0 >>>> status = 0x280 >>>> [ 18.489247] xhci-hcd xhci-hcd.0.auto: Get port status returned 0x280 >>>> [ 18.489296] xhci-hcd xhci-hcd.0.auto: Enable port 0 >>>> [ 18.489357] hub 6-0:1.0: state 7 ports 1 chg 0000 evt 0002 >>>> [ 18.489386] xhci-hcd xhci-hcd.0.auto: get port status, actual port 0 >>>> status = 0x2a0 >>>> [ 18.489395] xhci-hcd xhci-hcd.0.auto: Get port status returned 0x2a0 >>>> [ 18.489489] xhci-hcd xhci-hcd.0.auto: set port remote wake mask, >>>> actual port 0 status = 0xe0002a0 >>>> [ 18.489547] hub 6-0:1.0: hub_suspend >>>> [ 18.489606] usb usb6: bus auto-suspend, wakeup 1 >>>> [ 18.489624] xhci-hcd xhci-hcd.0.auto: xhci_hub_status_data: stopping >>>> port polling. >>>> [ 18.522772] xhci-hcd xhci-hcd.0.auto: xhci_hub_status_data: stopping >>>> port polling. >>>> [ 18.543748] Link is Up - 1000/Full >>>> [ 18.863014] xhci-hcd xhci-hcd.0.auto: Port Status Change Event for >>>> port 2 >>>> [ 18.863033] xhci-hcd xhci-hcd.0.auto: resume root hub >>>> [ 18.863073] xhci-hcd xhci-hcd.0.auto: handle_port_status: starting >>>> port polling. >>>> [ 18.863089] xhci-hcd xhci-hcd.0.auto: portsc 0xa400340, >>>> port_c_suspend:0, resume_done:0 >>>> [ 18.863333] usb usb6: usb wakeup-resume >>>> [ 18.863355] usb usb6: usb auto-resume >>>> [ 18.863379] hub 6-0:1.0: hub_resume >>>> [ 18.863410] xhci-hcd xhci-hcd.0.auto: get port status, actual port 0 >>>> status = 0x400340 >>>> [ 18.863419] xhci-hcd xhci-hcd.0.auto: Get port status returned 0x400340 >>>> >>>> In 4.10 the port disable is a bit different, so after hub sees the first >>>> event (evt 0002) and tries to >>>> reset the port, it will notice the reset fails, and try to reset port >>>> several timer, then disable device, >>>> which will end up with another event (evt 0002) and we start the whole >>>> reset loop again. >>>> >>>> [ 5.558058] hub 6-0:1.0: state 7 ports 1 chg 0000 evt 0002 >>>> [ 5.558093] xhci-hcd xhci-hcd.0.auto: get port status, actual port 0 >>>> status = 0x400340 >>>> [ 5.558103] xhci-hcd xhci-hcd.0.auto: Get port status returned 0x400340 >>>> [ 5.558187] usb usb6-port1: link state change >>>> [ 5.558221] xhci-hcd xhci-hcd.0.auto: clear port link state change, >>>> actual port 0 status = 0x340 >>>> [ 5.558278] usb usb6-port1: do warm reset >>>> [ 5.558437] systemd-journald[108]: /dev/kmsg buffer overrun, some >>>> messages lost. >>>> [ 5.623028] xhci-hcd xhci-hcd.0.auto: get port status, actual port 0 >>>> status = 0x2b0 >>>> [ 5.623046] xhci-hcd xhci-hcd.0.auto: Get port status returned 0x2b0 >>>> [ 5.623134] usb usb6-port1: not warm reset yet, waiting 50ms >>>> [ 5.658454] xhci-hcd xhci-hcd.0.auto: Port Status Change Event for >>>> port 2 >>>> [ 5.658471] xhci-hcd xhci-hcd.0.auto: handle_port_status: starting >>>> port polling. >>>> [ 5.658487] xhci-hcd xhci-hcd.0.auto: portsc 0x2802a0, >>>> port_c_suspend:0, resume_done:0 >>>> [ 5.683071] xhci-hcd xhci-hcd.0.auto: get port status, actual port 0 >>>> status = 0x2802e0 >>>> [ 5.683088] xhci-hcd xhci-hcd.0.auto: Get port status returned 0x3002e0 >>>> [ 5.683187] usb usb6-port1: not warm reset yet, waiting 200ms >>>> [ 5.777987] xhci-hcd xhci-hcd.0.auto: portsc 0x2802e0, >>>> port_c_suspend:0, resume_done:0 >>>> [ 5.894043] xhci-hcd xhci-hcd.0.auto: get port status, actual port 0 >>>> status = 0x2802e0 >>>> [ 5.894061] xhci-hcd xhci-hcd.0.auto: Get port status returned 0x3002e0 >>>> [ 5.894158] usb usb6-port1: not warm reset yet, waiting 200ms >>>> [ 6.027992] xhci-hcd xhci-hcd.0.auto: portsc 0x2802e0, >>>> port_c_suspend:0, resume_done:0 >>>> [ 6.103092] xhci-hcd xhci-hcd.0.auto: get port status, actual port 0 >>>> status = 0x680340 >>>> [ 6.103110] xhci-hcd xhci-hcd.0.auto: Get port status returned 0x700340 >>>> [ 6.103238] usb usb6-port1: not warm reset yet, waiting 200ms >>>> [ 6.277993] xhci-hcd xhci-hcd.0.auto: portsc 0x680340, >>>> port_c_suspend:0, resume_done:0 >>>> [ 6.313066] xhci-hcd xhci-hcd.0.auto: get port status, actual port 0 >>>> status = 0x680340 >>>> [ 6.313085] xhci-hcd xhci-hcd.0.auto: Get port status returned 0x700340 >>>> [ 6.313349] usb usb6-port1: not warm reset yet, waiting 200ms >>>> [ 6.313385] xhci-hcd xhci-hcd.0.auto: clear port reset change, actual >>>> port 0 status = 0x480340 >>>> [ 6.313464] xhci-hcd xhci-hcd.0.auto: clear port warm(BH) reset >>>> change, actual port 0 status = 0x400340 >>>> [ 6.313523] xhci-hcd xhci-hcd.0.auto: clear port link state change, >>>> actual port 0 status = 0x340 >>>> [ 6.313587] xhci-hcd xhci-hcd.0.auto: clear port connect change, >>>> actual port 0 status = 0x340 >>>> [ 6.313651] xhci-hcd xhci-hcd.0.auto: get port status, actual port 0 >>>> status = 0x340 >>>> [ 6.313665] xhci-hcd xhci-hcd.0.auto: Get port status returned 0x340 >>>> ..loop a few times, until >>>> [ 9.673761] usb usb6-port1: not enabled, trying warm reset again... >>>> [ 9.673771] usb usb6-port1: Cannot enable. Maybe the USB cable is bad? >>>> [ 9.680418] xhci-hcd xhci-hcd.0.auto: Cannot set link state. >>>> [ 9.722276] usb usb6-port1: cannot disable (err = -32) >>>> [ 9.754240] hub 6-0:1.0: state 7 ports 1 chg 0000 evt 0002 >>>> [ 9.754291] xhci-hcd xhci-hcd.0.auto: get port status, actual port 0 >>>> status = 0x340 >>>> [ 9.754304] xhci-hcd xhci-hcd.0.auto: Get port status returned 0x340 >>>> [ 9.754419] usb usb6-port1: do warm reset >>>> [ 9.782987] xhci-hcd xhci-hcd.0.auto: xhci_hub_status_data: stopping >>>> port polling. >>>> [ 9.813087] xhci-hcd xhci-hcd.0.auto: get port status, actual port 0 >>>> status = 0x2b0 >>>> >>>> >>>> We could try things like clearing all port change flags in ports >>>> disable, and/or only move ports >>>> in xhci enabled state to U3 at port disable, from other states we would >>>> do like in 4.9, and >>>> move to RxDetect via disable. >>>> >>>> But this would not really solve the STi case, it would at best go back >>>> to 4.9 where the issue is >>>> better hidden and silently stay in the autosuspend/resume busyloop with >>>> port in Compliance mode. >>>> >>>> You really should focus on why the port link state is in compliance in >>>> the first place. >>>> >>>> -Mathias >>> >>> Hi Mathias >>> >>> +Felipe Balbi >>> >>> I did further test on previous kernel release. >>> On v4.7, everything was ok , first issue appeared with v4.8-rc1 kernel. >>> >>> After bisection, the issue was introduced by >>> c499ff71ff usb: dwc3: core: re-factor init and exit paths >>> >>> If dwc3_phy_setup() call is moved back in dwc3_probe() instead of >>> dwc3_core_init(), everything is ok on STMicroelectronics platforms. >>> >>> I don't know if this issue has been seen on another platform. >>> >>> I can send a fix for this, but i wanted to get Felipe opinion on this. >> >> That really shouldn't cause any issues for xHCI. We're initializing the >> PHYs a little later, but still way before xHCI has a chance of >> probing. Are you sure your bisection is sound? > > Hi Felipe > > I confirm the bisection was correct > >> Which PHY driver are you > > The phy used is drivers/phy/phy-miphy28lp.c. > >> using? Maybe you have some quirk flag missing? >> > > Checking the available quirk flags for snps dwc3 and > re-reading what Mathias indicated me, he noticed that > xhci try to set link in U3 and returns an error, > i saw that "snps,dis_u3_susphy_quirk" could help. > > I then set "snps,dis_u3_susphy_quirk" in our board DT which fix > the issue. > > Felipe, thanks for the tips > Mathias, thanks for your time ;-) John, another instance of SUSPHY bit messing up with host side :-s -- balbi
Attachment:
signature.asc
Description: PGP signature