On Mon, Dec 19, 2022 at 07:31:02PM +0100, Ladislav Michl wrote:
> On Mon, Dec 19, 2022 at 02:25:46PM +0200, Mathias Nyman wrote:
> > On 16.12.2022 23.32, Ladislav Michl wrote:
> > > On Fri, Dec 16, 2022 at 12:13:23PM +0200, Mathias Nyman wrote:
> > > > On 15.12.2022 18.12, Ladislav Michl wrote:
> > > > > +Cc Mathias as he last touched this code path and may know more :)
> > > > >
> > > > > On Tue, Dec 06, 2022 at 02:17:08PM +0100, Ladislav Michl wrote:
> > > > > > On Mon, Dec 05, 2022 at 10:27:57PM +0100, Ladislav Michl wrote:
> > > > > > > I'm running current linux.git on custom Marvell OCTEON III CN7020
> > > > > > > based board. USB devices like FTDI (idVendor=0403, idProduct=6001,
> > > > > > > bcdDevice= 6.00) Realtek WiFi dongle (idVendor=0bda, idProduct=8179,
> > > > > > > bcdDevice= 0.00) works without issues, while Ralink WiFi dongle
> > > > > > > (idVendor=148f, idProduct=5370, bcdDevice= 1.01) kills the host on
> > > > > > > disconnect:
> > > > > > > xhci-hcd xhci-hcd.0.auto: xHCI host not responding to stop endpoint command
> > > > > > > xhci-hcd xhci-hcd.0.auto: xHCI host controller not responding, assume dead
> > > > > > > xhci-hcd xhci-hcd.0.auto: HC died; cleaning up
> > > > > > >
> > > > > > > Unfortunately I do not have a datasheet for CN7020 SoC, so it is hard
> > > > > > > to tell if there is any errata :/ In case anyone see a clue in debug
> > > > > > > logs bellow, I'll happily give it a try.
> > > > > >
> > > > > > So I do have datasheet now. As a wild guess I tried to use dlmc_ref_clk0
> > > > > > instead of dlmc_ref_clk1 as a refclk-type-ss and it fixed unplug death.
> > > > > > I have no clue why, but anyway - sorry for the noise :) Perhaps Octeon's
> > > > > > clock init is worth to be verified...
> > > > >
> > > > > After all whenever xhci dies with "xHCI host not responding to stop endpoint
> > > > > command" depends also on temperature, so there seems to be race somewhere.
> > > > >
> > > > > As a quick and dirty verification, whenever xhci really died, following patch
> > > > > was tested and it fixed issue. It just treats ep as if stop endpoint command
> > > > > succeeded. Any clues? I'll happily provide more traces.
> > > >
> > > > It's possible the controller did complete the stop endpoint command but driver
> > > > didn't get the interrupt for the event for some reason.
> > > >
> >
> > Looks like controller didn't complete the stop endpoint command.
> >
> > Event for last completed command (before cycle bit change "c" -> "C") was:
> > 0x00000000028f55a0: TRB 00000000035e81a0 status 'Success' len 0 slot 1 ep 0 type 'Command Completion Event' flags e:c,
> >
> > This was for command at 35e81a0, which in the command ring was:
> > 0x00000000035e81a0: Reset Endpoint Command: ctx 0000000000000000 slot 1 ep 3 flags T:c
> >
> > The stop endpoint command was the next command queued, at 35e81b0:
> > 0x00000000035e81b0: Stop Ring Command: slot 1 sp 0 ep 3 flags c
> >
> > There were a lot of URBs queued for this device, and they are cancelled one by one after disconnect.
> >
> > Was this the only device connected? If so does connecting another usb device to another root port help?
> > Just to test if the host for some reason partially stops a while after last device disconnect?
>
> Device is connected directly into SoC. Once connected into HUB, host doesn't die
> (as noted in other email, sorry for not replying to my own message, so it got lost)
> It seems as intentional (power management?) optimization. If another device is
> plugged in before 5 sec timeout expires, host completes stop endpoint command.
>
> Unfortunately I cannot find anything describing this behavior in
> documentation, so I'll ask manufacturer support.

As support is usually slow, I asked a search engine first, and this sounds
familiar:

"Synopsis Designware USB3 IP earlier than v3.00a which is configured in
silicon with DWC_USB3_SUSPEND_ON_DISCONNECT_EN=1, would need a specific
quirk to prevent xhci host controller from dying when device is
disconnected."

usb: dwc3: Add quirk for Synopsis device disconnection errata
https://patchwork.kernel.org/project/linux-omap/patch/1424151697-2084-5-git-send-email-Sneeker.Yeh@xxxxxxxxxxxxxx/

Any clue what happened with that? I haven't found any meaningful traces...

> Both solutions, do nothing or reset controller once last device is unpluged
> works, but I doubt they are suitable for mainline kernel without further
> investigation.
>
> > Another thing is that the stop endpoint command fails after three soft reset tries,
> > does disabling soft reset help?
>
> No, this does not cause any change.
>
> ladis