On Fri, Dec 04, 2020 at 08:07:30PM +0200, Mathias Nyman wrote:
<>
> Ok, thanks.
>
> Then the rootcause remains unknown.
> For some reason the endpoint context dequeue pointer field contains zero
> instead of the new dequeue pointer.
> The (output) endpoint context is supposed to be written only by the controller.
>
> Time to change strategy and start to detect and treat the symptoms instead.
>
> I wrote a patch that detects the 0-dequeue pointer and issues a
> new Set TR Deq pointer command. Hopefully that works.
> patch added to same branch, can you try it out?
>
> 3f6326766abc xhci: retry setting new dequeue if xHC hardware failed to update it
>
> I didn't set a retry limit yet so if it doesn't work it might retry forever.

Here are some logs when running with that commit:

https://gist.github.com/rzwisler/17923c9dedf2b914254eadd1cd294a4c

I think we only consistently get the clean failure case with the dequeue
pointer being 0 if CONFIG_INTEL_IOMMU_DEFAULT_ON=y. If that option is set to
'n', we get the same failure where the xHCI controller totally dies (log
"CONFIG_INTEL_IOMMU_DEFAULT_ON=n" in the gist).

With CONFIG_INTEL_IOMMU_DEFAULT_ON=y we do seem to live through multiple
errors, but as soon as I try to use the device normally afterwards it seems to
spin forever with these messages:

xhci_hcd 0000:00:14.0: Looking for event-dma 00000000fff0a330 trb-start 00000000f8884000 trb-end 0000000000000000 seg-start 00000000f8884000 seg-end 00000000f8884ff0

Are you able to reproduce this with Andrzej's bulk-cancel script? I think you
probably just need a device which accepts bulk transfer commands? In my most
recent reproductions my servo hardware wasn't even attached to a device, so I
don't really think it's doing anything except sitting there and receiving
BULK_IN commands. I'm doing this to two devices simultaneously.
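
For illustration only, here is a rough userspace model of the "detect the
0-dequeue pointer and retry" idea with a retry limit added. This is not the
code in commit 3f6326766abc and not the real xhci driver API: every name
below (model_xhc, model_issue_set_deq, model_set_deq_with_retry), the limit
of 3 retries, and the simulated controller behaviour are assumptions made up
for the sketch. It also ignores the cycle state bit that shares the dequeue
pointer field.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the output endpoint context (not struct xhci_ep_ctx). */
struct model_ep_ctx {
	uint64_t deq;	/* dequeue pointer as written back by the "controller" */
};

/* Hypothetical simulated controller that drops the first few commands. */
struct model_xhc {
	struct model_ep_ctx ep;
	unsigned int failures_left;	/* commands that will still leave deq at 0 */
	unsigned int cmds_issued;	/* total Set TR Deq commands sent so far */
};

/* Assumed limit; the patch discussed in the thread has no limit yet. */
#define MODEL_MAX_SET_DEQ_RETRIES 3

/* Simulate one Set TR Deq command, including the zero-pointer symptom. */
static void model_issue_set_deq(struct model_xhc *xhc, uint64_t new_deq)
{
	xhc->cmds_issued++;
	if (xhc->failures_left > 0) {
		xhc->failures_left--;
		xhc->ep.deq = 0;	/* symptom: field reads back as zero */
		return;
	}
	xhc->ep.deq = new_deq;
}

/*
 * Issue the command, then re-issue it while the context still shows a zero
 * dequeue pointer, up to MODEL_MAX_SET_DEQ_RETRIES extra attempts.
 * Returns true once the context holds new_deq, false if we give up.
 */
static bool model_set_deq_with_retry(struct model_xhc *xhc, uint64_t new_deq)
{
	unsigned int attempt;

	assert(xhc != NULL);
	assert(new_deq != 0);	/* zero is reserved as the failure marker here */

	for (attempt = 0; attempt <= MODEL_MAX_SET_DEQ_RETRIES; attempt++) {
		model_issue_set_deq(xhc, new_deq);
		if (xhc->ep.deq == new_deq)
			return true;
		if (xhc->ep.deq != 0)
			return false;	/* non-zero mismatch: a different failure */
	}
	return false;	/* still zero after all retries; caller must recover */
}
```

With a bound like this, a controller that never updates the field would make
the function return false after four commands instead of retrying forever,
and the caller would then have to fall back to some other recovery path.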