On Wed, Dec 30, 2020 at 5:31 AM Mathias Nyman <mathias.nyman@xxxxxxxxxxxxxxx> wrote: <> > I was able to reproduce the issue with two USB devices on a different system. > > I saw that the new incorrect dequeue pointer sometimes had the higher 32 bits > incorrect while the lower bits were correct. > So this looks like a memory access order issue. > > The command TRB is 16 bytes. The xhci driver writes it in four 4 byte chunks. > Even if the driver writes the chunk that sets the cycle bit last, handing the TRB > over to the controller, it appears that the actual write order can be different. > The controller ends up reading a command TRB with updated cycle bit but old bogus > values in the "new dequeue pointer" field. > > Adding a write memory barrier before writing the cycle bit solved the issue in my case. > > Whole patch series is updated, added write memory barrier, and rebased on 5.10. > It can be found force-updated in the same rewrite_halt_stop_handling branch: > > git://git.kernel.org/pub/scm/linux/kernel/git/mnyman/xhci.git rewrite_halt_stop_handling > https://git.kernel.org/pub/scm/linux/kernel/git/mnyman/xhci.git/log/?h=rewrite_halt_stop_handling > > Does this work for you? Yes, it does!! I verified that I'm able to reproduce the issue in less than 10 seconds with this commit which is the HEAD~1 of your series: a7d053d207121 xhci: handle halting transfer event properly after endpoint stop and halt raced. With this commit (the final commit in your series, adding a single patch): 96887d191a88c xhci: make sure TRB is fully written before giving it to the controller I ran cleanly for over 20 minutes and haven't been able to reproduce the issue. It looks like the wmb() added in that patch makes all the difference! Thank you for the fix, and you can add this tag to the series: Tested-by: Ross Zwisler <zwisler@xxxxxxxxxx>