Re: xhci problem -> general protection fault

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Jan 07, 2021 at 10:57:19AM +0200, Mathias Nyman wrote:
> On 6.1.2021 20.52, Ross Zwisler wrote:
> > On Wed, Dec 30, 2020 at 5:31 AM Mathias Nyman <mathias.nyman@xxxxxxxxxxxxxxx> wrote:
> > <>
> >> I was able to reproduce the issue with two USB devices on a different system.
> >>
> >> I saw that the new incorrect dequeue pointer sometimes had the higher 32 bits
> >> incorrect while the lower bits were correct.
> >> So this looks like a memory access order issue.
> >>
> >> The command TRB is 16 bytes. The xhci driver writes it in four 4 byte chunks.
> >> Even if the driver writes the chunk that sets the cycle bit last, handing the TRB
> >> over to the controller, it appears that the actual write order can be different.
> >> The controller ends up reading a command TRB with updated cycle bit but old bogus
> >> values in the "new dequeue pointer" field.
> >>
> >> Adding a write memory barrier before writing the cycle bit solved the issue in my case.
> >>
> >> Whole patch series is updated, added write memory barrier, and rebased on 5.10.
> >> It can be found force-updated in the same rewrite_halt_stop_handling branch:
> >>
> >> git://git.kernel.org/pub/scm/linux/kernel/git/mnyman/xhci.git rewrite_halt_stop_handling
> >> https://git.kernel.org/pub/scm/linux/kernel/git/mnyman/xhci.git/log/?h=rewrite_halt_stop_handling
> >>
> >> Does this work for you?
> > 
> > Yes, it does!!  I verified that I'm able to reproduce the issue in less than
> > 10 seconds with this commit which is the HEAD~1 of your series:
> > 
> > a7d053d207121 xhci: handle halting transfer event properly after endpoint stop and halt raced.
> > 
> > With this commit (the final commit in your series, adding a single patch): 
> > 
> > 96887d191a88c xhci: make sure TRB is fully written before giving it to the controller
> > 
> > I ran cleanly for over 20 minutes and haven't been able to reproduce the
> > issue.  It looks like the wmb() added in that patch makes all the difference!
> > 
> > Thank you for the fix, and you can add this tag to the series:
> > 
> > Tested-by: Ross Zwisler <zwisler@xxxxxxxxxx>
> > 
> 
> Great, thanks for testing.
> 
> I think it makes sense to cherry pick that one-liner wmb() patch first to 5.11 and stable,
> then add rest of the rewrite later.
> 
> -Mathias

Makes sense to me.  FWIW I've verified that v5.10 reproduces the issue easily,
and v5.10 with just the wmb() patch does not.

Thanks again for the fix!



[Index of Archives]     [Linux Media]     [Linux Input]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Old Linux USB Devel Archive]

  Powered by Linux