On 29.10.2024 11.16, Mathias Nyman wrote:
On 29.10.2024 10.28, Michał Pecio wrote:
By the way, I think this race is already possible today, without my
patches. There is no testing for SET_DEQ_PENDING in xhci_urb_dequeue()
and no testing in handle_cmd_stop_ep(). If this happens in the middle
of a Set TR Deq chain on a streams endpoint, nothing seems to stop the
Stop EP handler from attempting invalidation under SET_DEQ_PENDING.
Maybe invalidate_cancelled_tds() should bail out if SET_DEQ_PENDING is
set, and the Set Deq completion handler should later unconditionally call
the invalidate/giveback combo before it exits.
I think you are on to something.
If we add invalidate/giveback to the Set TR Deq completion handler, allowing
it to possibly queue new Set TR Deq commands, then we can bail out in
xhci_urb_dequeue() if SET_DEQ_PENDING is set.
xhci_urb_dequeue() would not queue an extra Stop Endpoint command, only
set td->cancel_status to TD_DIRTY, and the Set TR Deq handler would
not ring the doorbell unnecessarily.
Sounds like a plan to me.
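
To make the plan above concrete, here is a rough userspace model of the flow.
It is only a sketch: the structs, the model_* function names, the counters and
the MODEL_TD_CLEAN state are made up for illustration and are not the real
xhci-ring.c code. Only the names SET_DEQ_PENDING, TD_DIRTY and
xhci_urb_dequeue() come from the discussion.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for driver state; not the real xhci structures. */
#define SET_DEQ_PENDING      (1u << 0)
#define EP_STOP_CMD_PENDING  (1u << 1)

enum model_cancel_status {
	MODEL_TD_CLEAN,      /* hypothetical: TD not cancelled */
	TD_DIRTY,            /* cancelled, not yet invalidated */
	TD_CLEARING_CACHE,   /* Set TR Deq queued for this TD */
	TD_CLEARED,          /* safe to give back */
};

struct model_td {
	enum model_cancel_status cancel_status;
	bool given_back;
};

struct model_ep {
	unsigned int ep_state;
	struct model_td *tds;
	size_t num_tds;
	unsigned int stop_cmds_queued;     /* counters for inspection only */
	unsigned int set_deq_cmds_queued;
	unsigned int doorbell_rings;
};

/* Returns true if a new Set TR Deq command was queued. */
static bool model_invalidate_cancelled_tds(struct model_ep *ep)
{
	if (ep->ep_state & SET_DEQ_PENDING)
		return false;	/* bail out, completion handler will retry */

	for (size_t i = 0; i < ep->num_tds; i++) {
		if (ep->tds[i].cancel_status == TD_DIRTY) {
			ep->tds[i].cancel_status = TD_CLEARING_CACHE;
			ep->ep_state |= SET_DEQ_PENDING;
			ep->set_deq_cmds_queued++;
			return true;	/* one command in flight at a time */
		}
	}
	return false;
}

static void model_giveback_invalidated_tds(struct model_ep *ep)
{
	for (size_t i = 0; i < ep->num_tds; i++)
		if (ep->tds[i].cancel_status == TD_CLEARED)
			ep->tds[i].given_back = true;
}

/* Models the proposed xhci_urb_dequeue() behaviour. */
static void model_urb_dequeue(struct model_ep *ep, struct model_td *td)
{
	td->cancel_status = TD_DIRTY;
	if (ep->ep_state & SET_DEQ_PENDING)
		return;		/* no extra Stop Endpoint command */
	if (!(ep->ep_state & EP_STOP_CMD_PENDING)) {
		ep->ep_state |= EP_STOP_CMD_PENDING;
		ep->stop_cmds_queued++;
	}
}

static void model_handle_cmd_stop_ep(struct model_ep *ep)
{
	ep->ep_state &= ~EP_STOP_CMD_PENDING;
	bool queued = model_invalidate_cancelled_tds(ep);
	model_giveback_invalidated_tds(ep);
	if (!queued && !(ep->ep_state & SET_DEQ_PENDING))
		ep->doorbell_rings++;
}

/* Set TR Deq completion: always run the invalidate/giveback combo. */
static void model_handle_cmd_set_deq(struct model_ep *ep)
{
	ep->ep_state &= ~SET_DEQ_PENDING;
	for (size_t i = 0; i < ep->num_tds; i++)
		if (ep->tds[i].cancel_status == TD_CLEARING_CACHE)
			ep->tds[i].cancel_status = TD_CLEARED;

	bool queued = model_invalidate_cancelled_tds(ep);
	model_giveback_invalidated_tds(ep);
	if (!queued)
		ep->doorbell_rings++;	/* restart only when nothing is pending */
}
```

In this model, cancelling a second TD while a Set TR Deq is in flight only
marks it TD_DIRTY. The completion handler then queues the next Set TR Deq
itself, and the doorbell is rung once, after the last one completes.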
I wrote a test series for this.
1st patch avoids stopping the endpoint in URB cancel if a Set TR Deq is pending.
2nd patch handles a Set TR Deq command context state error caused by a running endpoint.
3rd patch tracks doorbell rings with a flag. For now it is only used to prevent
infinite Stop EP retries. The flag is not properly cleared in other cases.
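
As a very rough illustration of the idea behind the 3rd patch, a flag set on
doorbell ring can be consumed once per Stop EP retry, so a retry only happens
if the doorbell was actually rung since the last attempt. This is a guess at
the general technique, not the code in the branch: DB_DOORBELL_RUNG, struct
db_ep and both functions are hypothetical names.

```c
#include <stdbool.h>

#define DB_DOORBELL_RUNG (1u << 0)	/* hypothetical flag name */

struct db_ep {
	unsigned int flags;
	unsigned int stop_retries;	/* counter for inspection only */
};

/* Record that the doorbell was rung for this endpoint. */
static void db_ring_doorbell(struct db_ep *ep)
{
	ep->flags |= DB_DOORBELL_RUNG;
}

/*
 * Called when a Stop EP command failed on an endpoint that looks stopped.
 * Retry only if a doorbell ring could still restart the endpoint, and
 * consume the flag so the same ring cannot justify a second retry.
 */
static bool db_should_retry_stop(struct db_ep *ep)
{
	if (!(ep->flags & DB_DOORBELL_RUNG))
		return false;

	ep->flags &= ~DB_DOORBELL_RUNG;
	ep->stop_retries++;
	return true;
}
```

With this shape, each doorbell ring allows at most one retry, which is what
bounds the loop in the sketch.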
Series can be found in my tree in a fix_stop_ep_race branch:
https://git.kernel.org/pub/scm/linux/kernel/git/mnyman/xhci.git/log/?h=fix_stop_ep_race
git://git.kernel.org/pub/scm/linux/kernel/git/mnyman/xhci.git fix_stop_ep_race branch
Do these help in your NEC host case?
I'll see if I can set up a system to trigger this myself.
Thanks
Mathias