Re: [PATCH 0/2] xhci: Fix the NEC stop bug workaround

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, 28 Oct 2024 11:54:39 +0200, Mathias Nyman wrote:
> The SET_DEQ_PENDING case is trickier. We would read the dequeue
> pointer from hardware while we know hardware is processing a command
> to move the dequeue pointer. Result may be old dequeue, or new
> dequeue, possible unknown.

Damn, I looked at various things but I haven't thought about it. Yes,
it's dodgy and not really a great idea. Although I suspect it wouldn't
be *very* harmful, because the truly bad case (failing to queue a Set
TR Deq when it's necessary) is triggered by Set TR Deq already pending
on the same stream, so the stream's cache will be cleared anyway.

But it could easily queue a bunch of pointless commands, for example.

By the way, I think this race is already possible today, without my
patches. There is no testing for SET_DEQ_PENDING in xhci_urb_dequeue()
and no testing in handle_cmd_stop_ep(). If this happens in the middle
of a Set TR Deq chain on a streams endpoint, nothing seems to stop the
Stop EP handler from attempting invalidation under SET_DEQ_PENDING.

Maybe invalidate_cancelled_tds() should bail out if SET_DEQ_PENDING
and later Set Deq completion handler should unconditionally call the
invalidate/giveback combo before it exits.

> We could ring the doorbell before giving back the invalidated tds in
> xhci_handle_cmd_stop_ep(), and possibly xhci_handle_cmd_set_deq().
> This gives hardware a bit more time to start the endpoint.

This unfortunately doesn't buy much time, because giveback is a very
cheap operation - it just adds the URBs to a queue and wakes a worker
which runs all those callbacks. It finishes under 1us on my system.

> We could also track pending ring starts.
> Set a "EP_START_PENDING flag when doorbell is rung on a stopped
> endpoint. clear the flag when firt transfer event is received on that
> endpoint.

Yes, that was the second thing I tried. But I abandoned it:

Problem 1: URBs on "idle" devices are cancelled before generating
any event, so we need to clear the flag on Stop EP and Reset EP.
Not all of them use the default completion handler. Maybe it could
be handled reliably by tapping into handle_cmd_completion(). But...

Problem 2: hardware bugs. On ASMedia 3142, I can trigger (from
userspace) a condition when EP 0 doorbell is rung (even twice)
and its state is still Stopped a few seconds later, and remains
so after repeated Stop EP / doorbell rings.

It looks like a timeout is the only way to be really sure.


Regards,
Michal




[Index of Archives]     [Linux Media]     [Linux Input]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Old Linux USB Devel Archive]

  Powered by Linux