On Mon, 28 Oct 2024 11:54:39 +0200, Mathias Nyman wrote: > The SET_DEQ_PENDING case is trickier. We would read the dequeue > pointer from hardware while we know hardware is processing a command > to move the dequeue pointer. Result may be old dequeue, or new > dequeue, possible unknown. Damn, I looked at various things but I haven't thought about it. Yes, it's dodgy and not really a great idea. Although I suspect it wouldn't be *very* harmful, because the truly bad case (failing to queue a Set TR Deq when it's necessary) is triggered by Set TR Deq already pending on the same stream, so the stream's cache will be cleared anyway. But it could easily queue a bunch of pointless commands, for example. By the way, I think this race is already possible today, without my patches. There is no testing for SET_DEQ_PENDING in xhci_urb_dequeue() and no testing in handle_cmd_stop_ep(). If this happens in the middle of a Set TR Deq chain on a streams endpoint, nothing seems to stop the Stop EP handler from attempting invalidation under SET_DEQ_PENDING. Maybe invalidate_cancelled_tds() should bail out if SET_DEQ_PENDING and later Set Deq completion handler should unconditionally call the invalidate/giveback combo before it exits. > We could ring the doorbell before giving back the invalidated tds in > xhci_handle_cmd_stop_ep(), and possibly xhci_handle_cmd_set_deq(). > This gives hardware a bit more time to start the endpoint. This unfortunately doesn't buy much time, because giveback is a very cheap operation - it just adds the URBs to a queue and wakes a worker which runs all those callbacks. It finishes under 1us on my system. > We could also track pending ring starts. > Set a "EP_START_PENDING flag when doorbell is rung on a stopped > endpoint. clear the flag when firt transfer event is received on that > endpoint. Yes, that was the second thing I tried. But I abandoned it: Problem 1: URBs on "idle" devices are cancelled before generating any event, so we need to clear the flag on Stop EP and Reset EP. Not all of them use the default completion handler. Maybe it could be handled reliably by tapping into handle_cmd_completion(). But... Problem 2: hardware bugs. On ASMedia 3142, I can trigger (from userspace) a condition when EP 0 doorbell is rung (even twice) and its state is still Stopped a few seconds later, and remains so after repeated Stop EP / doorbell rings. It looks like a timeout is the only way to be really sure. Regards, Michal