On 27.8.2024 20.46, Michal Pecio wrote:
Hi, Here's a handful of bug fixes, cleanups and improvements for the xHCI driver. The first is a trivial fix, the second also fixes a bug albeit a less trivial one. These two maybe deserve to go into usb-linus, the latter perhaps to stable as well, but the patch doesn't apply as-is. The rest are new functionality or code cleanups with low anticipated user impact. All patches were applied and tested on v6.11-rc5, all compiled cleanly and worked without regressions with some HID, storage, audio, video. Functional changes received additional functional testing described in their respective changelogs. Notably missing is a solution to the "Underrun after Missed Service" problem. To recap, Underrun/Overrun typically has a NULL TRB pointer and ep_ring->td_list contains some missed TDs and possibly a few that have been added after the underrun occured but before we got the IRQ. No way to tell them apart, at least by the spec, as far as I see. On USB 3.1+ hosts we can get out of it with "expedited skipping" - it ensures that the ring is never left with stale TDs in the first place. On USB 3.0 hosts the underrun handler *will* sometimes face skipping and it needs to make a decision. Currently, it skips everything. This may cause DMA-after-free - not great on IN endpoints - and "TRB DMA ptr not part of current TD" or "WARN Event TRB with no TDs queued" later. The obvious alterantive is to never skip on empty ring events, but it can deadlock a driver which waits for its URBs to be given back when all of them were missed and we can't get a valid dequeue from the HC. I wonder if it would make sense to always skip the first queued TD when we get an MSE with NULL pointer? I think it's legal, and likely sufficient to avoid the stupid deadlock. Just a last minute idea. I will think about it, but now I'd like to flush this patch queue before it grows to infinity ;)
Thanks Michal for this series. I need to go through the patches individually and pick fixes. We are in the middle of reworking transfer event handling with Niklas as well. I'd be grateful if you could take a quick look at our solution and give your opinion on it as you have expertise in this area. Main idea is simple. If a transfer event points to a TRB between dequeue and enqueue we give back all queued TDs passed up to this TRB. And if the event points to a valid TD then handle that. (normal case) Code is much simpler, we get rid of skip flag, and could probably merge error_mid_td and urb_length_set flags into one flag. The code isn't complete or tested yet. It doesn't handle under/overruns, but that could probably be fixed by just turning "return 0" for those cases into "break" Code on top of 6.11-rc4 can be found in my feature_transfer_event_rework branch: git://git.kernel.org/pub/scm/linux/kernel/git/mnyman/xhci.git feature_transfer_event_rework https://git.kernel.org/pub/scm/linux/kernel/git/mnyman/xhci.git/log/?h=feature_transfer_event_rework Thanks Mathias