On 6/16/22 17:21, Pavel Begunkov wrote:
1-4 kill REQ_F_COMPLETE_INLINE as we're out of bits.

Patch 5 from Hao should remove some overhead from poll requests.

Patch 6 from Hao adds per-bucket spinlocks, and 16-19 do a little bit of cleanup. The downside of per-bucket spinlocks is that they add an additional spin lock/unlock pair on the poll request completion side, which shouldn't matter much with 20/25.

Patch 11 uses the inline completion infra for poll requests; this nicely improves perf when there is good tw batching.

Patch 12 implements the userspace visible side of IORING_SETUP_SINGLE_ISSUER; it'll be used for poll requests and later for spinlock optimisations.

13-16 introduce ->uring_lock protected cancellation hashing. It requires us to grab ->uring_lock on the completion side, but saves two spin lock/unlock pairs. We apply it automatically in cases where the mutex is already likely to be held (see the 25/25 description), so there is no additional mutex overhead and no potential latency problems.
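
For anyone following along, the per-bucket locking idea can be sketched in userspace roughly as below. This is only an illustration and not the code from the series: the names (struct table, table_insert, table_remove, NR_BUCKETS) are hypothetical, and a pthread mutex stands in for the kernel spinlock. The point is that insertion and removal only take the lock of the bucket the key hashes to, so requests landing in different buckets don't contend, at the cost of one lock/unlock pair per operation.

```c
#include <pthread.h>
#include <stddef.h>

#define NR_BUCKETS 8	/* hypothetical table size, must be a power of two */

struct entry {
	unsigned long key;
	struct entry *next;
};

struct bucket {
	pthread_mutex_t lock;	/* stands in for a kernel spinlock */
	struct entry *head;
};

struct table {
	struct bucket buckets[NR_BUCKETS];
};

static void table_init(struct table *t)
{
	for (size_t i = 0; i < NR_BUCKETS; i++) {
		pthread_mutex_init(&t->buckets[i].lock, NULL);
		t->buckets[i].head = NULL;
	}
}

/* Map a key to its bucket; only that bucket's lock is ever taken. */
static struct bucket *table_bucket(struct table *t, unsigned long key)
{
	return &t->buckets[key & (NR_BUCKETS - 1)];
}

static void table_insert(struct table *t, struct entry *e)
{
	struct bucket *b = table_bucket(t, e->key);

	pthread_mutex_lock(&b->lock);
	e->next = b->head;
	b->head = e;
	pthread_mutex_unlock(&b->lock);
}

/* Unlink and return the first entry matching key, or NULL if absent. */
static struct entry *table_remove(struct table *t, unsigned long key)
{
	struct bucket *b = table_bucket(t, key);
	struct entry **pp, *e = NULL;

	pthread_mutex_lock(&b->lock);
	for (pp = &b->head; *pp; pp = &(*pp)->next) {
		if ((*pp)->key == key) {
			e = *pp;
			*pp = e->next;
			e->next = NULL;
			break;
		}
	}
	pthread_mutex_unlock(&b->lock);
	return e;
}
```

With a single table-wide lock every insert and remove would serialise; here only operations that hash to the same bucket do.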
Reviewed-by: Hao Xu <howeyxu@xxxxxxxxxxx>