24 MIOPS vs 31.5, or ~30% win for fio/t/io_uring nops batching=32.
Jens mentioned that with his standard test against Optane it gave
yet another +3% to throughput.

1-14 are about optimising the completion path:
- replaces lists with single linked lists
- kills 64 * 8B of caches in ctx
- adds some shuffling of iopoll bits
- list splice instead of per-req list_add in one place
- inlines io_req_free_batch() and other helpers

15-22: inlines __io_queue_sqe() so all the submission path
up to io_issue_sqe() is inlined + little tweaks

v2: rebase for-5.16/io_uring

    multicqe_drain was hanging because it's a bit buggy, i.e. doesn't
    consider that requests may get punted, but still add 24th patch to
    avoid it.

Pavel Begunkov (24):
  io_uring: mark having different creds unlikely
  io_uring: force_nonspin
  io_uring: make io_do_iopoll return number of reqs
  io_uring: use slist for completion batching
  io_uring: remove allocation cache array
  io-wq: add io_wq_work_node based stack
  io_uring: replace list with stack for req caches
  io_uring: split iopoll loop
  io_uring: use single linked list for iopoll
  io_uring: add a helper for batch free
  io_uring: convert iopoll_completed to store_release
  io_uring: optimise batch completion
  io_uring: inline completion batching helpers
  io_uring: don't pass tail into io_free_batch_list
  io_uring: don't pass state to io_submit_state_end
  io_uring: deduplicate io_queue_sqe() call sites
  io_uring: remove drain_active check from hot path
  io_uring: split slow path from io_queue_sqe
  io_uring: inline hot path of __io_queue_sqe()
  io_uring: reshuffle queue_sqe completion handling
  io_uring: restructure submit sqes to_submit checks
  io_uring: kill off ->inflight_entry field
  io_uring: comment why inline complete calls io_clean_op()
  io_uring: disable draining earlier

 fs/io-wq.h    |  60 +++++-
 fs/io_uring.c | 508 +++++++++++++++++++++++---------------------------
 2 files changed, 287 insertions(+), 281 deletions(-)

-- 
2.33.0
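
For readers unfamiliar with the "replace list with stack" idea from the
completion-path patches, here is a minimal userspace sketch of an intrusive
singly linked stack used as a request cache. It is not the code from the
series: struct work_node, struct request, struct node_stack, stack_push(),
stack_pop() and node_to_request() are hypothetical names chosen for
illustration. The point is only that a LIFO free list needs one pointer per
node and one head pointer, where a doubly linked list needs two of each.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical intrusive node: a single forward pointer. */
struct work_node {
	struct work_node *next;
};

/* Hypothetical request with the node embedded, so caching a request
 * needs no separate allocation. */
struct request {
	int id;
	struct work_node cache_node;
};

/* Recover the containing request from its embedded node. */
#define node_to_request(ptr) \
	((struct request *)((char *)(ptr) - offsetof(struct request, cache_node)))

/* The stack itself is just a head pointer. */
struct node_stack {
	struct work_node *head;
};

/* Push: two pointer writes, no tail or prev pointer to maintain. */
void stack_push(struct node_stack *s, struct work_node *n)
{
	assert(s && n);
	n->next = s->head;
	s->head = n;
}

/* Pop the most recently pushed node, or return NULL if the stack is empty. */
struct work_node *stack_pop(struct node_stack *s)
{
	struct work_node *n;

	assert(s);
	n = s->head;
	if (n) {
		s->head = n->next;
		n->next = NULL;
	}
	return n;
}
```

Pushing a freed request and popping it on the next allocation returns the
most recently used (and most likely cache-hot) object first, which is one
plausible reason a stack suits a request cache.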