Patches 1-5 optimise io_fill_cqe_req.

Patches 6-7 combine the iopoll and normal completion paths.

Patch 8 should improve CPU caching of the SQ/CQ pointers.

Patch 9 makes the SQ indirection (->sq_array) optional. Assuming we'll
make that the default in liburing, Patch 10 optimises the no-SQ-array
case with a static_key.

Patches 11-15 shuffle io_ring_ctx fields.

Patch 16 inlines io_fill_cqe_req.

Testing with t/io_uring nops only for now:

               QD2   QD4   QD8   QD16  QD32
baseline:      17.3  26.6  36.4  43.7  49.4
Patches 1-15:  17.8  27.4  37.9  45.8  51.2
Patches 1-16:  17.9  28.2  39.3  47.8  54

L1 load misses decreased from 1.7% to 1.3%. I don't think that's
significant; it will be more interesting to see how it looks when we
do actual IO.

Pavel Begunkov (16):
  io_uring: improve cqe !tracing hot path
  io_uring: cqe init hardening
  io_uring: simplify big_cqe handling
  io_uring: refactor __io_get_cqe()
  io_uring: optimise extra io_get_cqe null check
  io_uring: reorder cqring_flush and wakeups
  io_uring: merge iopoll and normal completion paths
  io_uring: compact SQ/CQ heads/tails
  io_uring: add option to remove SQ indirection
  io_uring: static_key for !IORING_SETUP_NO_SQARRAY
  io_uring: move non aligned field to the end
  io_uring: banish non-hot data to end of io_ring_ctx
  io_uring: separate task_work/waiting cache line
  io_uring: move multishot cqe cache in ctx
  io_uring: move iopoll ctx fields around
  io_uring: force inline io_fill_cqe_req

 include/linux/io_uring_types.h | 129 ++++++++++++++++----------------
 include/uapi/linux/io_uring.h  |   5 ++
 io_uring/io_uring.c            | 130 ++++++++++++++++++---------------
 io_uring/io_uring.h            |  58 +++++++-------
 io_uring/rw.c                  |  24 ++----
 io_uring/uring_cmd.c           |   5 +-
 6 files changed, 173 insertions(+), 178 deletions(-)
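To illustrate what making the SQ indirection optional means: with ->sq_array, each SQ ring slot holds an index into the SQE array, so consuming an entry needs an extra dependent load; without it, the ring slot itself identifies the SQE. The sketch below is a hypothetical userspace model only. struct sq_ring, submit(), get_sqe() and the no_sqarray field are illustrative names, not the kernel's io_uring code or the liburing API; no_sqarray merely stands in for the IORING_SETUP_NO_SQARRAY setup flag, and the dropped counter is a simplified assumption about how invalid indices are handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model of an SQ ring; NOT the kernel's layout. */
#define SQ_ENTRIES 8u
static_assert((SQ_ENTRIES & (SQ_ENTRIES - 1)) == 0,
	      "SQ_ENTRIES must be a power of two so indices can be masked");

struct sqe {
	uint64_t user_data;
};

struct sq_ring {
	uint32_t head;              /* consumer index (the "kernel" side) */
	uint32_t tail;              /* producer index (the "userspace" side) */
	uint32_t mask;              /* SQ_ENTRIES - 1 */
	uint32_t dropped;           /* invalid indices skipped by get_sqe() */
	int no_sqarray;             /* stands in for IORING_SETUP_NO_SQARRAY */
	uint32_t array[SQ_ENTRIES]; /* the indirection table (->sq_array) */
	struct sqe sqes[SQ_ENTRIES];
};

/* Producer: queue one SQE. Returns 0 on success, -1 if full or bad index. */
static int submit(struct sq_ring *r, uint32_t sqe_idx, uint64_t data)
{
	uint32_t slot = r->tail & r->mask;

	if (r->tail - r->head == SQ_ENTRIES)
		return -1;                /* ring is full */
	if (r->no_sqarray)
		sqe_idx = slot;           /* SQE position is implied by the slot */
	else if (sqe_idx >= SQ_ENTRIES)
		return -1;                /* out-of-range SQE index */
	else
		r->array[slot] = sqe_idx; /* publish which SQE this slot uses */
	r->sqes[sqe_idx].user_data = data;
	r->tail++;
	return 0;
}

/* Consumer: fetch the next SQE, or NULL if empty or the index is invalid. */
static struct sqe *get_sqe(struct sq_ring *r)
{
	uint32_t slot, idx;

	if (r->head == r->tail)
		return NULL;
	slot = r->head++ & r->mask;
	/* With indirection, an extra dependent load from array[] is needed. */
	idx = r->no_sqarray ? slot : r->array[slot];
	if (idx >= SQ_ENTRIES) {
		r->dropped++;             /* caller can tell this apart from "empty" */
		return NULL;
	}
	return &r->sqes[idx];
}
```

In this model the per-entry `no_sqarray` branch in get_sqe() is roughly the kind of check that a static_key (Patch 10) can patch out, so a kernel where no ring uses the SQ array would not pay for the conditional on every SQE.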