Patches 1-5 optimise io_fill_cqe_req.
Patches 6-7 combine the iopoll and normal completion paths.
Patch 8 inlines io_fill_cqe_req.
Patch 9 should improve CPU caching of SQ/CQ pointers.
Patch 10 conditionally removes the SQ indirection (->sq_array); a
userspace sketch is appended at the end of this mail. Assuming we'll
make that the default in liburing, a follow-up patch optimises it
further with a static_key (dropped from this posting, see the v2
notes below).
Patches 11-15 shuffle io_ring_ctx fields.

Testing with t/io_uring nops only for now:

                 QD2     QD4     QD8     QD16    QD32
baseline:        17.3    26.6    36.4    43.7    49.4
Patches 1-15:    17.8    27.4    37.9    45.8    51.2
Patches 1-16:    17.9    28.2    39.3    47.8    54

v2: removed static_key, it'll be submitted later after it rolls out
    well
    minor description changes

Pavel Begunkov (15):
  io_uring: improve cqe !tracing hot path
  io_uring: cqe init hardening
  io_uring: simplify big_cqe handling
  io_uring: refactor __io_get_cqe()
  io_uring: optimise extra io_get_cqe null check
  io_uring: reorder cqring_flush and wakeups
  io_uring: merge iopoll and normal completion paths
  io_uring: force inline io_fill_cqe_req
  io_uring: compact SQ/CQ heads/tails
  io_uring: add option to remove SQ indirection
  io_uring: move non aligned field to the end
  io_uring: banish non-hot data to end of io_ring_ctx
  io_uring: separate task_work/waiting cache line
  io_uring: move multishot cqe cache in ctx
  io_uring: move iopoll ctx fields around

 include/linux/io_uring_types.h | 129 +++++++++++++++++----------------
 include/uapi/linux/io_uring.h  |   5 ++
 io_uring/io_uring.c            | 120 +++++++++++++++---------------
 io_uring/io_uring.h            |  58 +++++++--------
 io_uring/rw.c                  |  24 ++----
 io_uring/uring_cmd.c           |   5 +-
 6 files changed, 163 insertions(+), 178 deletions(-)

-- 
2.41.0
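
For illustration, a minimal userspace sketch of what the SQ indirection
removal in patch 10 looks like from the other side. With ->sq_array in
place the kernel picks the next request as sqes[sq_array[head & mask]];
with the indirection removed the SQ head indexes the SQE array directly
as sqes[head & mask]. Names here are assumptions on my side: the setup
flag is written as IORING_SETUP_NO_SQARRAY and the snippet relies on a
liburing new enough to understand it, so treat it as a sketch of the
idea rather than the final API.

/* nop_no_sqarray.c -- submit one nop on a ring without ->sq_array.
 *
 * Hypothetical example: assumes IORING_SETUP_NO_SQARRAY is exposed by
 * both the kernel and the liburing headers in use.
 *
 * Build: gcc nop_no_sqarray.c -o nop_no_sqarray -luring
 */
#include <stdio.h>
#include <liburing.h>

int main(void)
{
	struct io_uring_params p = { };
	struct io_uring ring;
	struct io_uring_sqe *sqe;
	struct io_uring_cqe *cqe;
	int ret;

	/* Ask the kernel to drop the sq_array indirection; the SQ head
	 * then indexes the SQE array directly. */
	p.flags = IORING_SETUP_NO_SQARRAY;

	ret = io_uring_queue_init_params(8, &ring, &p);
	if (ret < 0) {
		/* kernels without the patch reject the flag, e.g. -EINVAL */
		fprintf(stderr, "queue_init: %d\n", ret);
		return 1;
	}

	sqe = io_uring_get_sqe(&ring);
	if (!sqe)
		return 1;
	io_uring_prep_nop(sqe);
	io_uring_submit(&ring);

	ret = io_uring_wait_cqe(&ring, &cqe);
	if (!ret) {
		printf("nop: res=%d\n", cqe->res);
		io_uring_cqe_seen(&ring, cqe);
	}
	io_uring_queue_exit(&ring);
	return 0;
}

Nothing else in the submission path changes: get_sqe/submit/wait work
as before, only the ring setup flag differs.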