On 4/12/22 9:05 AM, Jens Axboe wrote:
> On 4/12/22 8:09 AM, Pavel Begunkov wrote:
>> nops benchmark: 40.3 -> 41.1 MIOPS, or +2%
>>
>> Pavel Begunkov (9):
>>   io_uring: explicitly keep a CQE in io_kiocb
>>   io_uring: memcpy CQE from req
>>   io_uring: shrink final link flush
>>   io_uring: inline io_flush_cached_reqs
>>   io_uring: helper for empty req cache checks
>>   io_uring: add helper to return req to cache list
>>   io_uring: optimise submission loop invariant
>>   io_uring: optimise submission left counting
>>   io_uring: optimise io_get_cqe()
>>
>>  fs/io_uring.c | 288 +++++++++++++++++++++++++++++---------------------
>>  1 file changed, 165 insertions(+), 123 deletions(-)
>
> Get about ~4% on aarch64. I like both main changes, memcpy of cqe and
> the improvements to io_get_cqe().

Ran the nop tests on the 12900K, and I see about an 8% improvement
there, going from ~88M to 95M. I didn't split and check which part made
the most improvement.

-- 
Jens Axboe