[PATCH 00/16] squeeze more performance

Pavel Begunkov <asml.silence@xxxxxxxxx> · Mon, 4 Oct 2021 20:02:45 +0100

fio/t/io_uring -s32 -d32 -c32 -N1

          | baseline  | 0-15      | 0-16        | diff
setup 1:  | 34 MIOPS  | 42 MIOPS  | 42.2  MIOPS | 25 %
setup 2:  | 31 MIOPS  | 31 MIOPS  | 32    MIOPS | ~3 %

Setup 1 gets a 25% performance improvement, which is unexpected, and a
share of it should be attributed to compiler/HW magic. Setup 2 gains
just 3%, but the catch is that some of the patches _very_ unexpectedly
sink performance, so it's more like 31 MIOPS -> 29 -> 30 -> 29 -> 31 -> 32.

I'd suggest leaving 16/16 aside, maybe for future consideration and
refinement. Its end result is not very clear; I'd expect probably
around 3-5% with a more stable setup for nops32, and a bigger win
for io_cqring_ev_posted() intensive cases like BPF.
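
For readers unfamiliar with the kind of annotation several of the
patches below rely on (likely/unlikely(), cold and hot function
marking), here is a minimal userspace sketch. It is not taken from the
series: the macros are stand-ins for the kernel's own definitions, and
cold_fn, hot_fn, handle_bad_entry() and sum_valid() are hypothetical
names made up for illustration. It assumes GCC or Clang.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Userspace stand-ins (assumption: GCC/Clang builtins and attributes).
 * The kernel provides its own likely()/unlikely() and __cold.
 */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)
#define cold_fn     __attribute__((cold, noinline))
#define hot_fn      __attribute__((hot))

/* Hypothetical slow path: rarely taken, so keep it out of line. */
static cold_fn void handle_bad_entry(int value)
{
	fprintf(stderr, "skipping invalid entry %d\n", value);
}

/*
 * Hypothetical hot loop: sums the non-negative entries and hands the
 * (expected to be rare) negative ones to the cold helper.
 */
static hot_fn long sum_valid(const int *vals, size_t n)
{
	long sum = 0;
	size_t i;

	assert(vals != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (unlikely(vals[i] < 0)) {
			handle_bad_entry(vals[i]);
			continue;
		}
		sum += vals[i];
	}
	return sum;
}
```

The annotations do not change what the code computes; they only hint
to the compiler which branch to lay out as the fall-through path and
which functions to optimise for speed or to move away from hot code.
Whether that pays off depends on the compiler and hardware, which
matches the unstable numbers above.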

Pavel Begunkov (16):
  io_uring: optimise kiocb layout
  io_uring: add more likely/unlikely() annotations
  io_uring: delay req queueing into compl-batch list
  io_uring: optimise request allocation
  io_uring: optimise INIT_WQ_LIST
  io_uring: don't wake sqpoll in io_cqring_ev_posted
  io_uring: merge CQ and poll waitqueues
  io_uring: optimise ctx referencing by requests
  io_uring: mark cold functions
  io_uring: optimise io_free_batch_list()
  io_uring: control ->async_data with a REQ_F flag
  io_uring: remove struct io_completion
  io_uring: inline io_req_needs_clean()
  io_uring: inline io_poll_complete
  io_uring: correct fill events helpers types
  io_uring: mark hot functions

 fs/io-wq.h    |   1 -
 fs/io_uring.c | 390 ++++++++++++++++++++++++++------------------------
 2 files changed, 205 insertions(+), 186 deletions(-)

-- 
2.33.0