Optimise CQ locking for event posting depending on a number of ring setup
flags. A QD1 nop benchmark showed a 12.067 -> 12.565 MIOPS increase, which
is more than 8.5% of the io_uring kernel overhead (taking into account
that the syscall overhead is over 50%) or 4.12% of the total performance.
Naturally, it's not only about QD1; applications can submit a bunch of
requests, but their completions may arrive randomly, hurting batching and
so performance (or latency).

The downside is that we have to punt all io-wq completions to the
original task. The performance win should diminish with better completion
batching, but it should be worth it as it also helps tw, which in reality
often doesn't complete too many requests.

The feature depends on DEFER_TASKRUN but can be relaxed to SINGLE_ISSUER.

v2: some general msg_ring fixes (patches 1,2)
    fix exiting ring potentially modifying CQ in parallel (8/12)
    use task_work instead of overflowing msg_ring CQEs, which could've
    messed with CQE ordering (9-11)

Pavel Begunkov (12):
  io_uring: dont remove file from msg_ring reqs
  io_uring: improve io_double_lock_ctx fail handling
  io_uring: skip overflow CQE posting for dying ring
  io_uring: don't check overflow flush failures
  io_uring: complete all requests in task context
  io_uring: force multishot CQEs into task context
  io_uring: use tw for putting rsrc
  io_uring: never run tw and fallback in parallel
  io_uring: get rid of double locking
  io_uring: extract a io_msg_install_complete helper
  io_uring: do msg_ring in target task via tw
  io_uring: skip spinlocking for ->task_complete

 include/linux/io_uring.h       |   2 +
 include/linux/io_uring_types.h |   3 +
 io_uring/io_uring.c            | 165 ++++++++++++++++++++++-----------
 io_uring/io_uring.h            |  12 ++-
 io_uring/msg_ring.c            | 163 ++++++++++++++++++++++----------
 io_uring/msg_ring.h            |   1 +
 io_uring/net.c                 |  21 +++++
 io_uring/opdef.c               |   8 ++
 io_uring/opdef.h               |   2 +
 io_uring/rsrc.c                |  19 +++-
 io_uring/rsrc.h                |   1 +
 11 files changed, 291 insertions(+), 106 deletions(-)

-- 
2.38.1
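
P.S. For context, below is a minimal userspace sketch (not part of the
series, illustrative only) of how an application might opt into the setup
this optimisation keys off: a ring created with IORING_SETUP_SINGLE_ISSUER
and IORING_SETUP_DEFER_TASKRUN, doing a QD1 nop like the benchmark quoted
above. It assumes a liburing and kernel recent enough to support both flags.

	/* sketch: single-issuer ring with deferred task work, QD1 nop */
	#include <stdio.h>
	#include <string.h>
	#include <liburing.h>

	int main(void)
	{
		struct io_uring_params p;
		struct io_uring ring;
		struct io_uring_sqe *sqe;
		struct io_uring_cqe *cqe;
		int ret;

		memset(&p, 0, sizeof(p));
		/* only this task submits and reaps completions, so the
		 * kernel can complete requests in task context */
		p.flags = IORING_SETUP_SINGLE_ISSUER |
			  IORING_SETUP_DEFER_TASKRUN;

		ret = io_uring_queue_init_params(8, &ring, &p);
		if (ret < 0) {
			fprintf(stderr, "queue_init: %s\n", strerror(-ret));
			return 1;
		}

		sqe = io_uring_get_sqe(&ring);
		io_uring_prep_nop(sqe);
		io_uring_submit_and_wait(&ring, 1);

		ret = io_uring_wait_cqe(&ring, &cqe);
		if (!ret)
			io_uring_cqe_seen(&ring, cqe);

		io_uring_queue_exit(&ring);
		return 0;
	}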