On 11/24/21 12:21, Hao Xu wrote:
> v4->v5
> - change the implementation of merge_wq_list
The only concern I had was about 6/6 not using the inline completion
infra when it's faster to grab ->uring_lock, i.e.
io_submit_flush_completions(), which should be faster when batching
is good. Looking again through the code, the only user is SQPOLL:

io_req_task_work_add(req, !!(req->ctx->flags & IORING_SETUP_SQPOLL));

And with SQPOLL the lock is mostly grabbed by the SQPOLL task only,
IOW for pure block rw there shouldn't be any contention. That doesn't
make much sense, so what am I missing? How many requests are completed
on average per tctx_task_work()?

It doesn't apply to for-5.17/io_uring, here is a rebase:

https://github.com/isilence/linux.git haoxu_tw_opt
link: https://github.com/isilence/linux/tree/haoxu_tw_opt

With that, the first 5 patches look good, so for them:

Reviewed-by: Pavel Begunkov <asml.silence@xxxxxxxxx>

But I still don't understand how 6/6 is better. Can it be because of
indirect branching? E.g. would something like this give the same result?

- req->io_task_work.func(req, locked);
+ INDIRECT_CALL_1(req->io_task_work.func, io_req_task_complete, req, locked);
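For reference, here is a minimal, self-contained model of the mechanism
behind that suggestion. The struct below is an illustrative stand-in for
the kernel types, and the printf path is just for demonstration; the
real macro lives in include/linux/indirect_call_wrapper.h and only
devirtualizes on CONFIG_RETPOLINE builds:

#include <stdbool.h>
#include <stdio.h>

#define likely(x) __builtin_expect(!!(x), 1)

/* Illustrative stand-in for struct io_kiocb; not the real layout. */
struct io_kiocb {
	struct {
		void (*func)(struct io_kiocb *req, bool *locked);
	} io_task_work;
};

/*
 * Model of INDIRECT_CALL_1(): if the function pointer matches the
 * expected hot target, make a direct (statically predictable) call;
 * fall back to the indirect call otherwise. On retpoline kernels the
 * direct path skips the speculation barrier entirely.
 */
#define INDIRECT_CALL_1(f, f1, ...)					\
	({								\
		likely(f == f1) ? f1(__VA_ARGS__) : f(__VA_ARGS__);	\
	})

static void io_req_task_complete(struct io_kiocb *req, bool *locked)
{
	printf("fast path via direct call, locked=%d\n", *locked);
}

int main(void)
{
	struct io_kiocb req = { .io_task_work.func = io_req_task_complete };
	bool locked = true;

	/* The suggested replacement for req->io_task_work.func(req, locked); */
	INDIRECT_CALL_1(req->io_task_work.func, io_req_task_complete,
			req, locked);
	return 0;
}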
> Hao Xu (6):
>   io-wq: add helper to merge two wq_lists
>   io_uring: add a priority tw list for irq completion work
>   io_uring: add helper for task work execution code
>   io_uring: split io_req_complete_post() and add a helper
>   io_uring: move up io_put_kbuf() and io_put_rw_kbuf()
>   io_uring: batch completion in prior_task_list
>
>  fs/io-wq.h    |  22 +++++++
>  fs/io_uring.c | 158 +++++++++++++++++++++++++++++++++-----------------
>  2 files changed, 128 insertions(+), 52 deletions(-)
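Since 1/6 (io-wq: add helper to merge two wq_lists) isn't quoted here,
a hypothetical sketch of what an O(1) merge helper for the io-wq singly
linked lists could look like; the type definitions are simplified and
the in-tree name and calling convention may differ:

/* Simplified stand-ins for the io-wq list types in fs/io-wq.h. */
struct io_wq_work_node {
	struct io_wq_work_node *next;
};

struct io_wq_work_list {
	struct io_wq_work_node *first;
	struct io_wq_work_node *last;
};

/*
 * Splice list1 onto the tail of list0 in O(1) using the cached tail
 * pointer, leaving the combined list in list0 and list1 empty.
 */
static inline void wq_list_merge(struct io_wq_work_list *list0,
				 struct io_wq_work_list *list1)
{
	if (!list1->first)
		return;

	if (!list0->first)
		list0->first = list1->first;
	else
		list0->last->next = list1->first;

	list0->last = list1->last;
	list1->first = list1->last = NULL;
}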
--
Pavel Begunkov