io_uring extensively uses task_work, but when a task is waiting for
multiple CQEs it causes lots of rescheduling. This series is an attempt
to optimise it and to serve as a base for future improvements.

For some zero-copy (zc) network tests that eventually wait for a portion
of the buffers, I got a 10x decrease in the number of context switches,
which more than halved CPU consumption (17% -> 8%). It also helps
storage cases: running fio/t/io_uring against a low-performance drive
gave a 2x decrease in the number of context switches for QD8 and ~4x
for QD32.

Not for inclusion yet, I want to add an optimisation for the case of
waiting for 1 CQE.

Pavel Begunkov (2):
  io_uring: add tw add flags
  io_uring: reduce scheduling due to tw

 include/linux/io_uring_types.h |  2 +-
 io_uring/io_uring.c            | 48 ++++++++++++++++++++--------------
 io_uring/io_uring.h            | 10 +++++--
 io_uring/notif.h               |  2 +-
 io_uring/rw.c                  |  2 +-
 5 files changed, 40 insertions(+), 24 deletions(-)

-- 
2.39.1