With poll-triggered retries, each event trigger will cause a task_work
item to be added for processing. If the ring is set up with
IORING_SETUP_DEFER_TASKRUN and a task is waiting on multiple events to
complete, any task_work addition will wake the task for processing these
items. This can cause more context switches than we would like, if the
application is deliberately waiting on multiple items to increase
efficiency.

For example, if an application has receive multishot armed for sockets
and wants to wait for N to complete within M usec of time, we should not
be waking up and processing these items until we have all the events we
asked for. By switching the poll trigger to lazy wake, we'll process
them when they are all ready, in one swoop, rather than wake multiple
times only to process one and then go back to sleep.

At some point we probably want to look at just making the lazy wake
the default, but for now, let's just selectively enable it where it
makes sense.

Signed-off-by: Jens Axboe <axboe@xxxxxxxxx>

---

diff --git a/io_uring/poll.c b/io_uring/poll.c
index 4c360ba8793a..d38d05edb4fa 100644
--- a/io_uring/poll.c
+++ b/io_uring/poll.c
@@ -370,7 +370,7 @@ static void __io_poll_execute(struct io_kiocb *req, int mask)
 	req->io_task_work.func = io_poll_task_func;
 
 	trace_io_uring_task_add(req, mask);
-	io_req_task_work_add(req);
+	__io_req_task_work_add(req, IOU_F_TWQ_LAZY_WAKE);
 }
 
 static inline void io_poll_execute(struct io_kiocb *req, int res)

-- 
Jens Axboe
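To make the context-switch argument above concrete, here is a toy userspace model of the two wake policies. It is only a sketch and not kernel code: the names toy_waiter, toy_task_work_add and toy_count_wakeups are hypothetical, the model assumes a lazy wake fires once the number of queued items reaches the count the waiter asked for, and the M usec timeout is not modeled at all.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model (hypothetical, not kernel code): one task waiting on events. */
struct toy_waiter {
	int wait_nr;	/* number of events the waiter asked for (N) */
	int pending;	/* task_work items queued since the last wake */
	int wakeups;	/* how many times the waiter was woken */
};

/*
 * Queue one task_work item. With lazy_wake == false, every addition
 * wakes the waiter. With lazy_wake == true, the waiter is only woken
 * once enough items are pending to satisfy its wait (assumed policy).
 */
void toy_task_work_add(struct toy_waiter *w, bool lazy_wake)
{
	assert(w->wait_nr > 0);

	w->pending++;
	if (!lazy_wake || w->pending >= w->wait_nr) {
		w->wakeups++;
		w->pending = 0;	/* the woken task processes all queued items */
	}
}

/* Deliver nr_events poll triggers and return the number of wakeups. */
int toy_count_wakeups(int wait_nr, int nr_events, bool lazy_wake)
{
	struct toy_waiter w = { .wait_nr = wait_nr };

	for (int i = 0; i < nr_events; i++)
		toy_task_work_add(&w, lazy_wake);

	return w.wakeups;
}
```

In this model, a waiter asking for 8 events that receives 8 triggers is woken 8 times with eager wake and once with lazy wake, which is the saving the patch is after.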