On 10/19/23 14:01, Jens Axboe wrote:
> With poll triggered retries, each event trigger will cause a task_work
> item to be added for processing. If the ring is setup with
> IORING_SETUP_DEFER_TASKRUN and a task is waiting on multiple events to
> complete, any task_work addition will wake the task for processing these
> items. This can cause more context switches than we would like, if the
> application is deliberately waiting on multiple items to increase
> efficiency.
I'm a bit late here. The reason why I didn't enable it for polling is
because it changes the behaviour. Let's think of a situation where we
want to accept 2 sockets, so we send a multishot accept and do
cq_wait(nr=2). It was perfectly fine before, but now it'll hang as
there's only 1 request and so only 1 tw queued. The same would happen
with multishot recv, even though it's more relevant to packet-based
protocols like UDP.
It might not be specific to multishots:
listen(backlog=1), queue N oneshot accepts and cq_wait(N).
Now we get the first connection in the queue to accept.
	[IORING_OP_ACCEPT] = {
		.poll_exclusive = 1,
	}
Due to poll_exclusive (I assume) it wakes only one accept. That
will try to queue up a tw for it, but it'll not be executed
because it's just one item. No other connection can be queued
up because of the backlog limit => presumably no other request
will be woken up => that first tw never runs. It's more subtle
and timing specific than the previous example, but nevertheless
it's concerning we might step on something like that.
> For example, if an application has receive multishot armed for sockets
> and wants to wait for N to complete within M usec of time, we should not
> be waking up and processing these items until we have all the events we
> asked for. By switching the poll trigger to lazy wake, we'll process
> them when they are all ready, in one swoop, rather than wake multiple
> times only to process one and then go back to sleep.
>
> At some point we probably want to look at just making the lazy wake
> the default, but for now, let's just selectively enable it where it
> makes sense.
>
> Signed-off-by: Jens Axboe <axboe@xxxxxxxxx>
> ---
> diff --git a/io_uring/poll.c b/io_uring/poll.c
> index 4c360ba8793a..d38d05edb4fa 100644
> --- a/io_uring/poll.c
> +++ b/io_uring/poll.c
> @@ -370,7 +370,7 @@ static void __io_poll_execute(struct io_kiocb *req, int mask)
>  	req->io_task_work.func = io_poll_task_func;
>  	trace_io_uring_task_add(req, mask);
> -	io_req_task_work_add(req);
> +	__io_req_task_work_add(req, IOU_F_TWQ_LAZY_WAKE);
>  }
>  static inline void io_poll_execute(struct io_kiocb *req, int res)
--
Pavel Begunkov