On 8/20/24 2:08 PM, David Wei wrote: > On 2024-08-19 16:28, Jens Axboe wrote: >> In preparation for having two distinct timeouts and avoid waking the >> task if we don't need to. >> >> Signed-off-by: Jens Axboe <axboe@xxxxxxxxx> >> --- >> io_uring/io_uring.c | 41 ++++++++++++++++++++++++++++++++++++----- >> io_uring/io_uring.h | 2 ++ >> 2 files changed, 38 insertions(+), 5 deletions(-) >> >> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c >> index 9e2b8d4c05db..ddfbe04c61ed 100644 >> --- a/io_uring/io_uring.c >> +++ b/io_uring/io_uring.c >> @@ -2322,7 +2322,7 @@ static int io_wake_function(struct wait_queue_entry *curr, unsigned int mode, >> * Cannot safely flush overflowed CQEs from here, ensure we wake up >> * the task, and the next invocation will do it. >> */ >> - if (io_should_wake(iowq) || io_has_work(iowq->ctx)) >> + if (io_should_wake(iowq) || io_has_work(iowq->ctx) || iowq->hit_timeout) > > iowq->hit_timeout may be modified in a timer softirq context, while this > wait_queue_func_t (AIUI) may get called from any context e.g. > net_rx_softirq for sockets. Does this need a READ_ONLY()? Yes probably not a bad idea to make it READ_ONCE(). >> return autoremove_wake_function(curr, mode, wake_flags, key); >> return -1; >> } >> @@ -2350,6 +2350,38 @@ static bool current_pending_io(void) >> return percpu_counter_read_positive(&tctx->inflight); >> } >> >> +static enum hrtimer_restart io_cqring_timer_wakeup(struct hrtimer *timer) >> +{ >> + struct io_wait_queue *iowq = container_of(timer, struct io_wait_queue, t); >> + struct io_ring_ctx *ctx = iowq->ctx; >> + >> + WRITE_ONCE(iowq->hit_timeout, 1); >> + if (ctx->flags & IORING_SETUP_DEFER_TASKRUN) >> + wake_up_process(ctx->submitter_task); >> + else >> + io_cqring_wake(ctx); > > This is a bit different to schedule_hrtimeout_range_clock(). Why is > io_cqring_wake() needed here for non-DEFER_TASKRUN? That's how the wakeups work - for defer taskrun, the task isn't on a waitqueue at all. Hence we need to wake the task itself. For any other setup, they will be on the waitqueue, and we just call io_cqring_wake() to wake up anyone waiting on the waitqueue. That will iterate the wake queue and call handlers for each item. Having a separate handler for that will allow to NOT wake up the task if we don't need to. taskrun, the waker -- Jens Axboe