On 8/20/24 4:51 PM, Jens Axboe wrote: > On 8/20/24 4:19 PM, Jens Axboe wrote: >> On 8/20/24 4:14 PM, David Wei wrote: >>> On 2024-08-20 15:13, Pavel Begunkov wrote: >>>>>>>> On 8/20/24 2:08 PM, David Wei wrote: >>>>>>> To rephase the question, why is the original code calling >>>>>>> schedule_hrtimeout_range_clock() not needing to differentiate behaviour >>>>>>> between defer taskrun and not? >>>>>> >>>>>> Because that part is the same, the task schedules out and goes to sleep. >>>>>> That has always been the same regardless of how the ring is setup. Only >>>>>> difference is that DEFER_TASKRUN doesn't add itself to ctx->wait, and >>>>>> hence cannot be woken by a wake_up(ctx->wait). We have to wake the task >>>>>> manually. >>>>>> >>>>> >>>>> io_cqring_timer_wakeup() is the timer expired callback which calls >>>>> wake_up_process() or io_cqring_wake() depending on DEFER_TASKRUN. >>>>> >>>>> The original code calling schedule_hrtimeout_range_clock() uses >>>>> hrtimer_sleeper instead, which has a default timer expired callback set >>>>> to hrtimer_wakeup(). >>>>> >>>>> hrtimer_wakeup() only calls wake_up_process(). >>>>> >>>>> My question is: why this asymmetry? Why does the new custom callback >>>>> require io_cqring_wake()? >>>> >>>> That's what I'm saying, it doesn't need and doesn't really want it. >>>> From the correctness point of view, it's ok since we wake up a >>>> (unnecessarily) larger set of tasks. >>>> >>> >>> Yeah your explanation that came in while I was writing the email >>> answered it, thanks Pavel. >> >> Hah and now I see what you meant - yeah we can just remove the >> distinction. I didn't see anything in testing, but I also didn't have >> multiple tasks waiting on a ring, nor would you. So it doesn't really >> matter, but I'll clean it up so there's no confusion. > > Actually probably better to just leave it as-is, as we'd otherwise need > to store a task in io_wait_queue. Which we of course could, and would > remove a branch in there... I guess I should actually look at the code first, we have it via wq->private already. Hence: diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c index ddfbe04c61ed..4ba5292137c3 100644 --- a/io_uring/io_uring.c +++ b/io_uring/io_uring.c @@ -2353,13 +2353,9 @@ static bool current_pending_io(void) static enum hrtimer_restart io_cqring_timer_wakeup(struct hrtimer *timer) { struct io_wait_queue *iowq = container_of(timer, struct io_wait_queue, t); - struct io_ring_ctx *ctx = iowq->ctx; WRITE_ONCE(iowq->hit_timeout, 1); - if (ctx->flags & IORING_SETUP_DEFER_TASKRUN) - wake_up_process(ctx->submitter_task); - else - io_cqring_wake(ctx); + wake_up_process(iowq->wq.private); return HRTIMER_NORESTART; } -- Jens Axboe