On 8/20/24 4:19 PM, Jens Axboe wrote: > On 8/20/24 4:14 PM, David Wei wrote: >> On 2024-08-20 15:13, Pavel Begunkov wrote: >>>>>>> On 8/20/24 2:08 PM, David Wei wrote: >>>>>> To rephase the question, why is the original code calling >>>>>> schedule_hrtimeout_range_clock() not needing to differentiate behaviour >>>>>> between defer taskrun and not? >>>>> >>>>> Because that part is the same, the task schedules out and goes to sleep. >>>>> That has always been the same regardless of how the ring is setup. Only >>>>> difference is that DEFER_TASKRUN doesn't add itself to ctx->wait, and >>>>> hence cannot be woken by a wake_up(ctx->wait). We have to wake the task >>>>> manually. >>>>> >>>> >>>> io_cqring_timer_wakeup() is the timer expired callback which calls >>>> wake_up_process() or io_cqring_wake() depending on DEFER_TASKRUN. >>>> >>>> The original code calling schedule_hrtimeout_range_clock() uses >>>> hrtimer_sleeper instead, which has a default timer expired callback set >>>> to hrtimer_wakeup(). >>>> >>>> hrtimer_wakeup() only calls wake_up_process(). >>>> >>>> My question is: why this asymmetry? Why does the new custom callback >>>> require io_cqring_wake()? >>> >>> That's what I'm saying, it doesn't need and doesn't really want it. >>> From the correctness point of view, it's ok since we wake up a >>> (unnecessarily) larger set of tasks. >>> >> >> Yeah your explanation that came in while I was writing the email >> answered it, thanks Pavel. > > Hah and now I see what you meant - yeah we can just remove the > distinction. I didn't see anything in testing, but I also didn't have > multiple tasks waiting on a ring, nor would you. So it doesn't really > matter, but I'll clean it up so there's no confusion. Actually probably better to just leave it as-is, as we'd otherwise need to store a task in io_wait_queue. Which we of course could, and would remove a branch in there... -- Jens Axboe