On 8/20/24 3:10 PM, David Wei wrote:
>> +/*
>> + * Doing min_timeout portion. If we saw any timeouts, events, or have work,
>> + * wake up. If not, and we have a normal timeout, switch to that and keep
>> + * sleeping.
>> + */
>> +static enum hrtimer_restart io_cqring_min_timer_wakeup(struct hrtimer *timer)
>> +{
>> +	struct io_wait_queue *iowq = container_of(timer, struct io_wait_queue, t);
>> +	struct io_ring_ctx *ctx = iowq->ctx;
>> +
>> +	/* no general timeout, or shorter, we are done */
>> +	if (iowq->timeout == KTIME_MAX ||
>> +	    ktime_after(iowq->min_timeout, iowq->timeout))
>> +		goto out_wake;
>> +	/* work we may need to run, wake function will see if we need to wake */
>> +	if (io_has_work(ctx))
>> +		goto out_wake;
>> +	/* got events since we started waiting, min timeout is done */
>> +	if (iowq->cq_min_tail != READ_ONCE(ctx->rings->cq.tail))
>> +		goto out_wake;
>> +	/* if we have any events and min timeout expired, we're done */
>> +	if (io_cqring_events(ctx))
>> +		goto out_wake;
>
> How can ctx->rings->cq.tail be modified if the task is sleeping while
> waiting for completions? What is doing the work?

Good question. If we have a min_timeout of <something> and a batch count
of <something>, ideally we don't want to wake the task to process when a
single completion comes in. That is how we handle DEFER_TASKRUN, but for
anything else, the task will wake and process items. So it may have woken
up to process an item and posted a completion before this timeout
triggers. If that's the case, and min_timeout has expired (which it has
when this handler is called), then we should wake up and return.

--
Jens Axboe