Re: [PATCH 4/5] io_uring: add support for batch wait timeout

Jens Axboe <axboe@xxxxxxxxx> · Tue, 20 Aug 2024 15:36:06 -0600

On 8/20/24 3:10 PM, David Wei wrote:
>> +/*
>> + * Doing min_timeout portion. If we saw any timeouts, events, or have work,
>> + * wake up. If not, and we have a normal timeout, switch to that and keep
>> + * sleeping.
>> + */
>> +static enum hrtimer_restart io_cqring_min_timer_wakeup(struct hrtimer *timer)
>> +{
>> +	struct io_wait_queue *iowq = container_of(timer, struct io_wait_queue, t);
>> +	struct io_ring_ctx *ctx = iowq->ctx;
>> +
>> +	/* no general timeout, or shorter, we are done */
>> +	if (iowq->timeout == KTIME_MAX ||
>> +	    ktime_after(iowq->min_timeout, iowq->timeout))
>> +		goto out_wake;
>> +	/* work we may need to run, wake function will see if we need to wake */
>> +	if (io_has_work(ctx))
>> +		goto out_wake;
>> +	/* got events since we started waiting, min timeout is done */
>> +	if (iowq->cq_min_tail != READ_ONCE(ctx->rings->cq.tail))
>> +		goto out_wake;
>> +	/* if we have any events and min timeout expired, we're done */
>> +	if (io_cqring_events(ctx))
>> +		goto out_wake;
> 
> How can ctx->rings->cq.tail be modified if the task is sleeping while
> waiting for completions? What is doing the work?

Good question. If we have a min_timeout of <something> and a batch count
of <something>, ideally we don't want to wake the task to process when a
single completion comes in. That is how we handle DEFER_TASKRUN, but
for anything else, the task will wake and process items. So the task may
have woken up, processed an item, and posted a completion before this
timeout triggers. If that's the case, and min_timeout has expired (which
it has whenever this handler is called), then we should wake up and
return.
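
To make the ordering of those checks concrete, here is a rough user-space
model of the decision the quoted handler makes. It is only a sketch, not
kernel code: struct wait_model, MODEL_TIMEOUT_NONE, min_timer_decide() and
the demo function are hypothetical names made up for illustration, the
"events pending" test is approximated as tail != head, and the
"keep sleeping" result stands for the "switch to the normal timeout" path
that the comment describes but the quoted hunk does not show.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for KTIME_MAX ("no overall timeout"). */
#define MODEL_TIMEOUT_NONE INT64_MAX

/* Hypothetical snapshot of the state the handler looks at. */
struct wait_model {
	int64_t  timeout;      /* overall timeout, or MODEL_TIMEOUT_NONE */
	int64_t  min_timeout;  /* min timeout, compared as in the patch */
	uint32_t cq_min_tail;  /* CQ tail sampled when the wait started */
	uint32_t cq_tail;      /* CQ tail now */
	uint32_t cq_head;      /* CQ head now */
	bool     has_work;     /* models io_has_work() returning true */
};

enum min_timer_action { ACTION_WAKE, ACTION_KEEP_SLEEPING };

/* Same order of checks as the quoted hunk, on plain values. */
enum min_timer_action min_timer_decide(const struct wait_model *w)
{
	assert(w != NULL);

	/* no general timeout, or it is shorter: we are done */
	if (w->timeout == MODEL_TIMEOUT_NONE || w->min_timeout > w->timeout)
		return ACTION_WAKE;
	/* pending work: let the wake function decide */
	if (w->has_work)
		return ACTION_WAKE;
	/* completions were posted since the wait started */
	if (w->cq_min_tail != w->cq_tail)
		return ACTION_WAKE;
	/* any unreaped events at all (approximation: tail != head) */
	if (w->cq_tail != w->cq_head)
		return ACTION_WAKE;
	/* nothing happened: fall back to the normal timeout */
	return ACTION_KEEP_SLEEPING;
}

/*
 * Scenario from the reply: a ring without DEFER_TASKRUN, where the task
 * woke during the min wait, processed one item and posted one CQE.
 */
enum min_timer_action demo_completion_posted_during_min_wait(void)
{
	struct wait_model w = {
		.timeout = 1000, .min_timeout = 100,
		.cq_min_tail = 5, .cq_tail = 6, .cq_head = 5,
		.has_work = false,
	};

	return min_timer_decide(&w);
}
```

In this model the demo scenario takes the cq_min_tail branch and wakes,
while the same state with an unchanged tail keeps sleeping on the normal
timeout.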

-- 
Jens Axboe