On 8/10/21 2:55 AM, Jens Axboe wrote:
> On 8/9/21 7:42 PM, Pavel Begunkov wrote:
>> On 8/6/21 9:19 PM, Jens Axboe wrote:
>>> Currently we only wake the first waiter, even if we have enough entries
>>> posted to satisfy multiple waiters. Improve that situation so that
>>> every waiter knows how much the CQ tail has to advance before they can
>>> be safely woken up.
>>>
>>> With this change, if we have N waiters each asking for 1 event and we get
>>> 4 completions, then we wake up 4 waiters. If we have N waiters asking
>>> for 2 completions and we get 4 completions, then we wake up the first
>>> two. Previously, only the first waiter would've been woken up.
>>>
>>> Signed-off-by: Jens Axboe <axboe@xxxxxxxxx>
>>>
>>> ---
>>>
>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>> index bf548af0426c..04df4fa3c75e 100644
>>> --- a/fs/io_uring.c
>>> +++ b/fs/io_uring.c
>>> @@ -1435,11 +1435,13 @@ static inline bool io_should_trigger_evfd(struct io_ring_ctx *ctx)
>>>  
>>>  static void io_cqring_ev_posted(struct io_ring_ctx *ctx)
>>>  {
>>> -	/* see waitqueue_active() comment */
>>> -	smp_mb();
>>> -
>>> -	if (waitqueue_active(&ctx->cq_wait))
>>> -		wake_up(&ctx->cq_wait);
>>> +	/*
>>> +	 * wake_up_all() may seem excessive, but io_wake_function() and
>>> +	 * io_should_wake() handle the termination of the loop and only
>>> +	 * wake as many waiters as we need to.
>>> +	 */
>>> +	if (wq_has_sleeper(&ctx->cq_wait))
>>> +		wake_up_all(&ctx->cq_wait);
>>>  	if (ctx->sq_data && waitqueue_active(&ctx->sq_data->wait))
>>>  		wake_up(&ctx->sq_data->wait);
>>>  	if (io_should_trigger_evfd(ctx))
>>> @@ -6968,20 +6970,21 @@ static int io_sq_thread(void *data)
>>>  struct io_wait_queue {
>>>  	struct wait_queue_entry wq;
>>>  	struct io_ring_ctx *ctx;
>>> -	unsigned to_wait;
>>> +	unsigned cq_tail;
>>>  	unsigned nr_timeouts;
>>>  };
>>>  
>>>  static inline bool io_should_wake(struct io_wait_queue *iowq)
>>>  {
>>>  	struct io_ring_ctx *ctx = iowq->ctx;
>>> +	unsigned tail = ctx->cached_cq_tail + atomic_read(&ctx->cq_timeouts);
>>
>> Seems adding cq_timeouts can be dropped from here and from iowq.cq_tail
>
> Good point, we can drop it at both ends.
>
>>>  	/*
>>>  	 * Wake up if we have enough events, or if a timeout occurred since we
>>>  	 * started waiting. For timeouts, we always want to return to userspace,
>>>  	 * regardless of event count.
>>>  	 */
>>> -	return io_cqring_events(ctx) >= iowq->to_wait ||
>>
>> Don't we miss smp_rmb() previously provided by io_cqring_events()?
>
> For? We aren't reading any user-modified parts.

I was rather thinking about who provides the barrier for userspace, but
that should indeed be on userspace, and the function is called from an
arbitrary CPU/context anyway.

>>
>>> +	return tail >= iowq->cq_tail ||
>>
>> tails might overflow
>
> Indeed, I actually did fix this one before committing it.

Great

-- 
Pavel Begunkov
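
The overflow Pavel flags is the classic free-running-counter problem: CQ
tails are unsigned 32-bit values that wrap, so a raw
"tail >= iowq->cq_tail" misfires once either counter crosses UINT_MAX.
Below is a minimal standalone sketch of the usual signed-difference
idiom. It is an illustration only, with a hypothetical tail_reached()
helper; the fix Jens actually committed is not shown in this thread.

#include <stdbool.h>
#include <stdio.h>

/*
 * Wraparound-safe "has the tail reached the target?" check. Casting
 * the unsigned difference to a signed int (relying on the usual
 * two's-complement conversion) keeps the comparison correct as long
 * as the two counters stay within 2^31 of each other.
 */
static inline bool tail_reached(unsigned int tail, unsigned int target)
{
	return (int)(tail - target) >= 0;
}

int main(void)
{
	/*
	 * A waiter asked for 4 more events while the tail sat at
	 * 0xfffffffe, so its target wrapped around to 2.
	 */
	unsigned int target = 0xfffffffeu + 4;
	/* only one event has posted so far; tail has not wrapped yet */
	unsigned int tail = 0xffffffffu;

	printf("raw compare:    %d\n", tail >= target);             /* 1: spurious wakeup */
	printf("signed compare: %d\n", tail_reached(tail, target)); /* 0: keep waiting */
	return 0;
}

With the raw comparison this waiter would be woken three events early;
the signed difference only goes wrong if the tail runs more than 2^31
entries ahead of the target, which the bounded CQ ring size rules out.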
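
Putting both review comments together (cq_timeouts dropped at both ends,
wraparound-safe tail comparison), io_should_wake() might end up looking
roughly like the sketch below. This is a guess at the follow-up under
those two assumptions, not the committed patch:

static inline bool io_should_wake(struct io_wait_queue *iowq)
{
	struct io_ring_ctx *ctx = iowq->ctx;

	/*
	 * Wake up if the CQ tail has advanced past the point this
	 * waiter asked for, or if a timeout occurred since we started
	 * waiting. The signed difference keeps the check correct
	 * across tail wraparound.
	 */
	return (int)(ctx->cached_cq_tail - iowq->cq_tail) >= 0 ||
	       atomic_read(&ctx->cq_timeouts) != iowq->nr_timeouts;
}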