On 11/9/2019 3:25 PM, Pavel Begunkov wrote:
> On 11/7/2019 7:00 PM, Jens Axboe wrote:
>> Currently we drop completion events if the CQ ring is full. That's fine
>> for requests with bounded completion times, but it may make it harder to
>> use io_uring with networked IO, where request completion times are
>> generally unbounded. Or with POLL, for example, which is also unbounded.
>>
>> This patch adds IORING_SETUP_CQ_NODROP, which changes the behavior a bit
>> for CQ ring overflows. First of all, it doesn't overflow the ring; it
>> simply stores a backlog of completions that we weren't able to put into
>> the CQ ring. To prevent the backlog from growing indefinitely, if the
>> backlog is non-empty, we apply back pressure on IO submissions. Any
>> attempt to submit new IO with a non-empty backlog will get an -EBUSY
>> return from the kernel. This is a signal to the application that it has
>> backlogged CQ events, and that it must reap those before being allowed
>> to submit more IO.
>>
>> Signed-off-by: Jens Axboe <axboe@xxxxxxxxx>
>> ---
>>  fs/io_uring.c                 | 103 ++++++++++++++++++++++++++++------
>>  include/uapi/linux/io_uring.h |   1 +
>>  2 files changed, 87 insertions(+), 17 deletions(-)
>>
>> +static void io_cqring_overflow_flush(struct io_ring_ctx *ctx, bool force)
>> +{
>> +	struct io_rings *rings = ctx->rings;
>> +	struct io_uring_cqe *cqe;
>> +	struct io_kiocb *req;
>> +	unsigned long flags;
>> +	LIST_HEAD(list);
>> +
>> +	if (list_empty_careful(&ctx->cq_overflow_list))
>> +		return;
>> +	if (ctx->cached_cq_tail - READ_ONCE(rings->cq.head) ==
>> +	    rings->cq_ring_entries)
>> +		return;
>> +
>> +	spin_lock_irqsave(&ctx->completion_lock, flags);
>> +
>> +	while (!list_empty(&ctx->cq_overflow_list)) {
>> +		cqe = io_get_cqring(ctx);
>> +		if (!cqe && !force)
>> +			break;
>> +
>> +		req = list_first_entry(&ctx->cq_overflow_list, struct io_kiocb,
>> +				       list);
>> +		list_move(&req->list, &list);
>> +		if (cqe) {
>> +			WRITE_ONCE(cqe->user_data, req->user_data);
>> +			WRITE_ONCE(cqe->res, req->result);
>> +			WRITE_ONCE(cqe->flags, 0);
>> +		}
>
> Hmm, second thought. We should account the overflow here.

Clarification: we should account the overflow in the (!cqe) case, i.e.:

	if (!cqe) {	/* the else branch of the "if (cqe)" above */
		WRITE_ONCE(ctx->rings->cq_overflow,
			   atomic_inc_return(&ctx->cached_cq_overflow));
	}

>> +	}
>> +
>> +	io_commit_cqring(ctx);
>> +	spin_unlock_irqrestore(&ctx->completion_lock, flags);
>> +	io_cqring_ev_posted(ctx);
>> +
>> +	while (!list_empty(&list)) {
>> +		req = list_first_entry(&list, struct io_kiocb, list);
>> +		list_del(&req->list);
>> +		io_put_req(req, NULL);
>> +	}
>> +}
>> +

--
Pavel Begunkov
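
For reference, folding the suggested accounting into the quoted flush loop
would look roughly like the sketch below. This is only a sketch against the
excerpt above; it assumes ctx->cached_cq_overflow is an atomic_t and that
rings->cq_overflow is the shared counter userspace reads, neither of which
the quoted hunk shows:

	while (!list_empty(&ctx->cq_overflow_list)) {
		cqe = io_get_cqring(ctx);
		if (!cqe && !force)
			break;

		req = list_first_entry(&ctx->cq_overflow_list, struct io_kiocb,
				       list);
		list_move(&req->list, &list);
		if (cqe) {
			/* room in the CQ ring: post the completion */
			WRITE_ONCE(cqe->user_data, req->user_data);
			WRITE_ONCE(cqe->res, req->result);
			WRITE_ONCE(cqe->flags, 0);
		} else {
			/* forced flush with the ring still full: account the
			 * dropped CQE so userspace can detect the loss */
			WRITE_ONCE(ctx->rings->cq_overflow,
				   atomic_inc_return(&ctx->cached_cq_overflow));
		}
	}

The point of the else branch is that a forced flush can still drop events,
and without the accounting userspace would have no way to notice.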
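
On the submission side, the back pressure described in the commit message
means a submit can now fail with -EBUSY until the backlog is reaped. A
minimal userspace sketch, assuming liburing and a hypothetical application
callback handle_cqe() (neither appears in the patch); when submit returns
-EBUSY the CQ ring is full, so there are completions ready to consume:

	#include <liburing.h>

	static int submit_reaping_backlog(struct io_uring *ring)
	{
		struct io_uring_cqe *cqe;
		int ret;

		while ((ret = io_uring_submit(ring)) == -EBUSY) {
			/* drain ready CQEs; the next submit lets the kernel
			 * flush its backlog into the freed ring slots */
			while (io_uring_peek_cqe(ring, &cqe) == 0) {
				handle_cqe(cqe);	/* hypothetical callback */
				io_uring_cqe_seen(ring, cqe);
			}
		}
		return ret;
	}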