Re: [PATCH 11/11] io_uring: get rid of intermediate aux cqe caches

Pavel Begunkov <asml.silence@xxxxxxxxx> · Fri, 15 Mar 2024 17:26:43 +0000

On 3/15/24 16:49, Jens Axboe wrote:
On 3/15/24 10:44 AM, Pavel Begunkov wrote:
On 3/15/24 16:27, Jens Axboe wrote:
On 3/15/24 10:25 AM, Jens Axboe wrote:
On 3/15/24 10:23 AM, Pavel Begunkov wrote:
On 3/15/24 16:20, Jens Axboe wrote:
On 3/15/24 9:30 AM, Pavel Begunkov wrote:
io_post_aux_cqe(), which is used for multishot requests, delays
completions by putting CQEs into a temporary array for the purpose
completion lock/flush batching.

DEFER_TASKRUN doesn't need any locking, so for it we can put completions
directly into the CQ and defer post completion handling with a flag.
That leaves !DEFER_TASKRUN, which is not that interesting / hot for
multishot requests, so have conditional locking with deferred flush
for them.

This breaks the read-mshot test case, looking into what is going on
there.

I forgot to mention, yes it does, the test makes odd assumptions about
overflows, IIRC it expects that the kernel allows one and only one aux
CQE to be overflown. Let me double check

Yeah this is very possible, the overflow checking could be broken in
there. I'll poke at it and report back.

It does, this should fix it:

diff --git a/test/read-mshot.c b/test/read-mshot.c
index 8fcb79857bf0..501ca69a98dc 100644
--- a/test/read-mshot.c
+++ b/test/read-mshot.c
@@ -236,7 +236,7 @@ static int test(int first_good, int async, int overflow)
           }
           if (!(cqe->flags & IORING_CQE_F_MORE)) {
               /* we expect this on overflow */
-            if (overflow && (i - 1 == NR_OVERFLOW))
+            if (overflow && i >= NR_OVERFLOW)

Which is not ideal either, e.g. I wouldn't mind if the kernel stops
one entry before CQ is full, so that the request can complete w/o
overflowing. Not supposing the change because it's a marginal
case, but we shouldn't limit ourselves.

But if the event keeps triggering we have to keep posting CQEs,
otherwise we could get stuck. 

Or we can complete the request, then the user consumes CQEs
and restarts as usual

As far as I'm concerned, the behavior with
the patch looks correct. The last CQE is overflown, and that terminates
it, and it doesn't have MORE set. The one before that has MORE set, but
it has to, unless you aborted it early. But that seems impossible,
because what if that was indeed the last current CQE, and we reap CQEs
before the next one is posted.

So unless I'm missing something, I don't think we can be doing any
better.

You can opportunistically try to avoid overflows, unreliably

bool io_post_cqe() {
	// Not enough space in the CQ left, so if there is a next
	// completion pending we'd have to overflow. Avoid that by
	// terminating it now.
	//
	// If there are no more CQEs after this one, we might
	// terminate a bit earlier, but that better because
	// overflows are so expensive and unhandy and so on.
	if (cq_space_left() <= 1)
		return false;
	fill_cqe();
	return true;
}

some_multishot_function(req) {
	if (!io_post_cqe(res))
		complete_req(req, res);
}

Again, not suggesting the change for all the obvious reasons, but
I think semantically we should be able to do it.

--
Pavel Begunkov