Re: IORING_OP_POLL_ADD slower than linux-aio IOCB_CMD_POLL

Jens Axboe <axboe@xxxxxxxxx> · Tue, 19 Apr 2022 09:21:03 -0600

On 4/19/22 6:31 AM, Jens Axboe wrote:
> On 4/19/22 6:21 AM, Avi Kivity wrote:
>> On 19/04/2022 15.04, Jens Axboe wrote:
>>> On 4/19/22 5:57 AM, Avi Kivity wrote:
>>>> On 19/04/2022 14.38, Jens Axboe wrote:
>>>>> On 4/19/22 5:07 AM, Avi Kivity wrote:
>>>>>> A simple webserver shows about 5% loss compared to linux-aio.
>>>>>>
>>>>>>
>>>>>> I expect the loss is due to an optimization that io_uring lacks -
>>>>>> inline completion vs workqueue completion:
>>>>> I don't think that's it, io_uring never punts to a workqueue for
>>>>> completions.
>>>>
>>>> I measured this:
>>>>
>>>>
>>>>
>>>>   Performance counter stats for 'system wide':
>>>>
>>>>           1,273,756 io_uring:io_uring_task_add
>>>>
>>>>        12.288597765 seconds time elapsed
>>>>
>>>> Which exactly matches with the number of requests sent. If that's the
>>>> wrong counter to measure, I'm happy to try again with the correct
>>>> counter.
>>> io_uring_task_add() isn't a workqueue, it's task_work. So that is
>>> expected.

It might actually be implicated, though. Not because it's an async worker,
but because I think we might be losing some affinity in this case. Looking
at traces, we're definitely bouncing between the poll completion side
and the side that then executes the completion.

Can you try this hack? It's against -git + for-5.19/io_uring. If you let
me know what base you prefer, I can do a version against that. With this
I see about a 3% win with io_uring; before it, io_uring was slower than
linux-aio, as you saw as well.

diff --git a/fs/io_uring.c b/fs/io_uring.c
index caa5b673f8f5..f3da6c9a9635 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -6303,6 +6303,25 @@ static void io_apoll_task_func(struct io_kiocb *req, bool *locked)
 		io_req_complete_failed(req, ret);
 }
 
+static bool __io_poll_execute_direct(struct io_kiocb *req, int mask, int events)
+{
+	struct io_ring_ctx *ctx = req->ctx;
+
+	if (ctx->has_evfd || req->flags & REQ_F_INFLIGHT ||
+	    req->opcode != IORING_OP_POLL_ADD)
+		return false;
+	if (!spin_trylock(&ctx->completion_lock))
+		return false;
+
+	req->cqe.res = mangle_poll(mask & events);
+	hash_del(&req->hash_node);
+	__io_req_complete_post(req, req->cqe.res, 0);
+	io_commit_cqring(ctx);
+	spin_unlock(&ctx->completion_lock);
+	io_cqring_ev_posted(ctx);
+	return true;
+}
+
 static void __io_poll_execute(struct io_kiocb *req, int mask, int events)
 {
 	req->cqe.res = mask;
@@ -6384,7 +6403,8 @@ static int io_poll_wake(struct wait_queue_entry *wait, unsigned mode, int sync,
 			else
 				req->flags &= ~REQ_F_SINGLE_POLL;
 		}
-		__io_poll_execute(req, mask, poll->events);
+		if (!__io_poll_execute_direct(req, mask, poll->events))
+			__io_poll_execute(req, mask, poll->events);
 	}
 	return 1;
 }
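
The fast path above follows a familiar pattern. In the wakeup, try the
completion lock, and only if it is uncontended post the CQE inline.
Otherwise fall back to the existing deferred (task_work) path. Below is a
minimal userspace sketch of that pattern, assuming pthreads. It is not
io_uring code: struct ring, complete_direct(), complete_deferred(),
on_wake() and run_deferred() are hypothetical names, a mutex stands in for
ctx->completion_lock, and a fixed array stands in for task_work.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model, not io_uring code. A mutex stands in for
 * ctx->completion_lock and a fixed array stands in for task_work.
 * Lock order where both are taken: defer_lock, then lock. */
#define RING_SIZE 64

struct ring {
	pthread_mutex_t lock;       /* "completion lock" guarding the CQ */
	int cqes[RING_SIZE];        /* posted completion results */
	unsigned ncqes;             /* total CQEs posted; no consumer modeled */
	pthread_mutex_t defer_lock; /* guards the deferred list */
	int deferred[RING_SIZE];    /* results waiting for the owning task */
	unsigned ndeferred;
};

static void ring_init(struct ring *r)
{
	pthread_mutex_init(&r->lock, NULL);
	pthread_mutex_init(&r->defer_lock, NULL);
	r->ncqes = 0;
	r->ndeferred = 0;
}

/* Caller holds r->lock. Oldest entries are overwritten once the ring wraps. */
static void post_cqe_locked(struct ring *r, int res)
{
	r->cqes[r->ncqes % RING_SIZE] = res;
	r->ncqes++;
}

/* Fast path: post inline only if the lock is free right now; never block. */
static bool complete_direct(struct ring *r, int res)
{
	if (pthread_mutex_trylock(&r->lock) != 0)
		return false;
	post_cqe_locked(r, res);
	pthread_mutex_unlock(&r->lock);
	return true;
}

/* Slow path: queue the result for the owning task; fails if the list is full. */
static bool complete_deferred(struct ring *r, int res)
{
	bool queued = false;

	pthread_mutex_lock(&r->defer_lock);
	if (r->ndeferred < RING_SIZE) {
		r->deferred[r->ndeferred++] = res;
		queued = true;
	}
	pthread_mutex_unlock(&r->defer_lock);
	return queued;
}

/* Wakeup handler: try the direct path first, then defer. */
static bool on_wake(struct ring *r, int res)
{
	if (complete_direct(r, res))
		return true;
	return complete_deferred(r, res);
}

/* Run later by the owning task: drain deferred results into the ring. */
static void run_deferred(struct ring *r)
{
	pthread_mutex_lock(&r->defer_lock);
	pthread_mutex_lock(&r->lock);
	for (unsigned i = 0; i < r->ndeferred; i++)
		post_cqe_locked(r, r->deferred[i]);
	r->ndeferred = 0;
	pthread_mutex_unlock(&r->lock);
	pthread_mutex_unlock(&r->defer_lock);
}
```

Using a trylock means the wakeup never waits on the completion lock. Under
contention it simply takes the deferred path that was used before. The
sketch leaves out the patch's bail-out conditions (eventfd registered,
REQ_F_INFLIGHT set, opcode other than IORING_OP_POLL_ADD).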

-- 
Jens Axboe