Re: FAILED: patch "[PATCH] io_uring: Use io_schedule* in cqring wait" failed to apply to 6.1-stable tree

Jens Axboe <axboe@xxxxxxxxx> · Mon, 17 Jul 2023 10:32:18 -0600

On 7/16/23 1:19?PM, Jens Axboe wrote:
> On 7/16/23 1:11?PM, Andres Freund wrote:
>> Hi,
>>
>> On 2023-07-16 12:13:45 -0600, Jens Axboe wrote:
>>> Here's one for 6.1-stable.
>>
>> Thanks for working on that!
>>
>>
>>> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
>>> index cc35aba1e495..de117d3424b2 100644
>>> --- a/io_uring/io_uring.c
>>> +++ b/io_uring/io_uring.c
>>> @@ -2346,7 +2346,7 @@ static inline int io_cqring_wait_schedule(struct io_ring_ctx *ctx,
>>>  					  struct io_wait_queue *iowq,
>>>  					  ktime_t *timeout)
>>>  {
>>> -	int ret;
>>> +	int token, ret;
>>>  	unsigned long check_cq;
>>>  
>>>  	/* make sure we run task_work before checking for signals */
>>> @@ -2362,9 +2362,18 @@ static inline int io_cqring_wait_schedule(struct io_ring_ctx *ctx,
>>>  		if (check_cq & BIT(IO_CHECK_CQ_DROPPED_BIT))
>>>  			return -EBADR;
>>>  	}
>>> +
>>> +	/*
>>> +	 * Use io_schedule_prepare/finish, so cpufreq can take into account
>>> +	 * that the task is waiting for IO - turns out to be important for low
>>> +	 * QD IO.
>>> +	 */
>>> +	token = io_schedule_prepare();
>>> +	ret = 0;
>>>  	if (!schedule_hrtimeout(timeout, HRTIMER_MODE_ABS))
>>> -		return -ETIME;
>>> -	return 1;
>>> +		ret = -ETIME;
>>> +	io_schedule_finish(token);
>>> +	return ret;
>>>  }
>>
>> To me it looks like this might have changed more than intended? Previously
>> io_cqring_wait_schedule() returned 0 in case schedule_hrtimeout() returned
>> non-zero, now io_cqring_wait_schedule() returns 1 in that case?  Am I missing
>> something?
> 
> Ah shoot yes indeed. Greg, can you drop the 5.10/5.15/6.1 ones for now?
> I'll get it sorted tomorrow. Sorry about that, and thanks for catching
> that Andres!

Greg, can you pick up these two for 5.10-stable and 5.15-stable? While
running testing, noticed another backport that was missing, so added
that as we..

-- 
Jens Axboe
From 4e214e7e01158a87308a17766706159bca472855 Mon Sep 17 00:00:00 2001
From: Jens Axboe <axboe@xxxxxxxxx>
Date: Mon, 17 Jul 2023 10:27:20 -0600
Subject: [PATCH 2/2] io_uring: add reschedule point to handle_tw_list()

Commit f58680085478dd292435727210122960d38e8014 upstream.

If CONFIG_PREEMPT_NONE is set and the task_work chains are long, we
could be running into issues blocking others for too long. Add a
reschedule check in handle_tw_list(), and flush the ctx if we need to
reschedule.

Cc: stable@xxxxxxxxxxxxxxx # 5.10+
Signed-off-by: Jens Axboe <axboe@xxxxxxxxx>
---
 io_uring/io_uring.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 33d4a2871dbb..eae7a3d89397 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -2216,9 +2216,12 @@ static void tctx_task_work(struct callback_head *cb)
 			}
 			req->io_task_work.func(req, &locked);
 			node = next;
+			if (unlikely(need_resched())) {
+				ctx_flush_and_put(ctx, &locked);
+				ctx = NULL;
+				cond_resched();
+			}
 		} while (node);
-
-		cond_resched();
 	}
 
 	ctx_flush_and_put(ctx, &locked);
-- 
2.40.1

From c8c88d523c89e0ac8affbf2fd57def82e0d5d4bf Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@xxxxxxxxxxx>
Date: Sun, 16 Jul 2023 12:07:03 -0600
Subject: [PATCH 1/2] io_uring: Use io_schedule* in cqring wait

Commit 8a796565cec3601071cbbd27d6304e202019d014 upstream.

I observed poor performance of io_uring compared to synchronous IO. That
turns out to be caused by deeper CPU idle states entered with io_uring,
due to io_uring using plain schedule(), whereas synchronous IO uses
io_schedule().

The losses due to this are substantial. On my cascade lake workstation,
t/io_uring from the fio repository e.g. yields regressions between 20%
and 40% with the following command:
./t/io_uring -r 5 -X0 -d 1 -s 1 -c 1 -p 0 -S$use_sync -R 0 /mnt/t2/fio/write.0.0

This is repeatable with different filesystems, using raw block devices
and using different block devices.

Use io_schedule_prepare() / io_schedule_finish() in
io_cqring_wait_schedule() to address the difference.

After that using io_uring is on par or surpassing synchronous IO (using
registered files etc makes it reliably win, but arguably is a less fair
comparison).

There are other calls to schedule() in io_uring/, but none immediately
jump out to be similarly situated, so I did not touch them. Similarly,
it's possible that mutex_lock_io() should be used, but it's not clear if
there are cases where that matters.

Cc: stable@xxxxxxxxxxxxxxx # 5.10+
Cc: Pavel Begunkov <asml.silence@xxxxxxxxx>
Cc: io-uring@xxxxxxxxxxxxxxx
Cc: linux-kernel@xxxxxxxxxxxxxxx
Signed-off-by: Andres Freund <andres@xxxxxxxxxxx>
Link: https://lore.kernel.org/r/20230707162007.194068-1-andres@xxxxxxxxxxx
[axboe: minor style fixup]
Signed-off-by: Jens Axboe <axboe@xxxxxxxxx>
---
 io_uring/io_uring.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index e633799c9cea..33d4a2871dbb 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -7785,7 +7785,7 @@ static inline int io_cqring_wait_schedule(struct io_ring_ctx *ctx,
 					  struct io_wait_queue *iowq,
 					  ktime_t *timeout)
 {
-	int ret;
+	int token, ret;
 
 	/* make sure we run task_work before checking for signals */
 	ret = io_run_task_work_sig();
@@ -7795,9 +7795,17 @@ static inline int io_cqring_wait_schedule(struct io_ring_ctx *ctx,
 	if (test_bit(0, &ctx->check_cq_overflow))
 		return 1;
 
+	/*
+	 * Use io_schedule_prepare/finish, so cpufreq can take into account
+	 * that the task is waiting for IO - turns out to be important for low
+	 * QD IO.
+	 */
+	token = io_schedule_prepare();
+	ret = 1;
 	if (!schedule_hrtimeout(timeout, HRTIMER_MODE_ABS))
-		return -ETIME;
-	return 1;
+		ret = -ETIME;
+	io_schedule_finish(token);
+	return ret;
 }
 
 /*
-- 
2.40.1