Re: [PATCH 6.4 800/800] io_uring: Use io_schedule* in cqring wait

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 7/24/23 11:47, Jens Axboe wrote:

It's not a behavioural change, it's a reporting change. There's no
functionality changing here. That said, I do think we should narrow it a
bit so we're only marked as in iowait if the task waiting has pending
IO. That should still satisfy the initial problem, and it won't flag
iowait on mariadb like cases where they have someone else just
perpetually waiting on requests.

As a side effect, this also removes the flush that wasn't at all
necessary on io_uring.

If folks are able to test the below, that would be appreciated.


diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 89a611541bc4..f4591b912ea8 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -2493,11 +2493,20 @@ int io_run_task_work_sig(struct io_ring_ctx *ctx)
  	return 0;
  }
+static bool current_pending_io(void)
+{
+	struct io_uring_task *tctx = current->io_uring;
+
+	if (!tctx)
+		return false;
+	return percpu_counter_read_positive(&tctx->inflight);
+}
+
  /* when returns >0, the caller should retry */
  static inline int io_cqring_wait_schedule(struct io_ring_ctx *ctx,
  					  struct io_wait_queue *iowq)
  {
-	int token, ret;
+	int io_wait, ret;
if (unlikely(READ_ONCE(ctx->check_cq)))
  		return 1;
@@ -2511,17 +2520,19 @@ static inline int io_cqring_wait_schedule(struct io_ring_ctx *ctx,
  		return 0;
/*
-	 * Use io_schedule_prepare/finish, so cpufreq can take into account
-	 * that the task is waiting for IO - turns out to be important for low
-	 * QD IO.
+	 * Mark us as being in io_wait if we have pending requests, so cpufreq
+	 * can take into account that the task is waiting for IO - turns out
+	 * to be important for low QD IO.
  	 */
-	token = io_schedule_prepare();
+	io_wait = current->in_iowait;
+	if (current_pending_io())
+		current->in_iowait = 1;
  	ret = 0;
  	if (iowq->timeout == KTIME_MAX)
  		schedule();
  	else if (!schedule_hrtimeout(&iowq->timeout, HRTIMER_MODE_ABS))
  		ret = -ETIME;
-	io_schedule_finish(token);
+	current->in_iowait = io_wait;
  	return ret;
  }
Tested on top of 6.4.6 stable - and working great - iowait stats now look like they always did.

thank you

gene



[Index of Archives]     [Linux Kernel]     [Kernel Development Newbies]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite Hiking]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux