Hi,

On 2023-07-24 09:48:58 -0600, Jens Axboe wrote:
> On 7/24/23 9:35 AM, Phil Elwell wrote:
> > Hi Andres,
> >
> > With this commit applied to the 6.1 and later kernels (others not
> > tested) the iowait time ("wa" field in top) in an ARM64 build running
> > on a 4 core CPU (a Raspberry Pi 4 B) increases to 25%, as if one core
> > is permanently blocked on I/O. The change can be observed after
> > installing mariadb-server (no configuration or use is required). After
> > reverting just this commit, "wa" drops to zero again.
>
> There are a few other threads on this...
>
> > I can believe that this change hasn't negatively affected performance,
> > but the result is misleading. I also think it's pushing the boundaries
> > of what a back-port to stable should do.

FWIW, I think this is partially just mpstat reporting something quite
bogus. It makes no sense to say that a CPU is 100% busy waiting for IO
when the one process doing IO is just waiting.


> +static bool current_pending_io(void)
> +{
> +	struct io_uring_task *tctx = current->io_uring;
> +
> +	if (!tctx)
> +		return false;
> +	return percpu_counter_read_positive(&tctx->inflight);
> +}
> +
>  /* when returns >0, the caller should retry */
>  static inline int io_cqring_wait_schedule(struct io_ring_ctx *ctx,
>  					  struct io_wait_queue *iowq)
>  {
> -	int token, ret;
> +	int io_wait, ret;
>
>  	if (unlikely(READ_ONCE(ctx->check_cq)))
>  		return 1;
> @@ -2511,17 +2520,19 @@ static inline int io_cqring_wait_schedule(struct io_ring_ctx *ctx,
>  					  struct io_wait_queue *iowq)
>  		return 0;
>
>  	/*
> -	 * Use io_schedule_prepare/finish, so cpufreq can take into account
> -	 * that the task is waiting for IO - turns out to be important for low
> -	 * QD IO.
> +	 * Mark us as being in io_wait if we have pending requests, so cpufreq
> +	 * can take into account that the task is waiting for IO - turns out
> +	 * to be important for low QD IO.
>  	 */
> -	token = io_schedule_prepare();
> +	io_wait = current->in_iowait;

I don't know the kernel "rules" around this, but ->in_iowait is only
modified in kernel/sched, so it seemed a tad "unfriendly" to scribble on
it here... A rough sketch of what I mean is at the end of this mail.

Building a kernel to test with the patch applied, will reboot into it
once the call I am on has finished. Unfortunately the performance
difference didn't reproduce nicely in a VM...

Greetings,

Andres Freund
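
Roughly the kind of thing I mean - an untested sketch only, not a
proposal. It reuses current_pending_io() from the quoted patch, elides
the other early-exit checks, and keeps going through the existing
io_schedule_prepare()/io_schedule_finish() helpers so that kernel/sched
remains the only place writing ->in_iowait:

static inline int io_cqring_wait_schedule(struct io_ring_ctx *ctx,
					  struct io_wait_queue *iowq)
{
	int token = -1;
	int ret = 0;

	if (unlikely(READ_ONCE(ctx->check_cq)))
		return 1;
	/* ... remaining signal / wakeup checks unchanged ... */

	/*
	 * Only account iowait if this task actually has requests in
	 * flight. io_schedule_prepare() saves the old ->in_iowait value,
	 * sets it and flushes the block plug; io_schedule_finish()
	 * restores the saved value.
	 */
	if (current_pending_io())
		token = io_schedule_prepare();

	if (iowq->timeout == KTIME_MAX)
		schedule();
	else if (!schedule_hrtimeout(&iowq->timeout, HRTIMER_MODE_ABS))
		ret = -ETIME;

	if (token != -1)
		io_schedule_finish(token);
	return ret;
}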