On 7/24/23 10:16 AM, Andres Freund wrote:
> Hi,
>
> On 2023-07-24 09:48:58 -0600, Jens Axboe wrote:
>> On 7/24/23 9:35 AM, Phil Elwell wrote:
>>> Hi Andres,
>>>
>>> With this commit applied to the 6.1 and later kernels (others not
>>> tested) the iowait time ("wa" field in top) in an ARM64 build running
>>> on a 4 core CPU (a Raspberry Pi 4 B) increases to 25%, as if one core
>>> is permanently blocked on I/O. The change can be observed after
>>> installing mariadb-server (no configuration or use is required). After
>>> reverting just this commit, "wa" drops to zero again.
>>
>> There are a few other threads on this...
>>
>>> I can believe that this change hasn't negatively affected performance,
>>> but the result is misleading. I also think it's pushing the boundaries
>>> of what a back-port to stable should do.
>
> FWIW, I think this is partially just mpstat reporting something quite
> bogus. It makes no sense to say that a cpu is 100% busy waiting for IO
> when the one process that is doing IO is just waiting.

Indeed... It really just means the task is spending 100% of its time
_waiting_ on IO, not that it's doing anything. This change is largely to
save myself from future emails on this subject.

>> +static bool current_pending_io(void)
>> +{
>> +	struct io_uring_task *tctx = current->io_uring;
>> +
>> +	if (!tctx)
>> +		return false;
>> +	return percpu_counter_read_positive(&tctx->inflight);
>> +}
>> +
>>  /* when returns >0, the caller should retry */
>>  static inline int io_cqring_wait_schedule(struct io_ring_ctx *ctx,
>>  					  struct io_wait_queue *iowq)
>>  {
>> -	int token, ret;
>> +	int io_wait, ret;
>>
>>  	if (unlikely(READ_ONCE(ctx->check_cq)))
>>  		return 1;
>> @@ -2511,17 +2520,19 @@ static inline int io_cqring_wait_schedule(struct io_ring_ctx *ctx,
>>  		return 0;
>>
>>  	/*
>> -	 * Use io_schedule_prepare/finish, so cpufreq can take into account
>> -	 * that the task is waiting for IO - turns out to be important for low
>> -	 * QD IO.
>> +	 * Mark us as being in io_wait if we have pending requests, so cpufreq
>> +	 * can take into account that the task is waiting for IO - turns out
>> +	 * to be important for low QD IO.
>>  	 */
>> -	token = io_schedule_prepare();
>> +	io_wait = current->in_iowait;
>
> I don't know the kernel "rules" around this, but ->in_iowait is only
> modified in kernel/sched, so it seemed a tad "unfriendly" to scribble
> on it here...

It's either that, or add a new helper for this, at least for the initial
assignment. Calling blk_flush_plug() (and with async == true, no less) is
not something we need or want to do here. We could add an
io_schedule_prepare_noflush() for this, but it also seems silly to add a
single-use helper for that, imho.

> Building a kernel to test with the patch applied, will reboot into it
> once the call I am on has finished. Unfortunately the performance
> difference didn't reproduce nicely in a VM...

Thanks!

-- 
Jens Axboe
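
P.S. For reference, a minimal sketch of what the single-use helper
mentioned above could look like. It mirrors io_schedule_prepare() from
kernel/sched/core.c, minus the blk_flush_plug() call; note that
io_schedule_prepare_noflush() is just the hypothetical name from the
discussion above, not an existing kernel API, and this is untested:

#include <linux/sched.h>

/*
 * Sketch only: like io_schedule_prepare(), but without the
 * blk_flush_plug() call. Returns the previous ->in_iowait value so the
 * caller can restore it afterwards, the way io_schedule_finish() does.
 */
static int io_schedule_prepare_noflush(void)
{
	int old_iowait = current->in_iowait;

	/* account this task as waiting on IO for cpufreq/iowait purposes */
	current->in_iowait = 1;
	return old_iowait;
}

A caller would then do io_wait = io_schedule_prepare_noflush() before
sleeping and restore current->in_iowait = io_wait afterwards, which is
effectively what the patch above open-codes.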