Check for IORING_ENTER_NO_IOWAIT and do not set current->in_iowait if it
is set. The flag is not set by default, so existing behaviour is
unchanged.

This prevents time spent waiting for completions from being accounted as
iowait time. Some userspace tools treat iowait time as 'utilisation'
time, which is misleading: the task is not scheduled and the CPU is free
to run other tasks.

High iowait time might indicate a problem for block IO, but not for
network IO, e.g. recv(), where we do not control when the IO happens.

Signed-off-by: David Wei <dw@xxxxxxxxxxx>
---
 io_uring/io_uring.c | 4 +++-
 io_uring/io_uring.h | 1 +
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 4cc905b228a5..9438875e43ea 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -2372,7 +2372,7 @@ static inline int io_cqring_wait_schedule(struct io_ring_ctx *ctx,
 	 * can take into account that the task is waiting for IO - turns out
 	 * to be important for low QD IO.
 	 */
-	if (current_pending_io())
+	if (!iowq->no_iowait && current_pending_io())
 		current->in_iowait = 1;
 	ret = 0;
 	if (iowq->timeout == KTIME_MAX)
@@ -2414,6 +2414,8 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events, u32 flags,
 	iowq.nr_timeouts = atomic_read(&ctx->cq_timeouts);
 	iowq.cq_tail = READ_ONCE(ctx->rings->cq.head) + min_events;
 	iowq.timeout = KTIME_MAX;
+	if (flags & IORING_ENTER_NO_IOWAIT)
+		iowq.no_iowait = true;
 
 	if (uts) {
 		struct timespec64 ts;
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index 9935819f12b7..e35fecca4445 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -41,6 +41,7 @@ struct io_wait_queue {
 	unsigned cq_tail;
 	unsigned nr_timeouts;
 	ktime_t timeout;
+	bool no_iowait;
 
 #ifdef CONFIG_NET_RX_BUSY_POLL
 	ktime_t napi_busy_poll_dt;
-- 
2.43.5
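
For reference, the decision the two io_uring.c hunks make can be modelled
in plain userspace C. This is only a sketch under stated assumptions, not
kernel code: MODEL_ENTER_NO_IOWAIT is a placeholder bit (the real value of
IORING_ENTER_NO_IOWAIT is not part of this patch), struct model_wait_queue
stands in for the single field added to struct io_wait_queue, and the
pending_io argument stands in for current_pending_io(). The model clears
no_iowait explicitly before testing the flag; the hunk above only ever
sets it to true.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* ASSUMPTION: placeholder bit. The real IORING_ENTER_NO_IOWAIT value is
 * defined outside this patch and is not shown here. */
#define MODEL_ENTER_NO_IOWAIT (1U << 7)

/* Hypothetical stand-in for the one field added to struct io_wait_queue. */
struct model_wait_queue {
	bool no_iowait;
};

/* Mirrors the io_cqring_wait() hunk: latch the opt-out from the enter flags.
 * The explicit reset to false is a choice of this model. */
static void model_wait_setup(struct model_wait_queue *iowq, uint32_t flags)
{
	assert(iowq != NULL);
	iowq->no_iowait = false;
	if (flags & MODEL_ENTER_NO_IOWAIT)
		iowq->no_iowait = true;
}

/* Mirrors the io_cqring_wait_schedule() hunk: returns the value that
 * current->in_iowait would be given for this wait. */
static int model_in_iowait(const struct model_wait_queue *iowq, bool pending_io)
{
	assert(iowq != NULL);
	return (!iowq->no_iowait && pending_io) ? 1 : 0;
}
```

With flags == 0 the model returns 1 whenever IO is pending, which matches
the behaviour before this patch. With the opt-out bit set it returns 0
regardless of pending IO, so the wait would not be accounted as iowait.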