On 8/17/24 21:20, Jens Axboe wrote:
On 8/17/24 1:44 PM, Pavel Begunkov wrote:
This patchset adds a IOURING_ENTER_NO_IOWAIT flag that can be set on
enter. If set, then current->in_iowait is not set. By default this flag
A worthwhile change but for _completely_ different reasons. So, first,
it's v3, not v2, considering the patchset from a couple month ago. And
since in essence nothing has changed, I can only repeat same points I
made back then.
The description reads like the flag's purpose is to change accounting,
and I'm vividly oppose any user exposed (per ring) toggle doing that.
We don't want the overhead, it's a very confusing feature, and not even
Come on, there's no overhead associated with this, in practice.
Minor, right, ~ 1 "if" or ccmov, but that's for a feature that nobody
really cares about and arguably even harmful. That's assuming there
will be another flag for performance tuning in the future, at least
the description reads like that.
It's really simple for this stuff - the freq boost is useful (and
needed) for some workloads, and the iowait accounting is never useful
for anything but (currently) comes as an unfortunate side effect of the
former. But even with those two separated, there are still going to be
cases where you want to control when it happens.
You can imagine such cases, but in reality I doubt it. If we
disable the stat part, nobody would notice as nobody cared for
last 3-4 years before in_iowait was added.
that helpful. iowait is monitored not by the app itself but by someone
else outside, likely by a different person, and even before trying to
make sense of numbers the monitor would need to learn first whether
_any_ program uses io_uring and what flags the application writer
decided to pass, even more fun when io_uring is used via a 3rd party
library.
Exactly same patches could make sense if you flip the description
and say "in_iowait is good for perfomance in some cases but
degrades power consumption for others, so here is a way to tune
performance", (just take Jamal's numbers). And that would need to
clearly state (including man) that the iowait statistic is a side
effect of it, we don't give it a thought, and the time accounting
aspect may and hopefully will change in the future.
Don't disagree with that, the boosting is the primary function here,
iowait accounting is just an odd relic and side effect.
Jens, can you remind what happened with separating iowait stats
vs the optimisation? I believed you sent some patches
Yep I still have them, and I'll dust them off and resend them. But it
doesn't really change the need for this at all, other than perhaps
wanting to rename the actual flag as it would not be about iowait at
all, it'd just be about power consumption.
Last cut was here:
https://git.kernel.dk/cgit/linux/log/?h=iowait.2
just need simple rebasing.
IMHO we should try to merge it first. And if there would be
some friction, we can get back and take this patch, with
additional note describing about iowait stat side effects.
is not set to maintain existing behaviour i.e. in_iowait is always set.
This is to prevent waiting for completions being accounted as CPU
utilisation.
For accounting, it's more reasonable to keep it disabled by
default, so we stop getting millions complaints per day about
high iowait.
Agree, it should just go away.
Not setting in_iowait does mean that we also lose cpufreq optimisations
above because in_iowait semantics couples 1 and 2 together. Eventually
we will untangle the two so the optimisations can be enabled
independently of the accounting.
IORING_FEAT_IOWAIT_TOGGLE is returned in io_uring_create() to indicate
support. This will be used by liburing to check for this feature.
Signed-off-by: David Wei <dw@xxxxxxxxxxx>
---
v2:
- squash patches into one
- move no_iowait in struct io_wait_queue to the end
- always set iowq.no_iowait
---
include/uapi/linux/io_uring.h | 2 ++
io_uring/io_uring.c | 7 ++++---
io_uring/io_uring.h | 1 +
3 files changed, 7 insertions(+), 3 deletions(-)
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 48c440edf674..3a94afa8665e 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -508,6 +508,7 @@ struct io_cqring_offsets {
#define IORING_ENTER_EXT_ARG (1U << 3)
#define IORING_ENTER_REGISTERED_RING (1U << 4)
#define IORING_ENTER_ABS_TIMER (1U << 5)
+#define IORING_ENTER_NO_IOWAIT (1U << 6)
Just curious, why did we switch from a register opcode to an
ENTER flag?
That was my suggestion, I think it's more flexible in that you can
disable the boost if you know you're doing longer waits or slower IO,
and enable it when frequencies go up. I don't particularly like the
static register approach for this.
Makes sense, it's easier for the app to count how many
request of what type is inflight.
The name might also be confusing. We need an explanation when
it could be useful, and name it accordingly. DEEP/SHALLOW_WAIT?
Do you remember how cpufreq accounts for it?
--
Pavel Begunkov