Re: [PATCH v1 1/4] io_uring: only account cqring wait time as iowait if enabled for a ring

On 2/24/24 18:55, Jens Axboe wrote:
> On 2/24/24 10:20 AM, David Wei wrote:
>>> I don't believe it's a sane approach. I think we agree that per-cpu
>>> iowait is a silly and misleading metric. I have a hard time defining
>>> what it is, and I'm sure most of the people complaining wouldn't be
>>> able to define it either. Now we're taking that metric and exposing
>>> even more knobs to userspace.
>>>
>>> Another argument against it is that per ctx is not the right place
>>> to have it. It's a system metric, and you can imagine some system
>>> admin looking at it. Even in cases where it had some meaning without
>>> io_uring, it's now completely meaningless unless you also look at
>>> what flags each io_uring instance has set, and that's too much to ask.
>>>
>>> I don't understand why people freak out at seeing high iowait;
>>> IMHO it perfectly fits the definition of io_uring waiting for
>>> IO / completions. But at this point it might be better to just
>>> revert to the old behaviour of not reporting iowait at all.

>> Irrespective of how misleading iowait is, many tools include it in their
>> CPU util/load calculations, and users then use those metrics for e.g.
>> load balancing. In situations with storage workloads, iowait can be
>> useful even if its usefulness is limited. The problem that this patch is
>> trying to resolve is mixed storage/network workloads on the same
>> system, where iowait has some usefulness (due to the storage workloads)
>> _but_ I don't want network workloads contributing to the metric.
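
To make it concrete: the metric in question is the "iowait" field of
/proc/stat, which tools like top, sar and mpstat fold into their CPU
utilisation numbers. Roughly, such a tool does something like the sketch
below (field order per proc(5); a real tool samples twice and diffs the
counters):

/* Sketch: read the aggregate "cpu" line from /proc/stat and report
 * what fraction of the total ticks were spent in iowait. */
#include <stdio.h>

int main(void)
{
	unsigned long long user, nice, sys, idle, iowait, irq, softirq, steal;
	FILE *f = fopen("/proc/stat", "r");

	if (!f)
		return 1;
	if (fscanf(f, "cpu %llu %llu %llu %llu %llu %llu %llu %llu",
		   &user, &nice, &sys, &idle, &iowait, &irq,
		   &softirq, &steal) != 8) {
		fclose(f);
		return 1;
	}
	fclose(f);

	unsigned long long total = user + nice + sys + idle +
				   iowait + irq + softirq + steal;
	printf("iowait: %llu/%llu ticks (%.1f%%)\n",
	       iowait, total, total ? 100.0 * iowait / total : 0.0);
	return 0;
}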

>> This does put the onus on userspace to do the right thing - decide
>> whether iowait makes sense for a workload or not. I don't have enough
>> kernel experience to know whether this expectation is realistic or not.
>> But it is turned off by default, so if userspace does not set it (which
>> seems like the most likely outcome) then iowait accounting is off, just
>> like the old behaviour. Perhaps we need to make it clearer to storage
>> use-cases that they should turn it on in order to get the optimisation?
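
For illustration, opting in from userspace could look something like the
sketch below. IORING_SETUP_IOWAIT is just a placeholder name, not the
actual interface added by this series nor any existing io_uring flag;
the rest is ordinary liburing usage:

#include <liburing.h>
#include <fcntl.h>
#include <stdio.h>

/* Hypothetical opt-in flag, for illustration only - not part of the
 * current io_uring UAPI. */
#ifndef IORING_SETUP_IOWAIT
#define IORING_SETUP_IOWAIT	(1U << 31)
#endif

int main(void)
{
	struct io_uring ring;
	struct io_uring_sqe *sqe;
	struct io_uring_cqe *cqe;
	char buf[4096];
	int fd, ret;

	/* A storage-heavy ring would ask for iowait accounting; a
	 * networking ring would simply leave the flag unset. */
	ret = io_uring_queue_init(8, &ring, IORING_SETUP_IOWAIT);
	if (ret < 0)
		return 1;

	fd = open("/etc/hostname", O_RDONLY);
	if (fd < 0)
		return 1;

	sqe = io_uring_get_sqe(&ring);
	io_uring_prep_read(sqe, fd, buf, sizeof(buf), 0);
	io_uring_submit(&ring);

	/* Time spent blocked here is what would (optionally) be
	 * accounted as iowait. */
	ret = io_uring_wait_cqe(&ring, &cqe);
	if (!ret) {
		printf("read returned %d\n", cqe->res);
		io_uring_cqe_seen(&ring, cqe);
	}

	io_uring_queue_exit(&ring);
	return 0;
}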

> Personally I don't care too much about per-ctx iowait, I don't think
> it's an issue at all. Fact is, most workloads that do storage and
> networking would likely use a ring for each. And if they do mix, then
> just pick if you care about iowait or not. Long term, would the toggle

Let's say you want the optimisation but don't want to screw up system
iowait stats, because as we've seen there will be people complaining.
What do you do? 99% of frameworks and libraries would never enable it,
which is a shame. If it's some container hosting, the vendors might
start complaining, especially since it's inconsistent and depends on
the user, and then we might need to blacklist it globally. And in the
cases where you control the entire stack, you need to tell people from
other teams and PEs that that's just how it is.

> iowait thing most likely just go away? Yep it would. But it's not like

I predict that, with enough time, if you tried to root it out
someone would complain that iowait is unexpectedly 0 and that
it's a regression, please bring the flag back. I doubt it
would just go away, but that probably depends on the timeline.

> it's any kind of maintenance burden. tldr - if we can do cpufreq

A la death by a thousand useless APIs that nobody cares about
nor truly understands.

> boosting easily on waits without adding iowait to the mix, then that'd
> be great and we can just do that. If not, let's add the iowait toggle
> and just be done with it.
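
For readers not familiar with the cpufreq side: the boost in question is
the governor's iowait boost, where consecutive wakeups from iowait ramp
up a frequency boost that decays once they stop. A rough toy model of
the idea (heavily simplified, not the kernel's exact behaviour):

#include <stdbool.h>
#include <stdio.h>

#define BOOST_MAX	1024			/* cf. SCHED_CAPACITY_SCALE */
#define BOOST_MIN	(BOOST_MAX / 8)

static unsigned int iowait_boost;

/* Called on every utilisation update in this toy model. */
static void update_boost(bool woke_from_iowait)
{
	if (woke_from_iowait) {
		/* ramp up on back-to-back iowait wakeups */
		iowait_boost = iowait_boost ? iowait_boost * 2 : BOOST_MIN;
		if (iowait_boost > BOOST_MAX)
			iowait_boost = BOOST_MAX;
	} else {
		/* decay once the iowait wakeups stop */
		iowait_boost /= 2;
	}
}

int main(void)
{
	/* a burst of iowait wakeups followed by regular updates */
	bool trace[] = { true, true, true, true, false, false, false };

	for (unsigned int i = 0; i < sizeof(trace) / sizeof(trace[0]); i++) {
		update_boost(trace[i]);
		printf("update %u: boost = %u\n", i, iowait_boost);
	}
	return 0;
}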

>>> And if we want to save the cpufreq iowait optimisation, we
>>> should just split the notion of iowait reporting from iowait cpufreq
>>> tuning.
>>
>> Yeah, that could be an option. I'll take a look at it.
>
> It'd be trivial to do, the only issue I see is that it'd require another
> set of per-runqueue atomics to count short waits on top of the
> nr_iowait accounting we already do. I doubt the scheduling side will be
> receptive to that.

Looked it up; sounds unfortunate, but it also seems like the
status quo could be optimised with an additional cpu-local
var.
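
As a strawman for what that split could look like conceptually: keep the
state that feeds the reported statistic separate from the state that
feeds the cpufreq boost, with the reporting side as cheap cpu-local
counters that are only summed when somebody reads the stat. The toy
userspace model below is illustrative only - the names and structure are
made up and none of this is the actual scheduler code:

#include <stdbool.h>
#include <stdio.h>

#define NR_CPUS	4

struct toy_rq {
	/* feeds only the reported iowait statistic; cpu-local, summed
	 * on read instead of bumping a shared atomic */
	unsigned long nr_iowait_acct;
	/* feeds only the cpufreq boost decision */
	unsigned long nr_iowait_boost;
};

static struct toy_rq rq[NR_CPUS];

/* A wait can ask for the statistic, the boost, both, or neither. */
static void note_io_wait(int cpu, bool report, bool boost)
{
	if (report)
		rq[cpu].nr_iowait_acct++;
	if (boost)
		rq[cpu].nr_iowait_boost++;
}

static unsigned long reported_iowait_events(void)
{
	unsigned long sum = 0;

	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		sum += rq[cpu].nr_iowait_acct;
	return sum;
}

int main(void)
{
	/* storage ring: wants both the statistic and the boost */
	note_io_wait(0, true, true);
	/* network ring: wants the frequency boost but not the statistic */
	note_io_wait(1, false, true);

	printf("reported iowait events: %lu\n", reported_iowait_events());
	return 0;
}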

--
Pavel Begunkov



