On 2/24/24 8:31 AM, Pavel Begunkov wrote: > On 2/24/24 05:07, David Wei wrote: >> Currently we unconditionally account time spent waiting for events in CQ >> ring as iowait time. >> >> Some userspace tools consider iowait time to be CPU util/load which can >> be misleading as the process is sleeping. High iowait time might be >> indicative of issues for storage IO, but for network IO e.g. socket >> recv() we do not control when the completions happen so its value >> misleads userspace tooling. >> >> This patch gates the previously unconditional iowait accounting behind a >> new IORING_REGISTER opcode. By default time is not accounted as iowait, >> unless this is explicitly enabled for a ring. Thus userspace can decide, >> depending on the type of work it expects to do, whether it wants to >> consider cqring wait time as iowait or not. > > I don't believe it's a sane approach. I think we agree that per > cpu iowait is a silly and misleading metric. I have hard time to > define what it is, and I'm sure most probably people complaining > wouldn't be able to tell as well. Now we're taking that metric > and expose even more knobs to userspace. For sure, it's a stupid metric. But at the same time, educating people on this can be like talking to a brick wall, and it'll be years of doing that before we're making a dent in it. Hence I do think that just exposing the knob and letting the storage side use it, if they want, is the path of least resistance. I'm personally not going to do a crusade on iowait to eliminate it, I don't have the time for that. I'll educate people when it comes up, like I have been doing, but pulling this to conclusion would be 10+ years easily. > Another argument against is that per ctx is not the right place > to have it. It's a system metric, and you can imagine some system > admin looking for it. Even in cases when had some meaning w/o > io_uring now without looking at what flags io_uring has it's > completely meaningless, and it's too much to ask. > > I don't understand why people freak out at seeing hi iowait, > IMHO it perfectly fits the definition of io_uring waiting for > IO / completions, but at this point it might be better to just > revert it to the old behaviour of not reporting iowait at all. > And if we want to save the cpu freq iowait optimisation, we > should just split notion of iowait reporting and iowait cpufreq > tuning. For io_uring, splitting the cpufreq from iowait is certainly the right approach. And then just getting rid of iowait completely on the io_uring side. This can be done without preaching about iowait to everyone that has bad metrics for their healt monitoring, which is why I like that a lot. I did ponder that the other day as well. You still kind of run into a problem with that in terms of when short vs long waits are expected. On the io_uring side, we use the "do I have any requests pending" for that, which is obviously not fine grained enough. We could apply it on just "do I have any requests against regular files" instead, which would then translate to needing further tracking on our side. Probably fine to just apply it for the existing logic, imho. -- Jens Axboe