On 3/18/25 12:39 AM, Pavel Begunkov wrote:
> On 3/17/25 14:07, Jens Axboe wrote:
>> On 3/16/25 12:57 AM, Pavel Begunkov wrote:
>>> On 3/14/25 18:48, Jens Axboe wrote:
>>>> By default, io_uring marks a waiting task as being in iowait, if it's
>>>> sleeping waiting on events and there are pending requests. This isn't
>>>> necessarily always useful, and may be confusing on non-storage setups
>>>> where iowait isn't expected. It can also cause extra power usage, by
>>>
>>> I think this passage hints on controlling iowait stats, and in my opinion
>>> we shouldn't conflate stats and optimisations. Global iowait stats
>>> is there to stay, but ideally we want to never account io_uring as iowait.
>>> That's while there were talks about removing optimisation toggle at all
>>> (and do it as internal cpufreq magic, I suppose).
>>>
>>> How about posing it as an optimisation option only and that iowait stat
>>> is a side effect that can change. Explicitly spelling that in the commit
>>> message and in a comment on top of the flag in an attempt to avoid the
>>> uapi regression trap. We'd also need it in the option's man when it's
>>> written. And I'd also add "hint" to the flag name, like
>>> IORING_ENTER_HINT_NO_IOWAIT, as we might need to nop it if anything
>>> changes on the cpufreq side.
>>
>> Having potentially the control of both would be useful, the stat
>
> It's not the right place to control the stat accounting though,
> apps don't care about iowait, it's usually monitored by a different
> entity / person from outside the app, so responsibilities don't
> match. It's fine if you fully control the stack, but just imagine

Sometimes those are one and the same thing, though - there's just the
one application running. That's not uncommon in data centers.

> a bunch of apps using different frameworks with io_uring inside
> that make different choices about it. The final iowait reading
> would be just a mess. With this patch at least we can say it's
> an unfortunate side effect.
>
> If we can separately control the accounting, a sysctl knob would
> probably be better, i.e. to be set globally from outside of an
> app, but I don't think we care enough to add extra logic / overhead
> for handling it.

That's not a bad idea; maybe we just do that for starters? We can always
introduce per-enter flags for managing boost and/or stats later. At least
it provides a system-wide setting that can just get overridden by flags,
should we need it.

>> accounting and the cpufreq boosting. I do think the current name is
>> better, though, the hint doesn't really add anything. I think we'd want
>
> "Hint" tells the user that it's legit for the kernel to ignore
> it, including the iowait stat differences the user may see. And
> we may actually need to drop the flag if task->iowait knob will
> get hidden from io_uring in the future. The main benefit here
> is for it to be in the name, because there are always those who
> don't read comments.

But that's the part I have a problem with - sometimes you'd need to know
whether it's honored or not.

-- 
Jens Axboe