On 8/16/24 4:36 PM, David Wei wrote:
> io_uring sets current->in_iowait when waiting for completions, which
> achieves two things:
>
> 1. Proper accounting of the time as iowait time
> 2. Enable cpufreq optimisations, setting SCHED_CPUFREQ_IOWAIT on the rq
>
> For block IO this makes sense as high iowait can be indicative of
> issues. But for network IO especially recv, the recv side does not
> control when the completions happen.
>
> Some user tooling attributes iowait time as CPU utilisation i.e. not
> idle, so high iowait time looks like high CPU util even though the task
> is not scheduled and the CPU is free to run other tasks. When doing
> network IO with e.g. the batch completion feature, the CPU may appear to
> have high utilisation.
>
> This patchset adds a IOURING_ENTER_NO_IOWAIT flag that can be set on
> enter. If set, then current->in_iowait is not set. By default this flag
> is not set to maintain existing behaviour i.e. in_iowait is always set.
> This is to prevent waiting for completions being accounted as CPU
> utilisation.
>
> Not setting in_iowait does mean that we also lose cpufreq optimisations
> above because in_iowait semantics couples 1 and 2 together. Eventually
> we will untangle the two so the optimisations can be enabled
> independently of the accounting.
>
> IORING_FEAT_IOWAIT_TOGGLE is returned in io_uring_create() to indicate
> support. This will be used by liburing to check for this feature.

LGTM

-- 
Jens Axboe