On 2024-08-16 15:49, Jens Axboe wrote: > On 8/16/24 4:40 PM, David Wei wrote: >> diff --git a/man/io_uring_enter.2 b/man/io_uring_enter.2 >> index 5e4121b..c06050a 100644 >> --- a/man/io_uring_enter.2 >> +++ b/man/io_uring_enter.2 >> @@ -104,6 +104,11 @@ If the ring file descriptor has been registered through use of >> then setting this flag will tell the kernel that the >> .I ring_fd >> passed in is the registered ring offset rather than a normal file descriptor. >> +.TP >> +.B IORING_ENTER_NO_IOWAIT >> +If this flag is set, then in_iowait will not be set for the current task if >> +.BR io_uring_enter (2) >> +results in waiting. > > I'd probably would say something ala: > > If this flag is set, then waiting on events will not be accounted as > iowait for the task if > .BR io_uring_enter (2) > results in waiting. > > or something like that. io_iowait is an in-kernel thing, iowait is what > application writers will know about. > >> +.B #include <liburing.h> >> +.PP >> +.BI "int io_uring_iowait_toggle(struct io_uring *" ring ", >> +.BI " bool " enabled ");" >> +.BI " >> +.fi >> +.SH DESCRIPTION >> +.PP >> +The >> +.BR io_uring_iowait_toggle (3) >> +function toggles for a given >> +.I ring >> +whether in_iowait is set for the current task while waiting for completions. >> +When in_iowait is set, time spent waiting is accounted as iowait time; >> +otherwise, it is accounted as idle time. The default behavior is to always set >> +in_iowait to true. > > And ditto here > >> +Setting in_iowait achieves two things: >> + >> +1. Account time spent waiting as iowait time >> + >> +2. Enable cpufreq optimisations, setting SCHED_CPUFREQ_IOWAIT on the rq > > This should probably be something ala: > > Setting in_iowait achieves two things: > .TP > .B Account time spent waiting as iowait time > .TP > .B Enable cpufreq optimisations, setting SCHED_CPUFREQ_IOWAIT on the rq > .PP > > to make that format like a man page. > >> +Some user tooling attributes iowait time as CPU utilization time, so high >> +iowait time can look like apparent high CPU utilization, even though the task >> +is not scheduled and the CPU is free to run other tasks. This function >> +provides a way to disable this behavior where it makes sense to do so. > > And here. Since this is the main man page, maybe also add something > about how iowait is a relic from the old days of only having one CPU, > and it indicates that the task is block uninterruptibly waiting for IO > and hence cannot do other things. These days it's mostly a bogus > accounting value, but it does help with the cpufreq boost for certain > high frequency waits. Rephrase as needed :-) > >> diff --git a/src/queue.c b/src/queue.c >> index c436061..bd2e6af 100644 >> --- a/src/queue.c >> +++ b/src/queue.c >> @@ -110,6 +110,8 @@ static int _io_uring_get_cqe(struct io_uring *ring, >> >> if (ring->int_flags & INT_FLAG_REG_RING) >> flags |= IORING_ENTER_REGISTERED_RING; >> + if (ring->int_flags & INT_FLAG_NO_IOWAIT) >> + flags |= IORING_ENTER_NO_IOWAIT; >> ret = __sys_io_uring_enter2(ring->enter_ring_fd, data->submit, >> data->wait_nr, flags, data->arg, >> data->sz); > > Not strictly related, and we can always do that after, but now we have > two branches here. Since the INT flags are purely internal, we can > renumber them so that INT_FLAG_REG_RING matches > IORING_ENTER_REGISTERED_RING and INT_FLAG_NO_IOWAIT matches > IORING_ENTER_NO_IOWAIT. With that, you could kill the above two branches > and simply do: > > flags |= (ring->int_flags & INT_TO_ENTER_MASK); > > which I think would be a nice thing to do. SG, I'll do this in a follow up. > > Rest looks good, no further comments. >