io_uring_queue_exit is REALLY slow

Clay Harris <bugs@xxxxxxxxxxx> · Sat, 6 Jun 2020 22:55:55 -0500

So, I realize that this probably isn't something that you've looked
at yet.  But I was interested in a different criterion when looking
at io_uring: how efficient it is for small numbers of requests that
don't transfer much data.  In other words, what is the minimum
amount of io_uring work for which a program speed-up can be obtained?
I realize that this is highly dependent on how much overlap can be
gained with async processing.

To get a baseline, I wrote a test program that performs 4 opens,
followed by 4 read + close pairs.  For the baseline I intentionally
used files in /proc so that there would be minimal async processing,
and I could set IOSQE_ASYNC later.  I was quite surprised by the
result: almost the entire program wall time was spent in the
io_uring_queue_exit() call.

I wrote another test program that does nothing but inits followed by
exits.  There are clock_gettime() calls around the
io_uring_queue_init(8, &ring, 0) and io_uring_queue_exit() calls, and
I printed the ratio of the io_uring_queue_exit() elapsed time to the
sum of the elapsed times of both calls.

The result varied between 0.94 and 0.99.  In other words, exit is
roughly 16 to 100 times slower than init.  The average ratio was
around 0.97.  Looking at the liburing code, exit does just what I'd
expect (unmap the ring pages and close the io_uring fd).
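For reference, here is how a measured ratio r converts into the
"N times slower" figures above:

```latex
r = \frac{t_{\mathrm{exit}}}{t_{\mathrm{init}} + t_{\mathrm{exit}}}
\quad\Longrightarrow\quad
\frac{t_{\mathrm{exit}}}{t_{\mathrm{init}}} = \frac{r}{1 - r}

r = 0.94:\ \frac{0.94}{0.06} \approx 15.7, \qquad
r = 0.97:\ \frac{0.97}{0.03} \approx 32.3, \qquad
r = 0.99:\ \frac{0.99}{0.01} = 99
```

So the 0.97 average corresponds to exit taking about 32 times as long
as init.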

I would have bet on a ratio below 0.50.  No operations were ever
performed on the ring, so there should be minimal cleanup.  Even if
the kernel needed to do a bunch of cleanup, it shouldn't need the
pages to stay mapped into user space to do that work, and the same
goes for the fd being open in the user process.

Seems like there is some room for optimization here.