On Sun, Jun 07 2020 at 08:37:30 -0600, Jens Axboe quoth thus:

> On 6/6/20 9:55 PM, Clay Harris wrote:
> > So, I realize that this probably isn't something that you've looked
> > at yet. But, I was interested in a different criteria looking at
> > io_uring. That is how efficient it is for small numbers of requests
> > which don't transfer much data. In other words, what is the minimum
> > amount of io_uring work for which a program speed-up can be obtained.
> > I realize that this is highly dependent on how much overlap can be
> > gained with async processing.
> >
> > In order to get a baseline, I wrote a test program which performs
> > 4 opens, followed by 4 read + closes. For the baseline I
> > intentionally used files in /proc so that there would be minimum
> > async and I could set IOSQE_ASYNC later. I was quite surprised
> > by the result: Almost the entire program wall time was used in
> > the io_uring_queue_exit() call.
> >
> > I wrote another test program which does just inits followed by exits.
> > There are clock_gettime()s around the io_uring_queue_init(8, &ring, 0)
> > and io_uring_queue_exit() calls and I printed the ratio of the
> > io_uring_queue_exit() elapsed time and the sum of elapsed time of
> > both calls.
> >
> > The result varied between 0.94 and 0.99. In other words, exit is
> > between 16 and 100 times slower than init. Average ratio was
> > around 0.97. Looking at the liburing code, exit does just what
> > I'd expect (unmap pages and close io_uring fd).
> >
> > I would have bet the ratio would be less than 0.50. No
> > operations were ever performed by the ring, so there should be
> > minimal cleanup. Even if the kernel needed to do a bunch of
> > cleanup, it shouldn't need the pages mapped into user space to work;
> > same thing for the fd being open in the user process.
> >
> > Seems like there is some room for optimization here.
>
> Can you share your test case? And what kernel are you using, that's
> kind of important.
>
> There's no reason for teardown to be slow, except if you have
> pending IO that we need to either cancel or wait for. Due to
> other reasons, newer kernels will have most/some parts of
> the teardown done out-of-line.

I'm working up a test program for you.

Just FYI:
My initial analysis indicates that closing the io_uring fd is what's
taking all the extra time.

> --
> Jens Axboe
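
In the meantime, here is a rough sketch of the arithmetic behind the
numbers quoted above. It is not the test program itself. It only shows
how one call can be timed with clock_gettime() and how the printed ratio
exit / (init + exit) maps to a "times slower" figure. All helper names
(elapsed_ns, time_call_ns, exit_share, slowdown_from_share) are made up
for illustration, and the choice of CLOCK_MONOTONIC is an assumption.

```c
#define _POSIX_C_SOURCE 200809L  /* for clock_gettime() */
#include <assert.h>
#include <time.h>

/* Nanoseconds from sample a to sample b (b taken after a). */
long long elapsed_ns(struct timespec a, struct timespec b)
{
    return (long long)(b.tv_sec - a.tv_sec) * 1000000000LL
         + (long long)(b.tv_nsec - a.tv_nsec);
}

/*
 * Time a single call of fn(arg). Hypothetical helper: in the test
 * described above, fn would wrap io_uring_queue_init(8, &ring, 0) for
 * one measurement and io_uring_queue_exit(&ring) for the other.
 */
long long time_call_ns(void (*fn)(void *), void *arg)
{
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    fn(arg);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return elapsed_ns(t0, t1);
}

/* The printed ratio: exit time over the sum of init and exit time. */
double exit_share(long long init_ns, long long exit_ns)
{
    assert(init_ns >= 0 && exit_ns >= 0 && init_ns + exit_ns > 0);
    return (double)exit_ns / (double)(init_ns + exit_ns);
}

/* Convert that ratio r into "exit is N times slower than init". */
double slowdown_from_share(double r)
{
    assert(r >= 0.0 && r < 1.0);
    return r / (1.0 - r);
}
```

With r = exit / (init + exit), the slowdown is exit / init = r / (1 - r).
That gives roughly 16 for r = 0.94, roughly 32 for r = 0.97, and roughly
99 for r = 0.99, which is where the "between 16 and 100 times" range
comes from.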