On Sun, 2022-02-20 at 12:38 -0700, Jens Axboe wrote:
> 
> OK, that's a pretty good improvement in both latency and
> deviation/consistency. Is this using SQPOLL, or is it using polling
> off cqring_wait from the task itself? Also something to consider for the
> test benchmark app, should be able to run both (which is usually just
> setting the SETUP_SQPOLL flag or not, if done right).
> 

The answer to your question is complex. This is one of the external factors that I was referring to.

One thread is managing 49 TCP sockets, and this thread's io_uring context is configured with SQPOLL. Upon receiving a packet of interest, it wakes up thread #2 with an eventfd installed into a private, non-SQPOLL io_uring context, and sends a request to a 50th TCP socket. Both threads are then busy polling NAPI: one from the SQPOLL code and the other from the io_cqring_wait() code.

As if that were not enough, since I discovered the benefits of busy polling, and that rescheduling a sleeping task takes about 5-10 µs, thread #1 is also busy polling io_uring instead of blocking in io_uring_enter().

Thanks for suggesting that the benchmark be designed to test both SQPOLL and non-SQPOLL busy polling. This is something that I already had in mind.

I have completed 3 small improvements for patch v2. I need to check the kernel test bot's and Hao's comments to see if I have more to work on, but if all is good, I only need to complete the benchmark program. I might be able to send v2 later today.

Greetings,
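To illustrate the busy-polling versus sleeping trade-off mentioned above, here is a minimal sketch assuming only Linux eventfd(2) and POSIX threads (no io_uring, no NAPI involved). One thread wakes a peer through a non-blocking eventfd, and the peer spins on read() instead of sleeping, avoiding the scheduler wake-up cost at the price of a fully busy CPU. The names waiter_thread, wake_peer_once and struct waiter_arg are hypothetical; this is an analogy, not the io_uring busy poll code from the patch.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical state shared between the waker and the busy-polling waiter. */
struct waiter_arg {
    int efd;                 /* non-blocking eventfd shared by both threads */
    atomic_int abort_wait;   /* set by the waker if it could not signal */
    uint64_t value;          /* counter value observed by the waiter */
    unsigned long spins;     /* empty polls before the wake-up arrived */
};

/* Spin on read() instead of blocking: with EFD_NONBLOCK, read() fails with
 * EAGAIN while the counter is zero, so this thread never sleeps. */
static void *waiter_thread(void *p)
{
    struct waiter_arg *arg = p;
    uint64_t val;

    while (!atomic_load(&arg->abort_wait)) {
        ssize_t n = read(arg->efd, &val, sizeof(val));
        if (n == (ssize_t)sizeof(val)) {
            arg->value = val;   /* woken: counter consumed and reset to 0 */
            return NULL;
        }
        if (n < 0 && errno != EAGAIN && errno != EINTR)
            return NULL;        /* unexpected error: stop polling */
        arg->spins++;           /* nothing yet: poll again */
    }
    return NULL;
}

/* Hypothetical demo: the caller plays the role of "thread #1" and wakes a
 * busy-polling "thread #2". Returns the value the waiter read, or 0 on
 * failure. token must be in [1, 0xfffffffffffffffe] (eventfd's limit). */
static uint64_t wake_peer_once(uint64_t token)
{
    struct waiter_arg arg;
    pthread_t tid;

    if (token == 0 || token == UINT64_MAX)
        return 0;               /* 0 would never wake the waiter */
    arg.value = 0;
    arg.spins = 0;
    atomic_init(&arg.abort_wait, 0);
    arg.efd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
    if (arg.efd < 0)
        return 0;
    if (pthread_create(&tid, NULL, waiter_thread, &arg) != 0) {
        close(arg.efd);
        return 0;
    }
    /* Signal the peer: a write adds token to the eventfd counter. */
    if (write(arg.efd, &token, sizeof(token)) != (ssize_t)sizeof(token))
        atomic_store(&arg.abort_wait, 1);   /* let the waiter exit */
    pthread_join(tid, NULL);                /* makes value/spins visible */
    close(arg.efd);
    return arg.value;
}
```

For example, wake_peer_once(42) returns 42 once the spinning waiter has consumed the counter. Spinning only pays off when wake-up latency matters more than the CPU core it consumes.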