The sqpoll thread can be used for performing the napi busy poll in a similar way that it does io polling for file systems supporting direct access bypassing the page cache. The other way that io_uring can be used for napi busy poll is by calling io_uring_enter() to get events. If the user specify a timeout value, it is distributed between polling and sleeping by using the systemwide setting /proc/sys/net/core/busy_poll. The changes have been tested with this program: https://github.com/lano1106/io_uring_udp_ping and the result is: Without sqpoll: NAPI busy loop disabled: rtt min/avg/max/mdev = 40.631/42.050/58.667/1.547 us NAPI busy loop enabled: rtt min/avg/max/mdev = 30.619/31.753/61.433/1.456 us With sqpoll: NAPI busy loop disabled: rtt min/avg/max/mdev = 42.087/44.438/59.508/1.533 us NAPI busy loop enabled: rtt min/avg/max/mdev = 35.779/37.347/52.201/0.924 us v2: * Evaluate list_empty(&ctx->napi_list) outside io_napi_busy_loop() to keep __io_sq_thread() execution as fast as possible * In io_cqring_wait(), move up the sig block to avoid needless computation if the block exits the function * In io_cqring_wait(), protect ctx->napi_list from race condition by splicing it into a local list * In io_cqring_wait(), allow busy polling when uts is missing * Fix kernel test robot issues v3: * Fix do_div() type mismatch warning * Reduce uring_lock contention by creating a spinlock for protecting napi_list * Support correctly MULTISHOT poll requests v4: * Put back benchmark result in commit text v5: * Protect napi_list from concurrent access from io_workers threads Olivier Langlois (2): io_uring: minor io_cqring_wait() optimization io_uring: Add support for napi_busy_poll fs/io_uring.c | 248 ++++++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 239 insertions(+), 9 deletions(-) -- 2.35.1