This saves one mutex lock/unlock pair per syscall when users do
submit + getevents in a single call.

perf shows that for QD1 iopoll the series reduces locking overhead from
~4.3% to ~2.6%, i.e. it cuts 1.3% - 1.9% of CPU time. I see a similar
improvement in final throughput.

It's a good win for small queue depths, especially considering that
io_uring itself takes only about 20-30% of all cycles; the rest goes to
syscall overhead, the block layer and below.

Pavel Begunkov (3):
  io_uring: split off IOPOLL argument verifiction
  io_uring: pre-calculate syscall iopolling decision
  io_uring: optimise mutex locking for submit+iopoll

 fs/io_uring.c | 86 +++++++++++++++++++++++++++++++++------------------
 1 file changed, 56 insertions(+), 30 deletions(-)

-- 
2.35.1