Re: [PATCH v4] io_uring: reduce latency by reissueing the operation

Jens Axboe <axboe@xxxxxxxxx> · Thu, 24 Jun 2021 18:45:21 -0600

On 6/22/21 6:17 AM, Olivier Langlois wrote:
> It is quite frequent that when an operation fails and returns EAGAIN,
> the data becomes available between that failure and the call to
> vfs_poll() done by io_arm_poll_handler().
> 
> Detecting the situation and reissuing the operation is much faster
> than going ahead and pushing the operation to io-wq.
> 
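
The check described above can be illustrated from userspace. This is only
an analogy, not the kernel code: a nonblocking read stands in for the first
issue attempt, a zero-timeout poll() stands in for the vfs_poll() call, and
the function name, the `between` hook, and the returned labels are all
hypothetical names chosen for this sketch.

```python
import os
import select


def read_with_reissue(fd, nbytes, between=None):
    """Userspace analogy of the reissue idea (hypothetical helper).

    Returns a (path, data) tuple where path is one of:
      "inline"   - the first read attempt found data
      "reissue"  - first attempt hit EAGAIN, but poll() saw data, so we retried
      "deferred" - still no data; a real caller would hand off to a slower path
    """
    try:
        return "inline", os.read(fd, nbytes)
    except BlockingIOError:
        pass  # EAGAIN: nothing to read yet

    if between is not None:
        between()  # demo hook: simulates data arriving inside the window

    # Zero-timeout poll: only asks whether data is ready right now.
    poller = select.poll()
    poller.register(fd, select.POLLIN)
    events = poller.poll(0)
    if events and events[0][1] & select.POLLIN:
        # Data arrived between the failed read and the poll: retry at once.
        return "reissue", os.read(fd, nbytes)
    return "deferred", None


if __name__ == "__main__":
    r, w = os.pipe()
    os.set_blocking(r, False)
    # The writer fires after the first read failed but before the poll.
    print(read_with_reissue(r, 16, between=lambda: os.write(w, b"late")))
```

The "deferred" branch is where the expensive handoff would happen; the
point of the patch is that the "reissue" branch avoids it.
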
> Performance improvement testing has been performed with:
> Single thread, 1 TCP connection receiving a 5 Mbps stream, no sqpoll.
> 
> 4 measurements have been taken:
> 1. The time it takes to process a read request when data is already available
> 2. The time it takes to process a read request by calling io_issue_sqe() twice, after vfs_poll() indicated that data was available
> 3. The time it takes to execute io_queue_async_work()
> 4. The time it takes to complete a read request asynchronously
> 
> 2.25% of all the read operations used the new path.
> 
> ready data (baseline), ns
> average	3657.94182918628
> min	580
> max	20098
> stddev	1213.15975908162
> 
> reissue completion, ns
> average	7882.67567567568
> min	2316
> max	28811
> stddev	1982.79172973284
> 
> insert io-wq time, ns
> average	8983.82276995305
> min	3324
> max	87816
> stddev	2551.60056552038
> 
> async time completion, ns
> average	24670.4758861127
> min	10758
> max	102612
> stddev	3483.92416873804
> 
> Conclusion:
> On average, reissuing the sqe with the patch code is 1.1 usec faster
> and, in the worst-case scenario, 59 usec faster than placing the
> request on io-wq.
> 
> On average, completion time by reissuing the sqe with the patch code is
> 16.79 usec faster and, in the worst-case scenario, 73.8 usec faster than
> async completion.
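
For anyone wanting to check the conclusion against the tables, the
arithmetic can be reproduced as below. The sketch assumes the raw figures
are in nanoseconds (the commit message does not state the unit, but the
quoted microsecond deltas only work out that way), and the dictionary and
function names are made up for illustration. "Worst case" here compares
max against max, as the conclusion does.

```python
# Figures copied from the tables in the commit message (assumed to be ns).
reissue = {"average": 7882.67567567568, "max": 28811}
io_wq_insert = {"average": 8983.82276995305, "max": 87816}
async_completion = {"average": 24670.4758861127, "max": 102612}


def delta_usec(slow, fast, key):
    """Difference slow[key] - fast[key], converted from ns to usec."""
    return (slow[key] - fast[key]) / 1000.0


if __name__ == "__main__":
    for name, slow in (("io-wq insert", io_wq_insert),
                       ("async completion", async_completion)):
        print(name,
              "average: %.2f usec" % delta_usec(slow, reissue, "average"),
              "worst case: %.2f usec" % delta_usec(slow, reissue, "max"))
```
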

Thanks for respinning with a (much) better commit message. Applied.

-- 
Jens Axboe