Re: [PATCH v4 0/8] implement async block discards and other ops via io_uring

Jens Axboe <axboe@xxxxxxxxx> · Mon, 9 Sep 2024 09:33:47 -0600

On 9/9/24 8:51 AM, Jens Axboe wrote:
> On 9/6/24 4:57 PM, Pavel Begunkov wrote:
>> There is an interest in having asynchronous block operations like
>> discard and write zeroes. The series implements that as io_uring commands,
>> which is an io_uring request type allowing to implement custom file
>> specific operations.
>>
>> First 4 are preparation patches. Patch 5 introduces the main chunk of
>> cmd infrastructure and discard commands. Patches 6-8 implement
>> write zeroes variants.
> 
> Sitting in for-6.12/io_uring-discard for now, as there's a hidden
> dependency with the end/len patch in for-6.12/block.
> 
> Ran a quick test - have 64 4k discards inflight. Here's the current
> performance, with 64 threads with sync discard:
> 
> qd64 sync discard: 21K IOPS, lat avg 3 msec (max 21 msec)
> 
> and using io_uring with async discard, otherwise same test case:
> 
> qd64 async discard: 76K IOPS, lat avg 845 usec (max 2.2 msec)
> 
> If we switch to doing 1M discards, then we get:
> 
> qd64 sync discard: 14K IOPS, lat avg 5 msec (max 25 msec)
> 
> and using io_uring with async discard, otherwise same test case:
> 
> qd64 async discard: 56K IOPS, lat avg 1153 usec (max 3.6 msec)
> 
> This is on a:
> 
> Samsung Electronics Co Ltd NVMe SSD Controller PM174X
> 
> nvme device. It doesn't have the fastest discard, but still nicely shows
> the improvement over a purely sync discard.

Did some basic testing with null_blk just to get a better idea of what
it'd look like on a faster devices. Same test cases as above (qd=64, 4k
and 1M random trims):

Type	Trim size	IOPS	Lat avg (usec)	Lat Max (usec)
==============================================================
sync	4k		 144K	    444		   20314
async	4k		1353K	     47		     595
sync	1M		  56K	   1136		   21031
async	1M		  94K	    680		     760			

-- 
Jens Axboe