On 9/9/24 8:51 AM, Jens Axboe wrote:
> On 9/6/24 4:57 PM, Pavel Begunkov wrote:
>> There is an interest in having asynchronous block operations like
>> discard and write zeroes. The series implements that as io_uring commands,
>> which is an io_uring request type allowing to implement custom file
>> specific operations.
>>
>> First 4 are preparation patches. Patch 5 introduces the main chunk of
>> cmd infrastructure and discard commands. Patches 6-8 implement
>> write zeroes variants.
>
> Sitting in for-6.12/io_uring-discard for now, as there's a hidden
> dependency with the end/len patch in for-6.12/block.
>
> Ran a quick test - have 64 4k discards inflight. Here's the current
> performance, with 64 threads with sync discard:
>
> qd64 sync discard: 21K IOPS, lat avg 3 msec (max 21 msec)
>
> and using io_uring with async discard, otherwise same test case:
>
> qd64 async discard: 76K IOPS, lat avg 845 usec (max 2.2 msec)
>
> If we switch to doing 1M discards, then we get:
>
> qd64 sync discard: 14K IOPS, lat avg 5 msec (max 25 msec)
>
> and using io_uring with async discard, otherwise same test case:
>
> qd64 async discard: 56K IOPS, lat avg 1153 usec (max 3.6 msec)
>
> This is on a:
>
> Samsung Electronics Co Ltd NVMe SSD Controller PM174X
>
> nvme device. It doesn't have the fastest discard, but still nicely shows
> the improvement over a purely sync discard.

Did some basic testing with null_blk just to get a better idea of what
it'd look like on a faster device. Same test cases as above (qd=64, 4k
and 1M random trims):

Type    Trim size    IOPS     Lat avg (usec)    Lat Max (usec)
==============================================================
sync    4k           144K                444             20314
async   4k          1353K                 47               595
sync    1M            56K               1136             21031
async   1M            94K                680               760

-- 
Jens Axboe
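
For anyone wanting to sanity check the figures above, here is a rough sketch (not part of the original test setup) that computes the async/sync IOPS ratio for each case, and uses Little's law (in-flight requests = throughput x average latency) to confirm that the reported IOPS and average latencies are consistent with a queue depth of 64. The numbers are copied from this thread; the helper names `implied_queue_depth` and `speedup` are made up for illustration, and the reported values are rounded, so expect the implied depth to be only approximately 64.

```python
# Figures copied from this thread:
# (device, mode, trim size, IOPS, average latency in usec)
RESULTS = [
    ("PM174X", "sync", "4k", 21_000, 3000),
    ("PM174X", "async", "4k", 76_000, 845),
    ("PM174X", "sync", "1M", 14_000, 5000),
    ("PM174X", "async", "1M", 56_000, 1153),
    ("null_blk", "sync", "4k", 144_000, 444),
    ("null_blk", "async", "4k", 1_353_000, 47),
    ("null_blk", "sync", "1M", 56_000, 1136),
    ("null_blk", "async", "1M", 94_000, 680),
]


def implied_queue_depth(iops, lat_usec):
    """Little's law: requests in flight = throughput * average latency."""
    return iops * lat_usec / 1_000_000


def speedup(device, size):
    """Async IOPS divided by sync IOPS for one device and trim size."""
    iops = {mode: i for d, mode, s, i, _ in RESULTS if d == device and s == size}
    return iops["async"] / iops["sync"]


# Print one line per measurement, then the async/sync ratio per test case.
for device, mode, size, iops, lat in RESULTS:
    depth = implied_queue_depth(iops, lat)
    print(f"{device:9} {mode:5} {size:3} implied qd ~ {depth:.1f}")

for device in ("PM174X", "null_blk"):
    for size in ("4k", "1M"):
        print(f"{device:9} {size:3} async/sync IOPS ratio: {speedup(device, size):.1f}x")
```

The implied depth landing near 64 in every row suggests the IOPS and latency columns are internally consistent for both the sync and the async runs.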