On 9/6/24 4:57 PM, Pavel Begunkov wrote:
> There is an interest in having asynchronous block operations like
> discard and write zeroes. The series implements that as io_uring commands,
> which is an io_uring request type allowing to implement custom file
> specific operations.
>
> First 4 are preparation patches. Patch 5 introduces the main chunk of
> cmd infrastructure and discard commands. Patches 6-8 implement
> write zeroes variants.

Sitting in for-6.12/io_uring-discard for now, as there's a hidden
dependency with the end/len patch in for-6.12/block.

Ran a quick test - have 64 4k discards inflight. Here's the current
performance, with 64 threads with sync discard:

qd64 sync discard: 21K IOPS, lat avg 3 msec (max 21 msec)

and using io_uring with async discard, otherwise same test case:

qd64 async discard: 76K IOPS, lat avg 845 usec (max 2.2 msec)

If we switch to doing 1M discards, then we get:

qd64 sync discard: 14K IOPS, lat avg 5 msec (max 25 msec)

and using io_uring with async discard, otherwise same test case:

qd64 async discard: 56K IOPS, lat avg 1153 usec (max 3.6 msec)

This is on a:

Samsung Electronics Co Ltd NVMe SSD Controller PM174X

nvme device. It doesn't have the fastest discard, but still nicely shows
the improvement over a purely sync discard.

-- 
Jens Axboe
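
As a rough sanity check on the figures quoted above, the sketch below compares each reported IOPS number against queue depth divided by average latency (Little's law) and computes the async-over-sync ratio. The helper names `expected_iops` and `speedup` are made up for illustration, and the inputs are the rounded values from the mail, so small mismatches (for instance in the 1M sync case) are expected. The ratios come out at roughly 3.6x for 4k discards and 4x for 1M discards.

```python
# Reported figures from the test above: (queue depth, IOPS, avg latency in seconds).
# All values are rounded as quoted in the mail, so results are approximate.
RESULTS = {
    "4k sync": (64, 21_000, 3e-3),
    "4k async": (64, 76_000, 845e-6),
    "1M sync": (64, 14_000, 5e-3),
    "1M async": (64, 56_000, 1153e-6),
}


def expected_iops(queue_depth, avg_latency_s):
    """Little's law: throughput = requests in flight / average latency."""
    return queue_depth / avg_latency_s


def speedup(sync_iops, async_iops):
    """How many times more IOPS the async case delivers than the sync case."""
    return async_iops / sync_iops


if __name__ == "__main__":
    for name, (qd, iops, lat) in RESULTS.items():
        estimate = expected_iops(qd, lat)
        print(f"{name}: reported {iops} IOPS, qd/latency gives {estimate:.0f}")

    for size in ("4k", "1M"):
        sync_iops = RESULTS[f"{size} sync"][1]
        async_iops = RESULTS[f"{size} async"][1]
        print(f"{size}: async/sync = {speedup(sync_iops, async_iops):.2f}x")
```

The qd/latency estimates line up with the reported IOPS, which suggests the queue stayed full at 64 requests in both the sync and async runs.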