On 8/15/24 16:50, Jens Axboe wrote:
> On 8/14/24 4:45 AM, Pavel Begunkov wrote:
>> There is an interest in having asynchronous block operations like
>> discard. The patch set implements them as io_uring commands, a
>> request type that allows implementing custom file-specific
>> operations.
>> The first 4 patches are simple preps, and the main part is in Patch 5.
>> Not tested with a real drive yet, hence sending as an RFC.
>> I'm also going to add BLKZEROOUT and BLKSECDISCARD equivalents, which
>> should reuse structures and helpers from Patch 5.
>>
>> liburing tests for reference:
>> https://github.com/isilence/liburing.git discard-cmd-test
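
For reference, submission through this interface might look roughly
like the sketch below. The BLOCK_URING_CMD_DISCARD opcode name and the
placement of the range in sqe->addr/sqe->addr3 are assumptions based on
this RFC and its liburing test branch, not a settled ABI:

/*
 * Sketch: queue an async discard as an io_uring command. The opcode
 * name and the addr/addr3 range encoding are assumptions taken from
 * the RFC, not a stable uapi.
 */
#include <liburing.h>

#ifndef BLOCK_URING_CMD_DISCARD
#define BLOCK_URING_CMD_DISCARD 0	/* placeholder; use the real uapi value */
#endif

static void prep_discard(struct io_uring_sqe *sqe, int bdev_fd,
			 __u64 offset, __u64 nbytes)
{
	/* IORING_OP_URING_CMD dispatches to the file's ->uring_cmd() */
	io_uring_prep_rw(IORING_OP_URING_CMD, sqe, bdev_fd, NULL, 0, 0);
	sqe->cmd_op = BLOCK_URING_CMD_DISCARD;
	sqe->addr = offset;	/* start of the range, in bytes */
	sqe->addr3 = nbytes;	/* length of the range, in bytes */
}

After io_uring_submit(), the completion arrives as an ordinary CQE with
cqe->res holding 0 on success or a negative errno.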
> FWIW, did a quick patch to wire it up for fio. Using this
> job file:
>
> [trim]
> filename=/dev/nvme31n1
> ioengine=io_uring
> iodepth=64
> rw=randtrim
> norandommap
> bs=4k
> the stock kernel gets:
>
> trim: IOPS=21.6k, BW=84.4MiB/s (88.5MB/s)(847MiB/10036msec); 0 zone resets
>
> using ~5% CPU, and with the process predictably stuck in D state all of
> the time, waiting on a sync trim.
>
> With the patches:
>
> trim: IOPS=75.8k, BW=296MiB/s (310MB/s)(2653MiB/8961msec); 0 zone resets
>
> using ~11% CPU.
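
For contrast, the synchronous path behind the stock-kernel numbers is
presumably the BLKDISCARD ioctl, fio's usual trim path for Linux block
devices, which blocks the calling thread for every single range:

/* The existing synchronous discard: one blocking ioctl per range. */
#include <sys/ioctl.h>
#include <linux/fs.h>	/* BLKDISCARD */
#include <stdint.h>

static int sync_discard(int bdev_fd, uint64_t offset, uint64_t nbytes)
{
	uint64_t range[2] = { offset, nbytes };

	/* the caller sits in D state until the device completes */
	return ioctl(bdev_fd, BLKDISCARD, &range);
}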
Thanks for giving it a run. It can be further improved for particular
use cases, e.g. by adding a vectored version as an additional feature,
i.e. multiple discard ranges per request, and something like retry of
short IO, but for that we'd want to get -EAGAIN directly from
submit_bio rather than failing via the callback.
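
A vectored version has no uapi yet; purely as a hypothetical
illustration, it could take an array of ranges along these lines:

/* Hypothetical only: no such structure exists in the patch set. */
#include <linux/types.h>

struct blk_uring_discard_range {
	__u64 offset;	/* start of the range, in bytes */
	__u64 nbytes;	/* length of the range, in bytes */
};

/*
 * Userspace would point sqe->addr at an array of these and put the
 * element count in sqe->len; the kernel side would submit one bio per
 * range and post a single CQE once all of them complete.
 */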
> Didn't verify actual functionality, but did check trims are going up and
> down. Drive used is:
>
> Dell NVMe PM1743 RI E3.S 7.68TB
>
> Outside of async trim being useful for actual workloads rather than the
> sync trim we have now, it'll also help characterize real-world mixed
> workloads that mix trims with reads/writes.
--
Pavel Begunkov