On Thu, Feb 13, 2025 at 11:24 PM lizetao <lizetao1@xxxxxxxxxx> wrote:
>
> Hi,
>
> > -----Original Message-----
> > From: Keith Busch <kbusch@xxxxxxxx>
> > Sent: Tuesday, February 11, 2025 8:57 AM
> > To: ming.lei@xxxxxxxxxx; asml.silence@xxxxxxxxx; axboe@xxxxxxxxx;
> > linux-block@xxxxxxxxxxxxxxx; io-uring@xxxxxxxxxxxxxxx
> > Cc: bernd@xxxxxxxxxxx; Keith Busch <kbusch@xxxxxxxxxx>
> > Subject: [PATCHv2 0/6] ublk zero-copy support
> >
> > From: Keith Busch <kbusch@xxxxxxxxxx>
> >
> > Previous version was discussed here:
> >
> > https://lore.kernel.org/linux-block/20250203154517.937623-1-kbusch@xxxxxxxx/
> >
> > The same ublksrv reference code in that link was used to test the
> > kernel side changes.
> >
> > Before listing what has changed, I want to mention what is the same:
> > the reliance on the ring ctx lock to serialize the register ahead of
> > any use. I'm not ignoring the feedback; I just don't have a solid
> > answer right now, and want to progress on the other fronts in the
> > meantime.
> >
> > Here's what's different from the previous:
> >
> >  - Introduced an optional 'release' callback when the resource node is
> >    no longer referenced. The callback addresses any buggy applications
> >    that may complete their request and unregister their index while IO
> >    is in flight. This obviates any need to take extra page references
> >    since it prevents the request from completing.
> >
> >  - Removed peeking into the io_cache element size and instead use a
> >    more intuitive bvec segment count limit to decide if we're caching
> >    the imu (suggested by Pavel).
> >
> >  - Dropped the const request changes; it's not needed.
>
> I tested this patch set. When I use null as the device, the test results
> are like your v1: when the bs is 4k, there is a slight improvement; when
> the bs is 64k, there is a significant improvement.

Yes, the improvement is usually more obvious with a big IO size (>= 64K).
> However, when I used loop as the device, I found that there was no
> improvement, whether using 4k or 64k, as follows:
>
> ublk add -t loop -f ./ublk-loop.img
> ublk add -t loop -f ./ublk-loop-zerocopy.img
>
> fio -filename=/dev/ublkb0 -direct=1 -rw=read -iodepth=1 -ioengine=io_uring -bs=128k -size=5G
>   read: IOPS=2015, BW=126MiB/s (132MB/s)(1260MiB/10005msec)
>
> fio -filename=/dev/ublkb1 -direct=1 -rw=read -iodepth=1 -ioengine=io_uring -bs=128k -size=5G
>   read: IOPS=1998, BW=125MiB/s (131MB/s)(1250MiB/10005msec)
>
> So, is this patch set optimized only for null type devices? Or, if I've
> missed any key information, please let me know.

Latency may have decreased a bit, but system resources can't be saturated
at a queue depth of one. Please run the same test with a high queue depth,
per Keith's suggestion:

    --iodepth=128 --iodepth_batch_submit=16 --iodepth_batch_complete_min=16

Also, if you set up the backing file as a ramfs image, the improvement
should be pretty obvious; I observed IOPS doubled this way.

Thanks,
Ming
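For reference, the ramfs-backed setup and the high-queue-depth rerun
suggested above might look like the sketch below. The mount point, image
size, and job name are illustrative assumptions, not from this thread;
only the batching flags come from Keith's suggestion. Note that fio
requires a --name argument on the command line, which the quoted commands
omitted.

```shell
# Sketch only: mount point, image size, and device node are assumptions.
# Put the backing image on ramfs so the loop target is not bottlenecked
# by the underlying disk.
mount -t ramfs ramfs /mnt/ramfs
dd if=/dev/zero of=/mnt/ramfs/ublk-loop.img bs=1M count=5120

# Create the ublk loop device backed by the ramfs image.
ublk add -t loop -f /mnt/ramfs/ublk-loop.img

# Re-run the read test with a high queue depth and batched
# submission/completion, per the flags suggested in the thread.
fio --name=ublk-zc-test --filename=/dev/ublkb0 --direct=1 --rw=read \
    --ioengine=io_uring --bs=128k --size=5G \
    --iodepth=128 --iodepth_batch_submit=16 --iodepth_batch_complete_min=16
```

These commands need root and a kernel with ublk support, so they are a
template to adapt rather than something to paste verbatim.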