Hi,

> -----Original Message-----
> From: Keith Busch <kbusch@xxxxxxxx>
> Sent: Tuesday, February 11, 2025 8:57 AM
> To: ming.lei@xxxxxxxxxx; asml.silence@xxxxxxxxx; axboe@xxxxxxxxx; linux-block@xxxxxxxxxxxxxxx; io-uring@xxxxxxxxxxxxxxx
> Cc: bernd@xxxxxxxxxxx; Keith Busch <kbusch@xxxxxxxxxx>
> Subject: [PATCHv2 0/6] ublk zero-copy support
>
> From: Keith Busch <kbusch@xxxxxxxxxx>
>
> Previous version was discussed here:
>
>   https://lore.kernel.org/linux-block/20250203154517.937623-1-kbusch@xxxxxxxx/
>
> The same ublksrv reference code in that link was used to test the kernel side
> changes.
>
> Before listing what has changed, I want to mention what is the same: the
> reliance on the ring ctx lock to serialize the register ahead of any use. I'm not
> ignoring the feedback; I just don't have a solid answer right now, and want to
> progress on the other fronts in the meantime.
>
> Here's what's different from the previous:
>
>  - Introduced an optional 'release' callback when the resource node is
>    no longer referenced. The callback addresses any buggy applications
>    that may complete their request and unregister their index while IO
>    is in flight. This obviates any need to take extra page references
>    since it prevents the request from completing.
>
>  - Removed peeking into the io_cache element size and instead use a
>    more intuitive bvec segment count limit to decide if we're caching
>    the imu (suggested by Pavel).
>
>  - Dropped the const request changes; it's not needed.

I tested this patch set. With null as the device, the results are similar to
those with your v1: a slight improvement when bs is 4k and a significant
improvement when bs is 64k. With loop as the device, however, I saw no
improvement at either 4k or 64k.
As follows:

  ublk add -t loop -f ./ublk-loop.img
  ublk add -t loop -f ./ublk-loop-zerocopy.img

  fio -filename=/dev/ublkb0 -direct=1 -rw=read -iodepth=1 -ioengine=io_uring -bs=128k -size=5G
    read: IOPS=2015, BW=126MiB/s (132MB/s)(1260MiB/10005msec)

  fio -filename=/dev/ublkb1 -direct=1 -rw=read -iodepth=1 -ioengine=io_uring -bs=128k -size=5G
    read: IOPS=1998, BW=125MiB/s (131MB/s)(1250MiB/10005msec)

So, is this patch set optimized for null-type devices? If I've missed any
key information, please let me know.

---
Li Zetao