On Thu, Feb 13, 2025 at 03:12:43PM +0000, lizetao wrote:
> I tested this patch set. When I use null as the device, the test results are like your v1.
> When the bs is 4k, there is a slight improvement; when the bs is 64k, there is a significant improvement.
> However, when I used loop as the device, I found that there was no improvement, whether using 4k or 64k. As follow:
>
> ublk add -t loop -f ./ublk-loop.img
> ublk add -t loop -f ./ublk-loop-zerocopy.img
>
> fio -filename=/dev/ublkb0 -direct=1 -rw=read -iodepth=1 -ioengine=io_uring -bs=128k -size=5G
> read: IOPS=2015, BW=126MiB/s (132MB/s)(1260MiB/10005msec)
>
> fio -filename=/dev/ublkb1 -direct=1 -rw=read -iodepth=1 -ioengine=io_uring -bs=128k -size=5G
> read: IOPS=1998, BW=125MiB/s (131MB/s)(1250MiB/10005msec)
>
>
> So, this patch set is optimized for null type devices? Or if I've missed any key information, please let me know.

What do you get if you run your fio job directly on your ublk-loop.img
file? Throughput should improve until you've saturated the backend
device. Once you hit that point, the primary benefit of zero-copy comes
from reduced memory and CPU utilization.
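A minimal sketch of the saturation argument, assuming a simple bottleneck
model: observed throughput is the smaller of what the ublk frontend could
push (sustainable IOPS times block size) and what the backend can deliver.
The function name and every number below are hypothetical, chosen only to
illustrate why a null backend shows the gain while a loop backend that is
already at its ceiling does not; they are not measurements from this thread.

```python
def delivered_mib_s(frontend_iops: float, block_kib: float, backend_mib_s: float) -> float:
    # Bandwidth the ublk frontend could push if the backend were infinitely fast.
    frontend_mib_s = frontend_iops * block_kib / 1024
    # The slower side sets the observed throughput (simple bottleneck model).
    return min(frontend_mib_s, backend_mib_s)


# Hypothetical loop backend that tops out at 125 MiB/s: doubling the
# frontend's capacity (e.g. by removing a copy) changes nothing observable.
backend = 125.0
with_copy = delivered_mib_s(frontend_iops=4000, block_kib=128, backend_mib_s=backend)
zero_copy = delivered_mib_s(frontend_iops=8000, block_kib=128, backend_mib_s=backend)

# Hypothetical null backend: effectively no ceiling, so the frontend gain shows up.
null_copy = delivered_mib_s(4000, 128, float("inf"))
null_zc = delivered_mib_s(8000, 128, float("inf"))

print(with_copy, zero_copy)  # 125.0 125.0
print(null_copy, null_zc)    # 500.0 1000.0
```

In the capped case the throughput is unchanged, so any benefit has to be
measured elsewhere, for example as CPU time or memory bandwidth spent per
byte transferred.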