Hi,

I have run all IO patterns on another host.

Notes:
- q1 means fio iodepth = 1, j1 means fio numjobs = 1
- vCPU = 4, vMEM = 2 GiB, fio using direct I/O

Most jobs perform better with "none" than with mq-deadline, and some perform worse.

pattern | iops(mq-deadline) | iops(none) | diff
:- | -: | -: | -:
4k-randread-q1-j1 | 12325 | 13356 | 8.37%
256k-randread-q1-j1 | 1865 | 1883 | 0.97%
4k-randread-q128-j1 | 204739 | 319066 | 55.84%
256k-randread-q128-j1 | 24257 | 22851 | -5.80%
4k-randwrite-q1-j1 | 9923 | 10163 | 2.42%
256k-randwrite-q1-j1 | 2762 | 2833 | 2.57%
4k-randwrite-q128-j1 | 137400 | 152081 | 10.68%
256k-randwrite-q128-j1 | 9353 | 9233 | -1.28%
4k-read-q1-j1 | 21499 | 22223 | 3.37%
256k-read-q1-j1 | 1919 | 1951 | 1.67%
4k-read-q128-j1 | 158806 | 345269 | 117.42%
256k-read-q128-j1 | 18918 | 23710 | 25.33%
4k-write-q1-j1 | 10120 | 10262 | 1.40%
256k-write-q1-j1 | 2779 | 2744 | -1.26%
4k-write-q128-j1 | 47576 | 209236 | 339.79%
256k-write-q128-j1 | 9199 | 9337 | 1.50%
4k-randread-q1-j2 | 24238 | 25478 | 5.12%
256k-randread-q1-j2 | 3656 | 3649 | -0.19%
4k-randread-q128-j2 | 390090 | 577300 | 47.99%
256k-randread-q128-j2 | 21992 | 23437 | 6.57%
4k-randwrite-q1-j2 | 17096 | 18112 | 5.94%
256k-randwrite-q1-j2 | 5188 | 4914 | -5.28%
4k-randwrite-q128-j2 | 143373 | 140560 | -1.96%
256k-randwrite-q128-j2 | 9423 | 9314 | -1.16%
4k-read-q1-j2 | 36890 | 31768 | -13.88%
256k-read-q1-j2 | 3708 | 4028 | 8.63%
4k-read-q128-j2 | 399500 | 409857 | 2.59%
256k-read-q128-j2 | 19360 | 21467 | 10.88%
4k-write-q1-j2 | 17786 | 18519 | 4.12%
256k-write-q1-j2 | 4756 | 5035 | 5.87%
4k-write-q128-j2 | 175756 | 159109 | -9.47%
256k-write-q128-j2 | 9292 | 9293 | 0.01%

> On Dec 8, 2023, at 11:54, Ming Lei <ming.lei@xxxxxxxxxx> wrote:
>
> On Thu, Dec 07, 2023 at 07:44:37PM -0700, Keith Busch wrote:
>> On Fri, Dec 08, 2023 at 10:00:36AM +0800, Ming Lei wrote:
>>> On Thu, Dec 07, 2023 at 12:31:05PM +0800, Li Feng wrote:
>>>> virtio-blk is generally used in cloud computing scenarios, where the
>>>> performance of virtual disks is very important. The mq-deadline scheduler
>>>> has a big performance drop compared to none with a single queue. In my tests,
>>>> mq-deadline 4k randread iops were 270k compared to 450k for none. So here
>>>> the default scheduler of virtio-blk is set to "none".
>>>
>>> The test result shows you may not have tested an HDD backing of virtio-blk.
>>>
>>> none loses IO merge capability more or less, so sequential IO perf probably
>>> drops in the case of an HDD backing.
>>
>> More of a curiosity, as I don't immediately even have an HDD to test
>> with! Isn't it more useful for the host providing the backing HDD to use an
>> appropriate IO scheduler? virtio-blk has similarities with a stacking
>> block driver, and we usually don't need to stack IO schedulers.
>
> dm-rq actually uses an IO scheduler at the higher layer, and early merge has some
> benefits:
>
> 1) virtio-blk inflight requests are reduced, so there is less chance of throttling
> inside the VM; meanwhile fewer (bigger) IOs are handled by QEMU and submitted
> to the host-side queue.
>
> 2) early merge in the VM is cheaper than on the host side, since there can be more
> block IOs originating from different virtio-blk/scsi devices at the same time, and
> all images can be stored on a single disk; these IOs then become interleaved in
> the host-side queue, so sequential IO may become random or hard to merge.
>
> As Jens mentioned, it needs actual testing.
>
>
> Thanks,
> Ming
>
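For reference, each pattern name above maps onto an fio job in the obvious way; a sketch of one case as a job file (the filename, ioengine, and runtime are my assumptions, not stated in the original report):

```ini
; Hypothetical fio job for the 4k-randread-q128-j1 pattern
; (filename, ioengine, and runtime are assumptions)
[4k-randread-q128-j1]
filename=/dev/vda
direct=1
ioengine=libaio
rw=randread
bs=4k
iodepth=128
numjobs=1
runtime=60
time_based=1
```

The other rows vary only `bs` (4k/256k), `rw` (randread/randwrite/read/write), `iodepth` (1/128), and `numjobs` (1/2).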
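The diff column is the relative IOPS change of "none" over mq-deadline; a small shell helper (the function name is mine) that reproduces it:

```shell
# Relative IOPS change of "none" vs mq-deadline: (none - mqd) / mqd * 100
pct_diff() {
    awk -v mqd="$1" -v none="$2" \
        'BEGIN { printf "%.2f%%\n", (none - mqd) / mqd * 100 }'
}

pct_diff 204739 319066   # 4k-randread-q128-j1 row: prints 55.84%
```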
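For anyone reproducing the comparison, the scheduler can be inspected and switched at runtime through sysfs (the device name "vda" is an assumption); sysfs marks the active scheduler with brackets, which the small helper below (my own) extracts:

```shell
# Inspect / switch the scheduler for a virtio-blk disk (needs root; "vda" assumed):
#   cat /sys/block/vda/queue/scheduler        # e.g. "[mq-deadline] none"
#   echo none > /sys/block/vda/queue/scheduler

# Extract the active (bracketed) entry from the sysfs scheduler string
active_sched() {
    echo "$1" | tr ' ' '\n' | sed -n 's/^\[\(.*\)\]$/\1/p'
}

active_sched "[mq-deadline] none"   # prints mq-deadline
```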