On Fri, Dec 8, 2023 at 6:54 AM Li Feng <fengli@xxxxxxxxxx> wrote:
>
> Hi,
>
> I have run all I/O patterns on another host.
> Notes:
> q1 means fio iodepth = 1
> j1 means fio numjobs = 1
>
> VCPU = 4, VMEM = 2 GiB, fio using direct I/O.
>
> The results of most jobs are better than with the deadline scheduler,
> and some are lower.

I think this analysis is a bit simplistic. In particular:

For low queue depths the improvements are relatively small, but this is
also where the worst regressions appear:

4k-randread-q1-j1    |  12325 |  13356 |   8.37%
256k-randread-q1-j1  |   1865 |   1883 |   0.97%
4k-randwrite-q1-j1   |   9923 |  10163 |   2.42%
256k-randwrite-q1-j1 |   2762 |   2833 |   2.57%
4k-read-q1-j1        |  21499 |  22223 |   3.37%
256k-read-q1-j1      |   1919 |   1951 |   1.67%
4k-write-q1-j1       |  10120 |  10262 |   1.40%
256k-write-q1-j1     |   2779 |   2744 |  -1.26%
4k-randread-q1-j2    |  24238 |  25478 |   5.12%
256k-randread-q1-j2  |   3656 |   3649 |  -0.19%
4k-randwrite-q1-j2   |  17096 |  18112 |   5.94%
256k-randwrite-q1-j2 |   5188 |   4914 |  -5.28%
4k-read-q1-j2        |  36890 |  31768 | -13.88%
256k-read-q1-j2      |   3708 |   4028 |   8.63%
4k-write-q1-j2       |  17786 |  18519 |   4.12%
256k-write-q1-j2     |   4756 |   5035 |   5.87%

(I ran a paired t-test and it confirms that the improvements overall
are not statistically significant; a sketch of the test is at the end
of this mail.)

Small-block, high-queue-depth I/O is where the improvements are clearly
significant, but even there the scheduler still seems to help in the j2
case:

4k-randread-q128-j1  | 204739 | 319066 |  55.84%
4k-randwrite-q128-j1 | 137400 | 152081 |  10.68%
4k-read-q128-j1      | 158806 | 345269 | 117.42%
4k-write-q128-j1     |  47576 | 209236 | 339.79%
4k-randread-q128-j2  | 390090 | 577300 |  47.99%
4k-randwrite-q128-j2 | 143373 | 140560 |  -1.96%
4k-read-q128-j2      | 399500 | 409857 |   2.59%
4k-write-q128-j2     | 175756 | 159109 |  -9.47%

At larger block sizes, even the high-queue-depth results are highly
variable. There are clear improvements for sequential reads, but not so
much for everything else:

256k-randread-q128-j1  |  24257 |  22851 |  -5.80%
256k-randwrite-q128-j1 |   9353 |   9233 |  -1.28%
256k-read-q128-j1      |  18918 |  23710 |  25.33%
256k-write-q128-j1     |   9199 |   9337 |   1.50%
256k-randread-q128-j2  |  21992 |  23437 |   6.57%
256k-randwrite-q128-j2 |   9423 |   9314 |  -1.16%
256k-read-q128-j2      |  19360 |  21467 |  10.88%
256k-write-q128-j2     |   9292 |   9293 |   0.01%

I would focus on small I/O at varying queue depths, to understand at
which point performance starts to improve; a queue depth of 128 may not
be representative of common usage, especially high-queue-depth
*sequential* access, which is where the biggest effects are visible. A
sketch of such a sweep is also included below.

Maybe you can look at improvements in the scheduler instead?

Paolo
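For reference, here is roughly how the paired t-test above can be
reproduced. This is a minimal sketch assuming Python with scipy, and it
assumes the two result columns of the q1 table are the deadline
scheduler first and the tested configuration second:

# Paired t-test over the sixteen q1 rows.  The "deadline"/"tested"
# column labels are an assumption based on the text of the mail; the
# numbers themselves are taken verbatim from the first table.
from scipy import stats

deadline = [12325, 1865, 9923, 2762, 21499, 1919, 10120, 2779,
            24238, 3656, 17096, 5188, 36890, 3708, 17786, 4756]
tested   = [13356, 1883, 10163, 2833, 22223, 1951, 10262, 2744,
            25478, 3649, 18112, 4914, 31768, 4028, 18519, 5035]

t, p = stats.ttest_rel(tested, deadline)
print(f"t = {t:.3f}, p = {p:.3f}")  # p comes out well above 0.05

The single large regression (4k-read-q1-j2) dominates the variance, so
the small improvements elsewhere do not reach significance.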
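And a sketch of the queue-depth sweep suggested above: 4k random reads
at increasing iodepth, one data point per depth. The device path,
ioengine, and runtime below are placeholders, not values from this
thread; adjust them to the actual guest setup:

# Sweep fio iodepth for 4k random reads and print the IOPS per depth.
# /dev/vdb, libaio, and the 30-second runtime are assumptions.
import json
import subprocess

DEVICE = "/dev/vdb"  # placeholder test device

for qd in (1, 2, 4, 8, 16, 32, 64, 128):
    result = subprocess.run(
        ["fio", "--name=qd-sweep", "--ioengine=libaio", "--direct=1",
         "--rw=randread", "--bs=4k", f"--iodepth={qd}", "--numjobs=1",
         "--time_based", "--runtime=30", f"--filename={DEVICE}",
         "--output-format=json"],
        check=True, capture_output=True, text=True)
    data = json.loads(result.stdout)
    iops = data["jobs"][0]["read"]["iops"]
    print(f"iodepth={qd:3d}  randread IOPS={iops:.0f}")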