Hi Harshit Mogalapalli,

What is the queue_depth of the queue of your storage device? Under the same test conditions, what are the results of sequential reading?
Thanks!

-----Original Message-----
From: Harshit Mogalapalli <harshit.m.mogalapalli@xxxxxxxxxx>
Sent: March 7, 2024 2:46
To: 牛志国 (Zhiguo Niu) <Zhiguo.Niu@xxxxxxxxxx>; bvanassche@xxxxxxx; Jens Axboe <axboe@xxxxxxxxx>; linux-block@xxxxxxxxxxxxxxx
Cc: LKML <linux-kernel@xxxxxxxxxxxxxxx>; Ramanan Govindarajan <ramanan.govindarajan@xxxxxxxxxx>; Paul Webb <paul.x.webb@xxxxxxxxxx>; nicky.veitch@xxxxxxxxxx
Subject: [bug-report] Performance regression with fio sequential-write on a multipath setup.

CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you recognize the sender and know the content is safe.

Hi,

We have noticed a performance regression in the kernel with a fio sequential-write job.

Notes and observations:
======================
1. This is observed on recent kernels (6.6) when compared with 5.15.y; the bisection points to commit d47f9717e5cf ("block/mq-deadline: use correct way to throttling write requests").
2. Reverting the above commit restores the performance.
3. This regression can also be seen on 6.8-rc7, and a revert on top of that fixes it.
4. The commit looks closely related to the cause of the regression.
5. Note that this happens only with a multipath setup, even with just 2 block devices.

Test details:
============
(A) fio.write job

fio-3.19 -- fio version

[global]
ioengine=libaio
rw=write
bs=128k
iodepth=64
numjobs=24
direct=1
fsync=1
runtime=600
group_reporting
[job]
filename=/dev/dm-0
[job]
filename=/dev/dm-1

Each disk is 600G in size.

(B) Test results

6.8-rc7: 2 block devices with multi-path
-------
job: (g=0): rw=write, bs=(R) 128KiB-128KiB, (W) 128KiB-128KiB, (T) 128KiB-128KiB, ioengine=libaio, iodepth=64
...
job: (g=0): rw=write, bs=(R) 128KiB-128KiB, (W) 128KiB-128KiB, (T) 128KiB-128KiB, ioengine=libaio, iodepth=64
...
fio-3.19
Starting 48 processes
job: (groupid=0, jobs=48): err= 0: pid=6164: Wed Mar  6 17:58:33 2024
  write: IOPS=1884, BW=236MiB/s (247MB/s)(138GiB/600319msec); 0 zone resets
    slat (usec): min=2, max=540462, avg=25445.35, stdev=24181.85
    clat (msec): min=9, max=4941, avg=1602.56, stdev=339.05
     lat (msec): min=9, max=4973, avg=1628.00, stdev=342.19
    clat percentiles (msec):
     |  1.00th=[  986],  5.00th=[ 1167], 10.00th=[ 1250], 20.00th=[ 1368],
     | 30.00th=[ 1435], 40.00th=[ 1502], 50.00th=[ 1569], 60.00th=[ 1636],
     | 70.00th=[ 1703], 80.00th=[ 1804], 90.00th=[ 1955], 95.00th=[ 2140],
     | 99.00th=[ 2869], 99.50th=[ 3239], 99.90th=[ 3842], 99.95th=[ 4010],
     | 99.99th=[ 4329]
   bw (  KiB/s): min=47229, max=516492, per=100.00%, avg=241546.47, stdev=1326.92, samples=57259
   iops        : min=  322, max= 3996, avg=1843.17, stdev=10.39, samples=57259
  lat (msec)   : 10=0.01%, 20=0.01%, 50=0.01%, 100=0.01%, 250=0.02%
  lat (msec)   : 500=0.06%, 750=0.14%, 1000=0.93%, 2000=90.41%, >=2000=8.42%
  fsync/fdatasync/sync_file_range:
    sync (nsec): min=10, max=57940, avg=104.23, stdev=498.86
    sync percentiles (nsec):
     |  1.00th=[   13],  5.00th=[   19], 10.00th=[   26], 20.00th=[   61],
     | 30.00th=[   68], 40.00th=[   72], 50.00th=[   75], 60.00th=[   78],
     | 70.00th=[   87], 80.00th=[  167], 90.00th=[  175], 95.00th=[  177],
     | 99.00th=[  221], 99.50th=[  231], 99.90th=[  318], 99.95th=[15680],
     | 99.99th=[17792]
  cpu          : usr=0.08%, sys=0.16%, ctx=1096948, majf=0, minf=1995
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=199.5%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued rwts: total=0,1131018,0,1127994 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
  WRITE: bw=236MiB/s (247MB/s), 236MiB/s-236MiB/s (247MB/s-247MB/s), io=138GiB (148GB), run=600319-600319msec

Disk stats (read/write):
  dm-0: ios=50/533034, merge=0/27056, ticks=16/113070163, in_queue=113070180, util=100.00%, aggrios=43/266595, aggrmerge=0/0, aggrticks=156/56542549, aggrin_queue=56542706, aggrutil=100.00%
    sdac: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
    sde: ios=86/533191, merge=0/0, ticks=313/113085099, in_queue=113085413, util=100.00%
  dm-1: ios=5/534381, merge=0/36389, ticks=240/113110344, in_queue=113110584, util=100.00%, aggrios=7/267191, aggrmerge=0/0, aggrticks=153/56543654, aggrin_queue=56543807, aggrutil=100.00%
    sdf: ios=14/534382, merge=0/0, ticks=306/113087308, in_queue=113087615, util=100.00%
    sdad: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%

Throughput Results: WRITE:247:1884:0

6.8-rc7 + Revert: 2 block devices with multi-path
-------
job: (g=0): rw=write, bs=(R) 128KiB-128KiB, (W) 128KiB-128KiB, (T) 128KiB-128KiB, ioengine=libaio, iodepth=64
...
job: (g=0): rw=write, bs=(R) 128KiB-128KiB, (W) 128KiB-128KiB, (T) 128KiB-128KiB, ioengine=libaio, iodepth=64
...
fio-3.19
Starting 48 processes
job: (groupid=0, jobs=48): err= 0: pid=6104: Wed Mar  6 18:29:13 2024
  write: IOPS=2518, BW=315MiB/s (330MB/s)(185GiB/600339msec); 0 zone resets
    slat (usec): min=2, max=923472, avg=6789.22, stdev=20329.20
    clat (msec): min=4, max=6020, avg=1212.68, stdev=714.90
     lat (msec): min=4, max=6020, avg=1219.47, stdev=718.40
    clat percentiles (msec):
     |  1.00th=[  203],  5.00th=[  309], 10.00th=[  384], 20.00th=[  535],
     | 30.00th=[  709], 40.00th=[  911], 50.00th=[ 1133], 60.00th=[ 1334],
     | 70.00th=[ 1519], 80.00th=[ 1754], 90.00th=[ 2198], 95.00th=[ 2601],
     | 99.00th=[ 3171], 99.50th=[ 3608], 99.90th=[ 4329], 99.95th=[ 4597],
     | 99.99th=[ 5134]
   bw (  KiB/s): min=12237, max=1834896, per=100.00%, avg=413187.52, stdev=6322.04, samples=44948
   iops        : min=   48, max=14314, avg=3186.68, stdev=49.49, samples=44948
  lat (msec)   : 10=0.01%, 20=0.01%, 50=0.09%, 100=0.02%, 250=2.28%
  lat (msec)   : 500=15.45%, 750=14.26%, 1000=11.83%, 2000=42.52%, >=2000=13.55%
  fsync/fdatasync/sync_file_range:
    sync (nsec): min=10, max=76066, avg=57.85, stdev=299.52
    sync percentiles (nsec):
     |  1.00th=[   13],  5.00th=[   14], 10.00th=[   15], 20.00th=[   16],
     | 30.00th=[   17], 40.00th=[   20], 50.00th=[   28], 60.00th=[   47],
     | 70.00th=[   65], 80.00th=[   80], 90.00th=[  103], 95.00th=[  175],
     | 99.00th=[  237], 99.50th=[  241], 99.90th=[  262], 99.95th=[  318],
     | 99.99th=[16512]
  cpu          : usr=0.06%, sys=0.07%, ctx=531434, majf=0, minf=728
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=199.6%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued rwts: total=0,1511918,0,1508894 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
  WRITE: bw=315MiB/s (330MB/s), 315MiB/s-315MiB/s (330MB/s-330MB/s), io=185GiB (198GB), run=600339-600339msec

Disk stats (read/write):
  dm-0: ios=0/246318, merge=0/493981, ticks=0/142584585, in_queue=142584586, util=99.17%, aggrios=6/181454, aggrmerge=0/0, aggrticks=112/70608689, aggrin_queue=70608801, aggrutil=84.92%
    sdac: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
    sde: ios=12/362908, merge=0/0, ticks=224/141217379, in_queue=141217603, util=84.92%
  dm-1: ios=0/233211, merge=0/538097, ticks=0/142579042, in_queue=142579043, util=99.15%, aggrios=8/174475, aggrmerge=0/0, aggrticks=128/70654686, aggrin_queue=70654814, aggrutil=85.20%
    sdf: ios=16/348951, merge=0/0, ticks=256/141309372, in_queue=141309628, util=85.20%
    sdad: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%

Throughput Results: WRITE:330:2518:0

(C) Performance difference:

The revert improves write throughput by roughly 33.65% (IOPS 1884 -> 2518; bandwidth 247MB/s -> 330MB/s). This is reproducible with a higher number of block devices as well.

Thanks to Paul Webb for identifying this regression and sharing the details.

We will be happy to test any patches to check the change in performance, and also to follow any suggestions.

Thanks,
Harshit
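
[Editorial note: the ~33.65% figure quoted above can be recomputed directly from the two fio summaries. A minimal sketch, with the IOPS and bandwidth numbers copied from the reports; no assumptions beyond those figures:]

```python
# Percentage change between the 6.8-rc7 run and the 6.8-rc7 + revert run,
# using the numbers from the two fio summaries above.
base_iops, revert_iops = 1884, 2518  # write IOPS, before vs after revert
base_bw, revert_bw = 247, 330        # write bandwidth in MB/s

iops_change = (revert_iops - base_iops) / base_iops * 100
bw_change = (revert_bw - base_bw) / base_bw * 100

print(f"IOPS change: {iops_change:.2f}%")  # IOPS change: 33.65%
print(f"BW change:   {bw_change:.2f}%")    # BW change:   33.60%
```

The 33.65% quoted in the report matches the IOPS delta; the bandwidth delta rounds to 33.60% because the MB/s values are themselves rounded in the fio output.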