Ping

On Wed, Nov 18, 2020 at 1:55 PM Weiping Zhang <zwp10758@xxxxxxxxx> wrote:
>
> On Tue, Nov 17, 2020 at 3:40 PM Ming Lei <ming.lei@xxxxxxxxxx> wrote:
> >
> > On Tue, Nov 17, 2020 at 12:59:46PM +0800, Weiping Zhang wrote:
> > > On Tue, Nov 17, 2020 at 11:28 AM Ming Lei <ming.lei@xxxxxxxxxx> wrote:
> > > >
> > > > On Tue, Nov 17, 2020 at 11:01:49AM +0800, Weiping Zhang wrote:
> > > > > Hi Jens,
> > > > >
> > > > > Ping
> > > >
> > > > Hello Weiping,
> > > >
> > > > I am not sure we have to fix this issue; adding blk_mq_queue_inflight()
> > > > back to the IO path brings a cost which turns out to be visible, and I
> > > > did get a soft lockup report on Azure NVMe because of this kind of cost.
> > > >
> > > Have you tested v5? This patch is different from v1: v1 gets the
> > > inflight count for each IO, while v5 has changed to get it only once
> > > per jiffy.
> >
> > I meant the issue can be reproduced on kernels before 5b18b5a73760
> > ("block: delete part_round_stats and switch to less precise counting").
> >
> > Also, do we really need to fix this issue? I understand device
> > utilization becomes inaccurate at very small load, but is it really
> > worth adding runtime load in the fast path to fix it?
> >
> Hello Ming,
>
> The problem is that it is hard for the user to know how busy the disk
> really is: a small load shows high utilization, and a heavy load also
> shows high utilization, which makes %util meaningless.
>
> The following test case shows a big gap for the same workload:
>
> modprobe null_blk submit_queues=8 queue_mode=2 irqmode=2 completion_nsec=100000
> fio -name=test -ioengine=sync -bs=4K -rw=write -filename=/dev/nullb0
> -size=100M -time_based=1 -direct=1 -runtime=300 -rate=4m &
>
>                  w/s    w_await    %util
> -----------------------------------------------
> before patch    1024       0.15      100
> after patch     1024       0.15       14.5
>
> I know that for a very fast disk, adding such accounting in the fast
> path is harmful; maybe we can add an interface to enable/disable
> io_ticks accounting, like what /sys/block/<disk>/queue/iostat does.
>
> e.g. /sys/block/<disk>/queue/iostat_io_ticks:
> when 0 is written to it, io_ticks accounting is disabled entirely.
>
> Or is there any other good idea?
>
> >
> > > As for v5, can we reproduce it on null_blk?
> >
> > No, I just saw the report on Azure NVMe.
> >
> > > >
> > > > BTW, supposing the IO accounting issue needs to be fixed, I am just
> > > > wondering why not simply revert 5b18b5a73760 ("block: delete
> > > > part_round_stats and switch to less precise counting"); the original
> > > > way had worked for decades.
> > > >
> > > This patch is better than before: it breaks out early when it finds an
> > > inflight IO on any CPU; only in the worst case (the IO is running on
> > > the last CPU) does it iterate over all CPUs.
> >
> Yes, that is the worst case.
> Actually v5 has two improvements compared to before 5b18b5a73760:
> 1. for IO end, v5 does not get the inflight count at all
> 2. for IO start, v5 just finds the first inflight IO on any CPU; in
> the worst case it does the same as before.
>
> > Please see the following case:
> >
> > 1) one device has 256 hw queues, the system has 256 CPU cores, and
> > each hw queue's depth is 1k.
> >
> > 2) there isn't any IO load on CPUs 0 ~ 254
> >
> > 3) heavy IO load is run on CPU 255
> >
> > So with your trick the code still needs to iterate over hw queues 0 to
> > 254, and that load isn't something which can be ignored, especially
> > since it is just for IO accounting.
> >
> >
> > Thanks,
> > Ming
> >
> Thanks
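
The early-exit idea described above (scan the per-CPU inflight counters and
stop at the first busy one) can be pictured roughly as below. This is only a
sketch of the idea, not the actual v5 patch; all names here
(sketch_stats, sketch_any_inflight, SKETCH_NR_CPUS) are made up for
illustration.

/*
 * Illustrative sketch only, not the v5 patch: return true as soon as any
 * per-CPU counter shows an inflight IO, so a busy disk usually exits after
 * one or a few iterations; only a fully idle disk scans every CPU.
 */
#include <stdbool.h>

#define SKETCH_NR_CPUS 256

struct sketch_stats {
	long in_flight[SKETCH_NR_CPUS][2];	/* per-CPU [read, write] counters */
};

static bool sketch_any_inflight(const struct sketch_stats *st)
{
	for (int cpu = 0; cpu < SKETCH_NR_CPUS; cpu++) {
		/* Early exit: one busy CPU is enough to charge io_ticks. */
		if (st->in_flight[cpu][0] + st->in_flight[cpu][1] > 0)
			return true;
	}
	return false;	/* worst case: all CPUs scanned, none busy */
}

Ming's 256-CPU example, with the load concentrated on CPU 255, is exactly
the case where a loop like this degenerates into a full scan.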