On Mon, Oct 26, 2020 at 7:17 PM Konstantin Khlebnikov
<khlebnikov@xxxxxxxxxxxxxx> wrote:
>
> 23.10.2020, 08:51, "Weiping Zhang" <zwp10758@xxxxxxxxx>:
> > On Fri, Oct 23, 2020 at 10:48 AM Weiping Zhang <zwp10758@xxxxxxxxx> wrote:
> > > On Tue, Mar 24, 2020 at 11:54 AM Mike Snitzer <snitzer@xxxxxxxxxx> wrote:
> > > > On Mon, Mar 23 2020 at 11:19pm -0400,
> > > > Ming Lei <ming.lei@xxxxxxxxxx> wrote:
> > > > >
> > > > > Hi Guys,
> > > > >
> > > > > Commit 5b18b5a73760 ("block: delete part_round_stats and switch to
> > > > > less precise counting") changes the calculation of 'io_ticks' a lot.
> > > > >
> > > > > In theory, io_ticks counts the time when there is any IO in-flight
> > > > > or in-queue, so it has to rely on in-flight counting of IO.
> > > > >
> > > > > However, commit 5b18b5a73760 changes io_ticks's accounting into the
> > > > > following way:
> > > > >
> > > > >         stamp = READ_ONCE(part->stamp);
> > > > >         if (unlikely(stamp != now)) {
> > > > >                 if (likely(cmpxchg(&part->stamp, stamp, now) == stamp))
> > > > >                         __part_stat_add(part, io_ticks, 1);
> > > > >         }
> > > > >
> > > > > So this way doesn't use any in-flight IO's info; it simply adds 1 if
> > > > > the stamp changes compared with the previous stamp, no matter whether
> > > > > there is any in-flight IO or not.
> > > > >
> > > > > Now when there is very heavy IO on disks, %util is still much less
> > > > > than 100%, especially on HDD. The reason could be that IO latency can
> > > > > be much more than 1ms in case of 1000HZ, so the above calculation is
> > > > > very inaccurate.
> > > > >
> > > > > Another extreme example is that if IOs take a long time to complete,
> > > > > such as an IO stall, %util may show 0% utilization instead of 100%.
> > > >
> > > > Hi Ming,
> > > >
> > > > Your email triggered a memory of someone else (Konstantin Khlebnikov)
> > > > having reported and fixed this relatively recently, please see this
> > > > patchset: https://lkml.org/lkml/2020/3/2/336
> > > >
> > > > Obviously this needs fixing.  If you have time to review/polish the
> > > > proposed patches that'd be great.
> > > >
> > > > Mike
> > >
> > > Hi,
> > >
> > > Commit 5b18b5a73760 makes io.util larger than the real value when the
> > > IO inflight count is <= 1; even with commit 2b8bd423614, the problem
> > > still exists.
> > >
> > > static void update_io_ticks(struct hd_struct *part, unsigned long now, bool end)
> > > {
> > >         unsigned long stamp;
> > > again:
> > >         stamp = READ_ONCE(part->stamp);
> > >         if (unlikely(stamp != now)) {
> > >                 if (likely(cmpxchg(&part->stamp, stamp, now) == stamp))
> > >                         __part_stat_add(part, io_ticks, end ? now - stamp : 1);
> > >         }
> > >
> > > When a new IO starts, blk_account_io_start => update_io_ticks adds
> > > 1 jiffy to io_ticks even if there was no IO before, so it will always
> > > add an extra 1 jiffy. So we should know whether there was any inflight
> > > IO before.
> > >
> > > Before commit 5b18b5a73760, io_ticks was not increased if there was no
> > > IO inflight when a new IO started:
> > >
> > > static void part_round_stats_single(struct request_queue *q,
> > >                                     struct hd_struct *part, unsigned long now,
> > >                                     unsigned int inflight)
> > > {
> > >         if (inflight) {
> > >                 __part_stat_add(part, time_in_queue,
> > >                                 inflight * (now - part->stamp));
> > >                 __part_stat_add(part, io_ticks, (now - part->stamp));
> > >         }
> > >         part->stamp = now;
> > > }
> > >
> > > Reproduce:
> > > fio -name=test -ioengine=sync -bs=4K -rw=write \
> > >     -filename=/home/test.fio.log -size=100M -time_based=1 -direct=1 \
> > >     -runtime=300 -rate=2m,2m
> >
> > Let's explain in more detail.
> >
> > I ran the following command on a host, with different kernel versions:
> >
> > fio -name=test -ioengine=sync -bs=4K -rw=write \
> >     -filename=/home/test.fio.log -size=100M -time_based=1 -direct=1 \
> >     -runtime=300 -rate=2m,2m
> >
> > If we run fio in sync direct IO mode, IO will be processed one by one,
> > and you can see that 512 IOs are completed in one second.
> >
> > kernel: 4.19.0
> >
> > Device: rrqm/s wrqm/s   r/s    w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
> > vda       0.00   0.00  0.00 512.00  0.00  2.00     8.00     0.21  0.40    0.00    0.40  0.40 20.60
> >
> > The average io.latency is 0.4ms, so the disk time spent in one second
> > should be 0.4 * 512 = 204.8 ms; that means %util should be about 20%.
> >
> > kernel: commit f9893351ac
> >
> > On the latest kernel (commit f9893351ac) I got the following numbers.
> > Because update_io_ticks adds an extra 1 jiffy (1ms) for every IO, the
> > io.latency becomes 1 + 0.4 = 1.4ms, and 1.4 * 512 = 716.8ms, so %util
> > shows about 72%.
> >
> > Device   r/s    w/s rMB/s wMB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
> > vda     0.00 512.00  0.00  2.00   0.00   0.00  0.00  0.00    0.00    0.40   0.20     0.00     4.00  1.41 72.10
> >
> > There is a big gap in %util.
>
> Yep. The meaning of this statistic has changed. Now it approximates the
> count of jiffies in which at least one request was running, i.e. every
> tiny request always accounts a whole jiffy in which it started/finished.
>

When Q2Q >= 1 jiffy, an extra 1 jiffy is added when an IO starts, which
is really not reasonable.

modprobe null_blk submit_queues=8 queue_mode=2 irqmode=2 completion_nsec=100000

fio -name=test -ioengine=sync -bs=4K -rw=write -filename=/dev/nullb0 \
    -size=100M -time_based=1 -direct=1 -runtime=300 -rate=4m &

                  w/s    w_await    %util
-----------------------------------------------
before patch     1024       0.15      100
after patch      1024       0.15     14.5

The gap is really unacceptable.

The test patch: https://github.com/dublio/linux/commits/fix_util_v5

> %util is still useful - it shows how disk activity is grouped across time.
>
> You could get precise "utilization" in the form of average queue depth
> from the read/write time.

It's hard to get the percentage of overlap between inflight IOs when the
device has a high queue depth.
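For what it's worth, here is a minimal user-space sketch (not part of any
kernel patch) that derives %util and average queue depth for one disk from
two samples of /proc/diskstats, the same way iostat does: %util from the
io_ticks counter (field 13) and aqu-sz from the weighted time-in-queue
counter (field 14). The device name "vda" and the 1-second interval are
just assumptions for illustration.

/*
 * util_sketch.c - illustration only, not part of any kernel patch.
 * Samples /proc/diskstats twice and prints %util and average queue
 * depth the way iostat derives them.
 */
#include <stdio.h>
#include <string.h>
#include <unistd.h>

struct disk_sample {
	unsigned long long io_ticks;      /* ms with at least one IO in flight */
	unsigned long long time_in_queue; /* weighted ms spent doing IO        */
};

static int read_sample(const char *dev, struct disk_sample *s)
{
	char line[512], name[32];
	unsigned long long v[11];
	FILE *f = fopen("/proc/diskstats", "r");

	if (!f)
		return -1;
	while (fgets(line, sizeof(line), f)) {
		/* major minor name + 11 counters (newer kernels append more) */
		int n = sscanf(line,
			       " %*u %*u %31s %llu %llu %llu %llu %llu %llu %llu %llu %llu %llu %llu",
			       name, &v[0], &v[1], &v[2], &v[3], &v[4], &v[5],
			       &v[6], &v[7], &v[8], &v[9], &v[10]);
		if (n == 12 && strcmp(name, dev) == 0) {
			s->io_ticks = v[9];
			s->time_in_queue = v[10];
			fclose(f);
			return 0;
		}
	}
	fclose(f);
	return -1;
}

int main(void)
{
	const char *dev = "vda";     /* assumed device name */
	const unsigned int secs = 1; /* sampling interval   */
	struct disk_sample a, b;

	if (read_sample(dev, &a))
		return 1;
	sleep(secs);
	if (read_sample(dev, &b))
		return 1;

	/* %util: fraction of the interval during which io_ticks advanced */
	printf("%%util  = %.1f%%\n",
	       (b.io_ticks - a.io_ticks) * 100.0 / (secs * 1000.0));
	/* aqu-sz: average number of requests in flight over the interval */
	printf("aqu-sz = %.2f\n",
	       (b.time_in_queue - a.time_in_queue) / (secs * 1000.0));
	return 0;
}

With the fio example above (0.4 ms average latency at 512 writes/s), an
accurate io_ticks would make this print roughly 20%, while the extra jiffy
charged per IO inflates it towards the 72% shown in the iostat output.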