Re: very inaccurate %util of iostat

Weiping Zhang <zwp10758@xxxxxxxxx> · Fri, 23 Oct 2020 13:50:49 +0800

On Fri, Oct 23, 2020 at 10:48 AM Weiping Zhang <zwp10758@xxxxxxxxx> wrote:
>
> On Tue, Mar 24, 2020 at 11:54 AM Mike Snitzer <snitzer@xxxxxxxxxx> wrote:
> >
> > On Mon, Mar 23 2020 at 11:19pm -0400,
> > Ming Lei <ming.lei@xxxxxxxxxx> wrote:
> >
> > > Hi Guys,
> > >
> > > Commit 5b18b5a73760 ("block: delete part_round_stats and switch to less precise counting")
> > > changes calculation of 'io_ticks' a lot.
> > >
> > > In theory, io_ticks counts the time when there is any IO in-flight or in-queue,
> > > so it has to rely on in-flight counting of IO.
> > >
> > > However, commit 5b18b5a73760 changes io_ticks's accounting into the
> > > following way:
> > >
> > >       stamp = READ_ONCE(part->stamp);
> > >       if (unlikely(stamp != now)) {
> > >               if (likely(cmpxchg(&part->stamp, stamp, now) == stamp))
> > >                       __part_stat_add(part, io_ticks, 1);
> > >       }
> > >
> > > So this way doesn't use any in-flight IO's info, simply adding 1 if stamp
> > > changes compared with previous stamp, no matter if there is any in-flight
> > > IO or not.
> > >
> > > Now when there is very heavy IO on disks, %util is still much less than
> > > 100%, especially on HDD, the reason could be that IO latency can be much more
> > > than 1ms in case of 1000HZ, so the above calculation is very inaccurate.
> > >
> > > Another extreme example is that if IOs take long time to complete, such
> > > as IO stall, %util may show 0% utilization, instead of 100%.
> >
> > Hi Ming,
> >
> > Your email triggered a memory of someone else (Konstantin Khlebnikov)
> > having reported and fixed this relatively recently, please see this
> > patchset: https://lkml.org/lkml/2020/3/2/336
> >
> > Obviously this needs fixing.  If you have time to review/polish the
> > proposed patches that'd be great.
> >
> > Mike
> >
>
> Hi,
>
> commit 5b18b5a73760   makes io.util larger than the real, when IO
> inflight count <= 1,
> even with the commit 2b8bd423614, the problem is exist too.
>
> static void update_io_ticks(struct hd_struct *part, unsigned long now, bool end)
> {
>         unsigned long stamp;
> again:
>         stamp = READ_ONCE(part->stamp);
>         if (unlikely(stamp != now)) {
>                 if (likely(cmpxchg(&part->stamp, stamp, now) == stamp))
>                         __part_stat_add(part, io_ticks, end ? now - stamp : 1);
>         }
>
> when a new start, blk_account_io_start => update_io_ticks and add 1 jiffy to
> io_ticks, even there is no IO before, so it will always add an extra 1 jiffy.
> So we should know is there any inflight IO before.
>
>
>
> Before commit 5b18b5a73760,
> The io_ticks will not be added, if there is no inflight when start a new IO.
> static void part_round_stats_single(struct request_queue *q,
>                                     struct hd_struct *part, unsigned long now,
>                                     unsigned int inflight)
> {
>         if (inflight) {
>                 __part_stat_add(part, time_in_queue,
>                                 inflight * (now - part->stamp));
>                 __part_stat_add(part, io_ticks, (now - part->stamp));
>         }
>         part->stamp = now;
> }
>
>
> Reproduce:
> fio -name=test -ioengine=sync -bs=4K -rw=write
> -filename=/home/test.fio.log -size=100M -time_based=1 -direct=1
> -runtime=300 -rate=2m,2m

Let's explain in more detail.

I run the following command on a host, with different kernel version.

fio -name=test -ioengine=sync -bs=4K -rw=write
-filename=/home/test.fio.log -size=100M -time_based=1 -direct=1
-runtime=300 -rate=2m,2m

If we run fio in a sync direct io mode, IO will be proccessed one by one,
you can see that there are 512 IOs completed in one second.

kernel: 4.19.0

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s
avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
vda               0.00     0.00    0.00  512.00     0.00     2.00
8.00     0.21    0.40    0.00    0.40   0.40  20.60

The averate io.latency is 0.4ms
So the disk time cost in one second should be 0.4 * 512 = 204.8 ms,
that means, %util should be 20%.

kernel: commit f9893351ac
In the latest kernel commit f9893351ac, I got the follow number,
Becase update_io_ticks will add a extra 1 jiffy(1ms) for every IO, the
io.latency io.latency will be 1 + 0.4 = 1.4ms,
1.4 * 512 = 716.8ms, so the %util show it about 72%.

Device            r/s     w/s     rMB/s     wMB/s   rrqm/s   wrqm/s
%rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util
vda              0.00  512.00      0.00      2.00     0.00     0.00
0.00   0.00    0.00    0.40   0.20     0.00     4.00   1.41  72.10

There is a big gap for %util.