Re: very inaccurate %util of iostat

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Mar 24, 2020 at 11:54 AM Mike Snitzer <snitzer@xxxxxxxxxx> wrote:
>
> On Mon, Mar 23 2020 at 11:19pm -0400,
> Ming Lei <ming.lei@xxxxxxxxxx> wrote:
>
> > Hi Guys,
> >
> > Commit 5b18b5a73760 ("block: delete part_round_stats and switch to less precise counting")
> > changes calculation of 'io_ticks' a lot.
> >
> > In theory, io_ticks counts the time when there is any IO in-flight or in-queue,
> > so it has to rely on in-flight counting of IO.
> >
> > However, commit 5b18b5a73760 changes io_ticks's accounting into the
> > following way:
> >
> >       stamp = READ_ONCE(part->stamp);
> >       if (unlikely(stamp != now)) {
> >               if (likely(cmpxchg(&part->stamp, stamp, now) == stamp))
> >                       __part_stat_add(part, io_ticks, 1);
> >       }
> >
> > So this way doesn't use any in-flight IO's info, simply adding 1 if stamp
> > changes compared with previous stamp, no matter if there is any in-flight
> > IO or not.
> >
> > Now when there is very heavy IO on disks, %util is still much less than
> > 100%, especially on HDD, the reason could be that IO latency can be much more
> > than 1ms in case of 1000HZ, so the above calculation is very inaccurate.
> >
> > Another extreme example is that if IOs take long time to complete, such
> > as IO stall, %util may show 0% utilization, instead of 100%.
>
> Hi Ming,
>
> Your email triggered a memory of someone else (Konstantin Khlebnikov)
> having reported and fixed this relatively recently, please see this
> patchset: https://lkml.org/lkml/2020/3/2/336
>
> Obviously this needs fixing.  If you have time to review/polish the
> proposed patches that'd be great.
>
> Mike
>

Hi,

commit 5b18b5a73760   makes io.util larger than the real, when IO
inflight count <= 1,
even with the commit 2b8bd423614, the problem is exist too.

static void update_io_ticks(struct hd_struct *part, unsigned long now, bool end)
{
        unsigned long stamp;
again:
        stamp = READ_ONCE(part->stamp);
        if (unlikely(stamp != now)) {
                if (likely(cmpxchg(&part->stamp, stamp, now) == stamp))
                        __part_stat_add(part, io_ticks, end ? now - stamp : 1);
        }

when a new start, blk_account_io_start => update_io_ticks and add 1 jiffy to
io_ticks, even there is no IO before, so it will always add an extra 1 jiffy.
So we should know is there any inflight IO before.



Before commit 5b18b5a73760,
The io_ticks will not be added, if there is no inflight when start a new IO.
static void part_round_stats_single(struct request_queue *q,
                                    struct hd_struct *part, unsigned long now,
                                    unsigned int inflight)
{
        if (inflight) {
                __part_stat_add(part, time_in_queue,
                                inflight * (now - part->stamp));
                __part_stat_add(part, io_ticks, (now - part->stamp));
        }
        part->stamp = now;
}


Reproduce:
fio -name=test -ioengine=sync -bs=4K -rw=write
-filename=/home/test.fio.log -size=100M -time_based=1 -direct=1
-runtime=300 -rate=2m,2m



[Index of Archives]     [Linux RAID]     [Linux SCSI]     [Linux ATA RAID]     [IDE]     [Linux Wireless]     [Linux Kernel]     [ATH6KL]     [Linux Bluetooth]     [Linux Netdev]     [Kernel Newbies]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Device Mapper]

  Powered by Linux