On Mon, Mar 23 2020 at 11:19pm -0400,
Ming Lei <ming.lei@xxxxxxxxxx> wrote:

> Hi Guys,
>
> Commit 5b18b5a73760 ("block: delete part_round_stats and switch to less precise counting")
> changes calculation of 'io_ticks' a lot.
>
> In theory, io_ticks counts the time when there is any IO in-flight or in-queue,
> so it has to rely on in-flight counting of IO.
>
> However, commit 5b18b5a73760 changes io_ticks's accounting into the
> following way:
>
> 	stamp = READ_ONCE(part->stamp);
> 	if (unlikely(stamp != now)) {
> 		if (likely(cmpxchg(&part->stamp, stamp, now) == stamp))
> 			__part_stat_add(part, io_ticks, 1);
> 	}
>
> So this way doesn't use any in-flight IO's info, simply adding 1 if stamp
> changes compared with previous stamp, no matter if there is any in-flight
> IO or not.
>
> Now when there is very heavy IO on disks, %util is still much less than
> 100%, especially on HDD, the reason could be that IO latency can be much more
> than 1ms in case of 1000HZ, so the above calculation is very inaccurate.
>
> Another extreme example is that if IOs take long time to complete, such
> as IO stall, %util may show 0% utilization, instead of 100%.

Hi Ming,

Your email triggered a memory of someone else (Konstantin Khlebnikov)
having reported and fixed this relatively recently; please see this
patchset: https://lkml.org/lkml/2020/3/2/336

Obviously this needs fixing.  If you have time to review/polish the
proposed patches, that'd be great.

Mike