Hi, Ming!
在 2024/01/05 10:49, Ming Lei 写道:
On Wed, Jan 03, 2024 at 03:15:15PM +0800, Yu Kuai wrote:
From: Yu Kuai <yukuai3@xxxxxxxxxx>
Currently, io_ticks is accounted based on sampling, specifically
update_io_ticks() will always account io_ticks by 1 jiffies from
bdev_start_io_acct()/blk_account_io_start(), and the result can be
inaccurate, for example(HZ is 250):
Test script:
fio -filename=/dev/sda -bs=4k -rw=write -direct=1 -name=test -thinktime=4ms
Test result: util is about 90%, while the disk is really idle.
In order to account io_ticks precisely, update_io_ticks() must know if
there are IO inflight already, and this requires overhead slightly,
hence precise io accounting is disabled by default, and user can enable
it through sysfs entry.
Noted that for rq-based devcie, part_stat_local_inc/dec() and
part_in_flight() is used to track inflight instead of iterating tags,
which is not supposed to be used in fast path because 'tags->lock' is
grabbed in blk_mq_find_and_get_req().
Signed-off-by: Yu Kuai <yukuai3@xxxxxxxxxx>
---
Changes in v2:
- remove the new parameter for update_io_ticks();
- simplify update_io_ticks();
- use swith in queue_iostats_store();
- add missing part_stat_local_dec() in blk_account_io_merge_request()
Looks fine,
Reviewed-by: Ming Lei <ming.lei@xxxxxxxxxx>
Thanks for the review, however, I made a mistake while "simplify
update_io_ticks()" that first IO will still account by 1 jiffies even if
precise iostat is enabled:
+ if (unlikely(time_after(now, stamp)) &&
+ likely(try_cmpxchg(&part->bd_stamp, &stamp, now))) {
+ if (end || (blk_queue_precise_io_stat(part->bd_queue) &&
+ part_in_flight(part)))
+ __part_stat_add(part, io_ticks, now - stamp);
+ else
-> here, should be else if (!blk_queue_precise_io_stat(part->bd_queue))
+ __part_stat_add(part, io_ticks, 1);
Alough this is RFC, my apologize for sending this version without fully
test the functionally. I'll send a formal version soon.
Thanks,
Kuai
thanks,
Ming
.