Doing high IOPS testing with blk-cgroups enabled spends ~15-20% of the
time just doing ktime_get_ns() -> readtsc. We essentially read and set
the start time twice, once for the bio and then again when that bio is
mapped to a request.

Given that the time between the two is very short, inherit the bio
start time instead of reading it again. This cuts 1/3rd of the overhead
of the time keeping.

Signed-off-by: Jens Axboe <axboe@xxxxxxxxx>
---
 block/blk-mq.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index a8c437afc2c3..a40c94505680 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -718,7 +718,14 @@ void blk_mq_start_request(struct request *rq)
 	trace_block_rq_issue(rq);
 
 	if (test_bit(QUEUE_FLAG_STATS, &q->queue_flags)) {
-		rq->io_start_time_ns = ktime_get_ns();
+		u64 start_time;
+#ifdef CONFIG_BLK_CGROUP
+		if (rq->bio)
+			start_time = bio_issue_time(&rq->bio->bi_issue);
+		else
+#endif
+			start_time = ktime_get_ns();
+		rq->io_start_time_ns = start_time;
 		rq->stats_sectors = blk_rq_sectors(rq);
 		rq->rq_flags |= RQF_STATS;
 		rq_qos_issue(q, rq);
-- 
2.33.0
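
For illustration only, the inherit-instead-of-reread pattern can be sketched in userspace C. This is not kernel code: every sketch_* name is hypothetical, timespec_get() stands in for ktime_get_ns(), and a counter is assumed as a rough proxy for the cost of each clock read. The sketch only shows that a request backed by a bio reuses the bio's timestamp, while a request without one falls back to reading the clock.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-ins for struct bio and struct request. */
struct sketch_bio {
	uint64_t issue_time_ns;		/* stamped once at bio issue */
};

struct sketch_request {
	struct sketch_bio *bio;		/* NULL if no bio backs the request */
	uint64_t io_start_time_ns;
};

/* Counts clock reads so the saving can be observed. */
static unsigned long sketch_clock_reads;

/* Stand-in for ktime_get_ns(); assumes C11 timespec_get(). */
static uint64_t sketch_now_ns(void)
{
	struct timespec ts;

	sketch_clock_reads++;
	timespec_get(&ts, TIME_UTC);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* First timestamp: taken when the bio is issued. */
static void sketch_bio_issue(struct sketch_bio *bio)
{
	bio->issue_time_ns = sketch_now_ns();
}

/* Second timestamp: inherited from the bio when one is attached. */
static void sketch_start_request(struct sketch_request *rq)
{
	if (rq->bio)
		rq->io_start_time_ns = rq->bio->issue_time_ns;
	else
		rq->io_start_time_ns = sketch_now_ns();
}
```

With a bio attached, sketch_start_request() leaves sketch_clock_reads unchanged; only the bio-less path pays for another clock read.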