Hi, >> Test scrips: >> cgpath=/sys/fs/cgroup/blkio/test0 >> mkdir -p $cgpath >> echo "8:0 10485760" > $cgpath/blkio.throttle.write_bps_device >> echo "8:16 100000" > $cgpath/blkio.throttle.write_iops_device > What? 8:0 and 8:16? My bad, I made a typos here. It should be all 8:0 or 8:16. What I want to do here was to set an easy to reach value to BPS_LIMIT (10M/s in this example) and an unable to reach value to IOPS_LIMIT (100000 in this example). Under this setting, the iostat shows that the bps is far less than 10M/s and sometimes is far larger than 10M/s. Once I cancel the IOPS_LIMIT, i.e., echo 0 to the write_iops_device, the bps stabilizes at around 10M/s. The root cause of this is that the split bio will be throttled again even though a tg is under the IOPS_LIMIT when the tg's sq->queue is not empty. Let me explain it with the code. If we only set BPS_LIMIT and the bio is flagged with BIO_BPS_THROTTLED, the blk_should_throtl() will always return false and the __blk_throtl_bio() will not be called. So, the bio flagged with BIO_BPS_THROTTLED will not be throttled in any cases. However, if we set both BPS_LIMIT and IOPS_LIMIT, the blk_should_throtl() will always return true no matter what the value the IOPS_LIMIT is because tg->has_rules_iops[rw] is always true. After that, the bio will be passed to __blk_throtl_bio(). If the tg's sq->queue is empty, blkthrottle will calculate if the tg is above limit with the bio. Since the bio is flagged with BIO_BPS_THROTTLED, blkthrottle will only calculate the IOPS limit for the tg. If tg is under iops limit with the bio, the bio will not be throttled. Otherwise, the bio will be throttled because of the iops limit. However, If the tg's sq->queue is not empty, the bio will be throttled directly without any iops or bps limitation calculations. The related code snippet is : bool __blk_throtl_bio(struct bio *bio) { ....... while (true) { ...... /* throtl is FIFO - if bios are already queued, should queue */ if (sq->nr_queued[rw]) break; // -----> the bio will be throttled in any cases /* if above limits, break to queue */ if (!tg_may_dispatch(tg, bio, NULL)) { // ----> do iops and bps calculations break; // ---> the bio will be throttled if the tg is above bps or iops limits ...... if (!tg) { bio_set_flag(bio, BIO_BPS_THROTTLED); goto out_unlock; // ----> pass, no throttle } ....... } } So even the bio is flagged with BIO_BPS_THROTTLED and the tg is far more behind IOPS_LIMIT, the bio will be throttled if the sq->queue is not empty. This problem can be reproduced by running following scripts and comparing the outputs of iostat. 1) PNUM=50 # large enough to saturate the sq->queue cgpath=/sys/fs/cgroup/blkio/test0 mkdir -p $cgpath echo "8:0 10485760" > $cgpath/blkio.throttle.write_bps_device echo "8:0 100000" > $cgpath/blkio.throttle.write_iops_device # large enough to make it unreachable for ((i=0;i<;$PUM; i++));do fio -rw=write -direct=1 -bs=4M -iodepth=8 -size=200M -numjobs=1 \ -time_based=1 -runtime=30 -name=testt_$i -filename=testf_$i > /dev/null & echo $! > $cgpath/tasks done 2) PNUM=50 # large enough to saturate the sq->queue cgpath=/sys/fs/cgroup/blkio/test0 mkdir -p $cgpath echo "8:0 10485760" > $cgpath/blkio.throttle.write_bps_device echo "8:0 0" > $cgpath/blkio.throttle.write_iops_device # do not set iops limit for ((i=0;i<;$PUM; i++));do fio -rw=write -direct=1 -bs=4M -iodepth=8 -size=200M -numjobs=1 \ -time_based=1 -runtime=30 -name=testt_$i -filename=testf_$i > /dev/null & echo $! > $cgpath/tasks done >> Signed-off-by: zhoutaiyu <zhoutaiyu@xxxxxxxxxxxx> >> --- >> block/blk-throttle.c | 5 +++-- >> 1 file changed, 3 insertions(+), 2 deletions(-) >> >> diff --git a/block/blk-throttle.c b/block/blk-throttle.c >> index f4850a6..499c006 100644 >> --- a/block/blk-throttle.c >> +++ b/block/blk-throttle.c >> @@ -913,7 +913,8 @@ static bool tg_may_dispatch(struct throtl_grp *tg, struct bio *bio, >> * queued. >> */ >> BUG_ON(tg->service_queue.nr_queued[rw] && >> - bio != throtl_peek_queued(&tg->service_queue.queued[rw])); >> + bio != throtl_peek_queued(&tg->service_queue.queued[rw]) && >> + !bio_flagged(bio, BIO_BPS_THROTTLED)); >> >> /* If tg->bps = -1, then BW is unlimited */ >> if ((bps_limit == U64_MAX && iops_limit == UINT_MAX) || >> @@ -2201,7 +2202,7 @@ bool __blk_throtl_bio(struct bio *bio) >> throtl_downgrade_check(tg); >> throtl_upgrade_check(tg); >> /* throtl is FIFO - if bios are already queued, should queue */ >> - if (sq->nr_queued[rw]) >> + if (sq->nr_queued[rw] && !bio_flagged(bio, BIO_BPS_THROTTLED)) > No, this change is wrong. Split IO will not be throttled by iops limit anymore. After this change, the split IO will be throttled by iops limit again if it reaches a tg's iops limit and will not be throttled in any cases if the sq->queue is not empty.