On Wed, May 18, 2022 at 03:27:50PM +0800, Yu Kuai wrote: > commit 9f5ede3c01f9 ("block: throttle split bio in case of iops limit") > introduce a new problem, for example: > > [root@localhost ~]# echo "8:0 1024" > /sys/fs/cgroup/blkio/blkio.throttle.write_bps_device > [root@localhost ~]# echo $$ > /sys/fs/cgroup/blkio/cgroup.procs > [root@localhost ~]# dd if=/dev/zero of=/dev/sda bs=10k count=1 oflag=direct & > [1] 620 > [root@localhost ~]# dd if=/dev/zero of=/dev/sda bs=10k count=1 oflag=direct & > [2] 626 > [root@localhost ~]# 1+0 records in > 1+0 records out > 10240 bytes (10 kB, 10 KiB) copied, 10.0038 s, 1.0 kB/s1+0 records in > 1+0 records out > > 10240 bytes (10 kB, 10 KiB) copied, 9.23076 s, 1.1 kB/s > -> the second bio is issued after 10s instead of 20s. > > This is because if some bios are already queued, current bio is queued > directly and the flag 'BIO_THROTTLED' is set. And later, when former > bios are dispatched, this bio will be dispatched without waiting at all, > this is due to tg_with_in_bps_limit() return 0 for this bio. > > In order to fix the problem, don't skip flaged bio in > tg_with_in_bps_limit(), and for the problem that split bio can be > double accounted, compensate the over-accounting in __blk_throtl_bio(). > > Fixes: 9f5ede3c01f9 ("block: throttle split bio in case of iops limit") > Signed-off-by: Yu Kuai <yukuai3@xxxxxxxxxx> > --- > block/blk-throttle.c | 24 ++++++++++++++++++------ > 1 file changed, 18 insertions(+), 6 deletions(-) > > diff --git a/block/blk-throttle.c b/block/blk-throttle.c > index 447e1b8722f7..6f69859eae23 100644 > --- a/block/blk-throttle.c > +++ b/block/blk-throttle.c > @@ -811,7 +811,7 @@ static bool tg_with_in_bps_limit(struct throtl_grp *tg, struct bio *bio, > unsigned int bio_size = throtl_bio_data_size(bio); > > /* no need to throttle if this bio's bytes have been accounted */ > - if (bps_limit == U64_MAX || bio_flagged(bio, BIO_THROTTLED)) { > + if (bps_limit == U64_MAX) { > if (wait) > *wait = 0; > return true; > @@ -921,11 +921,8 @@ static void throtl_charge_bio(struct throtl_grp *tg, struct bio *bio) > unsigned int bio_size = throtl_bio_data_size(bio); > > /* Charge the bio to the group */ > - if (!bio_flagged(bio, BIO_THROTTLED)) { > - tg->bytes_disp[rw] += bio_size; > - tg->last_bytes_disp[rw] += bio_size; > - } > - > + tg->bytes_disp[rw] += bio_size; > + tg->last_bytes_disp[rw] += bio_size; > tg->io_disp[rw]++; > tg->last_io_disp[rw]++; > > @@ -2121,6 +2118,21 @@ bool __blk_throtl_bio(struct bio *bio) > tg->last_low_overflow_time[rw] = jiffies; > throtl_downgrade_check(tg); > throtl_upgrade_check(tg); > + > + /* > + * re-entered bio has accounted bytes already, so try to > + * compensate previous over-accounting. However, if new > + * slice is started, just forget it. > + */ > + if (bio_flagged(bio, BIO_THROTTLED)) { > + unsigned int bio_size = throtl_bio_data_size(bio); > + > + if (tg->bytes_disp[rw] >= bio_size) > + tg->bytes_disp[rw] -= bio_size; > + if (tg->last_bytes_disp[rw] > bio_size) > + tg->last_bytes_disp[rw] -= bio_size; The above check should be: if (tg->last_bytes_disp[rw] >= bio_size) Otherwise, this patch looks fine for me. Thanks, Ming