On Thu, May 19, 2022 at 08:14:28PM +0800, "yukuai (C)" <yukuai3@xxxxxxxxxx> wrote:
> tg_with_in_bps_limit:
>  jiffy_elapsed_rnd = jiffies - tg->slice_start[rw];
>  tmp = bps_limit * jiffy_elapsed_rnd;
>  do_div(tmp, HZ);
>  bytes_allowed = tmp; -> how many bytes are allowed in this slice,
>                          including dispatched.
>  if (tg->bytes_disp[rw] + bio_size <= bytes_allowed)
>   *wait = 0 -> no need to wait if this bio is within limit
>
>  extra_bytes = tg->bytes_disp[rw] + bio_size - bytes_allowed;
>  -> extra_bytes is based on 'bytes_disp'
>
> For example:
>
> 1) bps_limit is 2k, we issue two io, (1k and 9k)
> 2) the first io(1k) will be dispatched, bytes_disp = 1k, slice_start = 0
>    the second io(9k) is waiting for (9 - (2 - 1)) / 2 = 4 s

The 2nd io arrived at 1s, the wait time is 4s, i.e. it can be dispatched
at 5s (i.e. 10k / 2kB/s = 5s).

> 3) after 3 s, we update bps_limit to 1k, then new waiting is calculated:
>
> without this patch: bytes_disp = 0, slice_start = 3:
> bytes_allowed = 1k                      <--- why 1k and not 0?
> extra_bytes = 9k - 1k = 8k
> wait = 8s

This looks like it was calculated at time 4s (1s after the new config
was set).

>
> with this patch: bytes_disp = 0.5k, slice_start = 0,
> bytes_allowed = 1k * 3 + 1k = 4k
> extra_bytes = 0.5k + 9k - 4k = 5.5k
> wait = 5.5s

This looks like it was calculated at 4s, so the IO would be waiting till
4s+5.5s = 9.5s.

As I don't know why time 4s is used, I'll shift this calculation to
time 3s (when the config changes):

bytes_disp = 0.5k, slice_start = 0,
bytes_allowed = 1k * 3 = 3k
extra_bytes = 0.5k + 9k - 3k = 7.5k
wait = 7.5s

In absolute time, the IO would wait till 3s+7.5s = 10.5s.

OK, either your 9.5s or my 10.5s looks weird (although earlier than the
original 4s+8s=12s).

However, the IO should ideally only wait till

    3s + (9k - (6k - 1k)) / 1k/s
    = 3s + (bio - (allowed - dispatched)) / new_limit
    = 3s + 4k / 1k/s = 7s

('allowed' is based on the old limit)

Or in another example, what if you change the config from 2k/s to ∞k/s
(unlimited; let's neglect the arithmetic overflow that you handle
explicitly, imagine a big number but not so big as to be greater than
the division result)?

In such a case, the wait time should be zero, i.e. the IO should be
dispatched right at the time of the config change.
(Your patch still calculates a >0 wait time, and the original behavior
gives a >0 wait too.)

> I hope I can explain it clearly...

Yes, thanks for pointing me to the relevant parts. I hope I grasped them
correctly.

IOW, your patch and formula make the wait time shorter, but the IO can
still be delayed indefinitely if you pass a sequence of new configs.
(AFAIU)

Regards,
Michal
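
The ideal wait described above (credit earned under the old limit, the
remainder paid at the new limit) can be sketched as follows. This is
only an illustration of the formula in this mail, not a proposed patch:
ideal_wait_ms() and its parameters are hypothetical names, time is in
milliseconds, 1k is assumed to be 1000 bytes, and overflow is neglected.

```c
#include <stdint.h>

/*
 * Hypothetical helper illustrating
 *   wait = (bio - (allowed - dispatched)) / new_limit
 * where 'allowed' is computed with the OLD limit up to the moment the
 * config changes. Limits are in bytes per second, time in milliseconds.
 */
uint64_t ideal_wait_ms(uint64_t old_limit, uint64_t new_limit,
                       uint64_t elapsed_ms, uint64_t bytes_disp,
                       uint64_t bio_size)
{
    /* Budget earned under the old limit before the change. */
    uint64_t allowed_old = old_limit * elapsed_ms / 1000;

    /* Unused part of that budget, credited to the waiting bio. */
    uint64_t credit = allowed_old > bytes_disp ? allowed_old - bytes_disp : 0;

    if (bio_size <= credit)
        return 0; /* already fully paid for under the old limit */

    /* The rest of the bio is paid for at the new rate. */
    uint64_t remaining = bio_size - credit;

    return remaining * 1000 / new_limit;
}
```

For the example in this thread, ideal_wait_ms(2000, 1000, 3000, 1000,
9000) gives 4000, i.e. dispatch at 3s + 4s = 7s. With a very large new
limit such as 1000000000 bytes/s, the integer result is 0, matching the
expectation that the IO is dispatched right at the config change.
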