Hi, Vivek, Thanks for your careful review. I'll rename t_c to last_dispatch, it's much better. For the big burst issue, I've different opinion. Let's discuss it. Any time a big IO means a big burst. Even if it's throttled at first time, queued in the service_queue, and then waited for a while, when it's delivered out, it IS still a big burst. So throttling the bio for another while makes no sense. If a group has been idle for 5 minutes, then it owns the credit to deliver a big IO (within 300 * bps bytes). And the extra credit will be cut off after the delivery. Waiting for your comments. Zhiguo On Wed, Oct 16, 2013 at 1:32 AM, Vivek Goyal <vgoyal@xxxxxxxxxx> wrote: > On Mon, Oct 14, 2013 at 05:09:17PM +0800, Hong Zhiguo wrote: > > [..] > > Hi Hong, > > Thanks for the token based throttling implementation. In general it looks > good and it simplies the logic. Couple of comments/concerns below. > >> @@ -133,14 +135,13 @@ struct throtl_grp { >> /* IOPS limits */ >> unsigned int iops[2]; >> >> - /* Number of bytes disptached in current slice */ >> - uint64_t bytes_disp[2]; >> - /* Number of bio's dispatched in current slice */ >> - unsigned int io_disp[2]; >> + /* Token for throttling by bps */ >> + uint64_t bytes_token[2]; >> + /* Token for throttling by iops */ >> + unsigned int io_token[2]; >> >> - /* When did we start a new slice */ >> - unsigned long slice_start[2]; >> - unsigned long slice_end[2]; >> + /* Time check-point */ >> + unsigned long t_c[2]; > > Can we rename this variable to say last_dispatch[2]. > > [..] >> >> @@ -852,41 +708,26 @@ static bool tg_with_in_bps_limit(struct throtl_grp *tg, struct bio *bio, >> unsigned long *wait) >> { >> bool rw = bio_data_dir(bio); >> - u64 bytes_allowed, extra_bytes, tmp; >> - unsigned long jiffy_elapsed, jiffy_wait, jiffy_elapsed_rnd; >> - >> - jiffy_elapsed = jiffy_elapsed_rnd = jiffies - tg->slice_start[rw]; >> - >> - /* Slice has just started. Consider one slice interval */ >> - if (!jiffy_elapsed) >> - jiffy_elapsed_rnd = throtl_slice; >> - >> - jiffy_elapsed_rnd = roundup(jiffy_elapsed_rnd, throtl_slice); >> + u64 extra_bytes, token; >> + unsigned long jiffy_wait; >> >> - tmp = tg->bps[rw] * jiffy_elapsed_rnd; >> - do_div(tmp, HZ); >> - bytes_allowed = tmp; >> + token = (u64)tg->bps[rw] * (jiffies - tg->t_c[rw]); > > So here we seem to be calculating how many tokens a group is eligible > for. This is done based on since last time check. And last time check > was updated in last dispatch from the group. > > What happens if no IO happens in a group for some time, say for 5 minutes, > and then one big IO comes in. IIUC, we will allow a big burst of IO for > the first IO (tokens worth of full 5 minutes will be given to this group). > I think we should have a mechanism where we don't allow too big a burst > for the first IO. (Possibly limit the burst to THROTL_BURST_IO and > THROTL_BURST_BYTES until and unless a bigger IO is already queued and > we are waiting for it). > > In previous time slice logic, I took care of it by extend time slice > by 100ms and if no IO happens in the group for 100ms, time slice will > expire and next IO will get at max of 100ms of worth of credit (equivalent > of burst limits). > > Thanks > Vivek -- best regards Hong Zhiguo -- To unsubscribe from this list: send the line "unsubscribe cgroups" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html