On Mon, Dec 18, 2017 at 10:16:02AM -0800, Khazhismel Kumykov wrote:
> On Mon, Nov 20, 2017 at 8:36 PM, Khazhismel Kumykov <khazhy@xxxxxxxxxx> wrote:
> > On Fri, Nov 17, 2017 at 11:26 AM, Shaohua Li <shli@xxxxxxxxxx> wrote:
> >> On Thu, Nov 16, 2017 at 08:25:58PM -0800, Khazhismel Kumykov wrote:
> >>> On Thu, Nov 16, 2017 at 8:50 AM, Shaohua Li <shli@xxxxxxxxxx> wrote:
> >>> > On Tue, Nov 14, 2017 at 03:10:22PM -0800, Khazhismel Kumykov wrote:
> >>> >> Allows configuration additional bytes or ios before a throttle is
> >>> >> triggered.
> >>> >>
> >>> >> This allows implementation of a bucket style rate-limit/throttle on a
> >>> >> block device. Previously, bursting to a device was limited to allowance
> >>> >> granted in a single throtl_slice (similar to a bucket with limit N and
> >>> >> refill rate N/slice).
> >>> >>
> >>> >> Additional parameters bytes/io_burst_conf defined for tg, which define a
> >>> >> number of bytes/ios that must be depleted before throttling happens. A
> >>> >> tg that does not deplete this allowance functions as though it has no
> >>> >> configured limits. tgs earn additional allowance at rate defined by
> >>> >> bps/iops for the tg. Once a tg has *_disp > *_burst_conf, throttling
> >>> >> kicks in. If a tg is idle for a while, it will again have some burst
> >>> >> allowance before it gets throttled again.
> >>> >>
> >>> >> slice_end for a tg is extended until io_disp/byte_disp would fall to 0,
> >>> >> when all "used" burst allowance would be earned back. trim_slice still
> >>> >> does progress slice_start as before and decrements *_disp as before, and
> >>> >> tgs continue to get bytes/ios in throtl_slice intervals.
> >>> >
> >>> > Can you describe why we need this? It would be great if you can describe the
> >>> > usage model and an example. Does this work for io.low/io.max or both?
> >>> >
> >>> > Thanks,
> >>> > Shaohua
> >>> >
> >>>
> >>> Use case that brought this up was configuring limits for a remote
> >>> shared device. Bursting beyond io.max is desired but only for so much
> >>> before the limit kicks in, afterwards with sustained usage throughput
> >>> is capped. (This proactively avoids remote-side limits). In that case
> >>> one would configure in a root container io.max + io.burst, and
> >>> configure low/other limits on descendants sharing the resource on the
> >>> same node.
> >>>
> >>> With this patch, so long as tg has not dispatched more than the burst,
> >>> no limit is applied at all by that tg, including limit imposed by
> >>> io.low in tg_iops_limit, etc.
> >>
> >> I'd appreciate if you can give more details about the 'why'. 'configuring
> >> limits for a remote shared device' doesn't justify the change.
> >
> > This is to configure a bursty workload (and associated device) with
> > known/allowed expected burst size, but to not allow full utilization
> > of the device for extended periods of time for QoS. During idle or low
> > use periods the burst allowance accrues, and then tasks can burst well
> > beyond the configured throttle up to the limit, afterwards is
> > throttled. A constant throttle speed isn't sufficient for this as you
> > can only burst 1 slice worth, but a limit of sorts is desirable for
> > preventing over utilization of the shared device. This type of limit
> > is also slightly different than what i understand io.low does in local
> > cases in that tg is only high priority/unthrottled if it is bursty,
> > and is limited with constant usage
> >
> > Khazhy
>
> Hi Shaohua,
>
> Does this clarify the reason for this patch? Is this (or something
> similar) a good fit for inclusion in blk-throttle?
>

So does this burst have to be per cgroup? I mean, if throtl_slice were
configurable, that would allow controlling the size of the burst (just
that it would apply to all cgroups). If that works, it might be a
simpler solution.

Vivek
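
For anyone following the thread, the burst behaviour described in the quoted patch text can be modelled roughly as below. This is only a toy Python sketch, not the patch and not kernel code: the class name BurstBucket, its fields and the units are hypothetical, and it simplifies by refusing any dispatch that would exceed the allowance instead of tracking slices.

```python
class BurstBucket:
    """Toy model of a per-group burst allowance (illustrative names only)."""

    def __init__(self, bps, burst_bytes):
        self.bps = bps                  # refill rate in bytes per second
        self.burst_bytes = burst_bytes  # allowance to deplete before throttling
        self.disp = 0                   # dispatched bytes not yet earned back
        self.last = 0.0                 # time of the last update, in seconds

    def _earn_back(self, now):
        # Used allowance is earned back at the configured rate; never below 0.
        elapsed = now - self.last
        self.disp = max(0, self.disp - int(self.bps * elapsed))
        self.last = now

    def try_dispatch(self, now, nbytes):
        """Return True if nbytes can go out at time `now` without throttling."""
        self._earn_back(now)
        if self.disp + nbytes > self.burst_bytes:
            return False                # allowance depleted: throttle
        self.disp += nbytes
        return True


# Hypothetical numbers: 100 bytes/s sustained, 1000 bytes of burst.
bucket = BurstBucket(bps=100, burst_bytes=1000)
first = bucket.try_dispatch(0.0, 1000)        # whole burst at once
second = bucket.try_dispatch(0.0, 1)          # nothing left, throttled
after_idle = bucket.try_dispatch(100.0, 1000) # idle period refilled the bucket
```

With a single configurable throtl_slice, the comparable burst would be roughly bps times the slice length for every group at once, whereas burst_bytes in this sketch is a per-group knob; that is the trade-off the question above is about.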