Re: [RFC PATCH] blk-throttle: add burst allowance.

Khazhismel Kumykov <khazhy@xxxxxxxxxx> · Mon, 18 Dec 2017 10:16:02 -0800

On Mon, Nov 20, 2017 at 8:36 PM, Khazhismel Kumykov <khazhy@xxxxxxxxxx> wrote:
> On Fri, Nov 17, 2017 at 11:26 AM, Shaohua Li <shli@xxxxxxxxxx> wrote:
>> On Thu, Nov 16, 2017 at 08:25:58PM -0800, Khazhismel Kumykov wrote:
>>> On Thu, Nov 16, 2017 at 8:50 AM, Shaohua Li <shli@xxxxxxxxxx> wrote:
>>> > On Tue, Nov 14, 2017 at 03:10:22PM -0800, Khazhismel Kumykov wrote:
>>> >> Allows configuration additional bytes or ios before a throttle is
>>> >> triggered.
>>> >>
>>> >> This allows implementation of a bucket style rate-limit/throttle on a
>>> >> block device. Previously, bursting to a device was limited to allowance
>>> >> granted in a single throtl_slice (similar to a bucket with limit N and
>>> >> refill rate N/slice).
>>> >>
>>> >> Additional parameters bytes/io_burst_conf defined for tg, which define a
>>> >> number of bytes/ios that must be depleted before throttling happens. A
>>> >> tg that does not deplete this allowance functions as though it has no
>>> >> configured limits. tgs earn additional allowance at rate defined by
>>> >> bps/iops for the tg. Once a tg has *_disp > *_burst_conf, throttling
>>> >> kicks in. If a tg is idle for a while, it will again have some burst
>>> >> allowance before it gets throttled again.
>>> >>
>>> >> slice_end for a tg is extended until io_disp/byte_disp would fall to 0,
>>> >> when all "used" burst allowance would be earned back. trim_slice still
>>> >> does progress slice_start as before and decrements *_disp as before, and
>>> >> tgs continue to get bytes/ios in throtl_slice intervals.
>>> >
>>> > Can you describe why we need this? It would be great if you can describe the
>>> > usage model and an example. Does this work for io.low/io.max or both?
>>> >
>>> > Thanks,
>>> > Shaohua
>>> >
>>>
>>> Use case that brought this up was configuring limits for a remote
>>> shared device. Bursting beyond io.max is desired but only for so much
>>> before the limit kicks in, afterwards with sustained usage throughput
>>> is capped. (This proactively avoids remote-side limits). In that case
>>> one would configure in a root container io.max + io.burst, and
>>> configure low/other limits on descendants sharing the resource on the
>>> same node.
>>>
>>> With this patch, so long as tg has not dispatched more than the burst,
>>> no limit is applied at all by that tg, including limit imposed by
>>> io.low in tg_iops_limit, etc.
>>
>> I'd appreciate if you can give more details about the 'why'. 'configuring
>> limits for a remote shared device' doesn't justify the change.
>
> This is to configure a bursty workload (and associated device) with
> known/allowed expected burst size, but to not allow full utilization
> of the device for extended periods of time for QoS. During idle or low
> use periods the burst allowance accrues, and then tasks can burst well
> beyond the configured throttle up to the limit, afterwards is
> throttled. A constant throttle speed isn't sufficient for this as you
> can only burst 1 slice worth, but a limit of sorts is desirable for
> preventing over utilization of the shared device. This type of limit
> is also slightly different than what i understand io.low does in local
> cases in that tg is only high priority/unthrottled if it is bursty,
> and is limited with constant usage
>
> Khazhy

Hi Shaohua,

Does this clarify the reason for this patch? Is this (or something
similar) a good fit for inclusion in blk-throttle?

Thanks,
Khazhy
Attachment:
smime.p7s

Description: S/MIME Cryptographic Signature