Hi Paolo,

On 18/4/24 20:12, Paolo Valente wrote:
>
>
>> On 23 Apr 2018, at 11:01, Joseph Qi <jiangqi903@xxxxxxxxx> wrote:
>>
>>
>>
>> On 18/4/23 15:35, Paolo Valente wrote:
>>>
>>>
>>>> On 23 Apr 2018, at 08:05, Joseph Qi <jiangqi903@xxxxxxxxx> wrote:
>>>>
>>>> Hi Paolo,
>>>
>>> Hi Joseph,
>>> thanks for chiming in.
>>>
>>>> What's your idle and latency config?
>>>
>>> I didn't set them at all, as the only (explicit) requirement in my
>>> basic test is that one of the groups is guaranteed a minimum bps.
>>>
>>>
>>>> IMO, io.low will allow others to run more bandwidth if a cgroup's
>>>> average idle time is high or its latency is low.
>>>
>>> What you say here makes me think that I simply misunderstood the
>>> purpose of io.low. So, here is my problem/question: "I only need to
>>> guarantee at least a minimum bandwidth, in bps, to a group. Is the
>>> io.low limit the way to go?"
>>>
>>> I know that I can use just io.max (unless I misunderstood the goal of
>>> io.max too :( ), but my extra purpose would be to avoid wasting
>>> bandwidth when some group is idle. Yet, as of now, io.low is not
>>> working even for the first, simpler goal, i.e., guaranteeing a
>>> minimum bandwidth to one group when all groups are active.
>>>
>>> Am I getting something wrong?
>>>
>>> Otherwise, if there are some special values for the idle and latency
>>> parameters that would make throttling work for my test, I'll of
>>> course be happy to try them.
>>>
>> I think you can try an idle time of 1000us for all cgroups, and a
>> latency target of 100us for the cgroup with a low limit of 100MB/s and
>> 2000us for the cgroups with a low limit of 10MB/s. That means the
>> cgroup with the lower latency target will be preferred.
>> BTW, from my experience the parameters are not easy to set because
>> they are strongly correlated to the cgroup's IO behavior.
>>
>
> +Tejun (I guess he might be interested in the results below)
>
> Hi Joseph,
> thanks for chiming in. Your suggestion did work!
>
> At first, I thought I had also understood the use of latency from the
> outcome of your suggestion: "want the low limit really guaranteed for
> a group? Set its target latency to a low value." But then, as a
> crosscheck, I repeated the exact same test, but with the target
> latencies reversed: I gave 2000 to the interfered (the group with the
> 100MB/s limit) and 100 to the interferers. And the interfered still
> got more than 100MB/s! So I exaggerated: 20000 to the interfered.
> Same outcome :(
>
> I tried many other combinations to try to figure this out, but the
> results seemed more or less random w.r.t. the latency values. I
> didn't even start to test different values for idle.
>
> So, the only sound lesson that I seem to have learned is: if I want
> low limits to be enforced, I have to set target latency and idle
> explicitly. The actual latency values matter little, or not at all.
> At least this holds for my simple tests.
>
> At any rate, thanks to your help, Joseph, I could move to the most
> interesting part for me: how effective is blk-throttle with low
> limits? I could well be wrong again, but my results do not seem that
> good. With the simplest type of non-toy example I considered, I
> recorded throughput losses, apparently caused mainly by blk-throttle,
> ranging from 64% to 75%.
>
> Here is a worst-case example. For each step, I'm reporting below the
> command by which you can reproduce that step with the
> thr-lat-with-interference benchmark of the S suite [1]. I just split
> bandwidth equally among five groups, on my SSD.
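(Just to make the setup under discussion concrete: expressed directly through
the cgroup v2 files, a configuration along these lines would presumably look
roughly like the sketch below. It assumes a root shell, a kernel built with
CONFIG_BLK_DEV_THROTTLING_LOW, and a cgroup2 mount at /sys/fs/cgroup; the
259:0 device number and the group names are placeholders, rbps is in bytes/s,
and idle/latency are in microseconds, with the values suggested above.)

  # One "interfered" group plus four "interferer" groups, each with the
  # same 100MB/s low limit; the interfered group gets the tighter
  # latency target suggested above.
  echo "+io" > /sys/fs/cgroup/cgroup.subtree_control
  for g in interfered interferer1 interferer2 interferer3 interferer4; do
      mkdir -p /sys/fs/cgroup/$g
  done
  echo "259:0 rbps=104857600 idle=1000 latency=100" \
      > /sys/fs/cgroup/interfered/io.low
  for g in interferer1 interferer2 interferer3 interferer4; do
      echo "259:0 rbps=104857600 idle=1000 latency=2000" \
          > /sys/fs/cgroup/$g/io.low
  done
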
> The device showed a peak rate of ~515MB/s in this test, so I set rbps
> to 100MB/s for each group (and tried various values, and combinations
> of values, for the target latency, without any effect on the results).
> To begin, I made every group do sequential reads. Everything worked
> perfectly fine.
>
> But then I made one group do random I/O [2], and trouble began. Even
> though the group doing random I/O was given a target latency of
> 100usec (or lower), while the others had a target latency of 2000usec,
> the poor random-I/O group got only 4.7 MB/s! (A single process doing
> 4k sync random I/O reaches 25MB/s on my SSD.)
>
> I guess things broke because the low limits no longer matched the
> lower speed the device reached with the new, mixed workload: the
> device reached 376MB/s, while the sum of the low limits was 500MB/s.
> BTW, the 'fault' for this loss of throughput did not lie only with the
> device and the workload: if I switched throttling off, the device
> still reached its peak rate, although it granted only 1.3MB/s to the
> random-I/O group.
>
> So, to stay within the 376MB/s, I lowered the low limits to 74MB/s per
> group (to avoid a too tight 75MB/s) [3]. A little better: the
> random-I/O group got 7.2 MB/s. But the total throughput went down
> further, to 289MB/s, and became again lower than the sum of the low
> limits. Most certainly, this time the throughput went down mainly
> because blk-throttle was serving the random I/O more than before.
>
> To make a long story short, I arrived at setting just 12MB/s as the
> low limit for each group [4]. The random-I/O group was finally happy,
> with a revitalizing 12.77MB/s. But the total throughput dropped down
> to 127MB/s, i.e., ~25% of the peak rate of the device. Now the 'fault'
> for the throughput loss seemed undoubtedly blk-throttle's, which was
> evidently over-throttling some group.
>
> To sum up, for my device, 12MB/s seems to be the highest value for
> which low limits can be guaranteed. But setting these limits entails
> a high cost: if just one group really does random I/O, then 75% of the
> throughput is lost.
>
> There would be other issues too. For example, 12MB/s might be too
> little for the needs of some group in some time period. That would
> make it extremely difficult, if at all possible, to set low limits
> that comply with the needs of more dynamic (and probably more
> realistic) workloads than the above one.
>

Could you run blktrace as well when testing your case?
There are several throtl traces that help analyze whether this is
caused by frequent upgrades/downgrades.
If all cgroups are just running under their low limits, I'm afraid the
case you tested has something to do with how the SSD handles mixed
workloads.

Thanks,
Joseph

> I think this is all; sorry for the long mail, I tried to shrink it as
> much as possible. Looking forward to some feedback.
>
> Thanks,
> Paolo
>
> [1] https://github.com/Algodev-github/S
> [2] sudo ./thr-lat-with-interference.sh -b t -n 4 -w 100M -W 100M -t randread -L 2000
> [3] sudo ./thr-lat-with-interference.sh -b t -n 4 -w 74M -W 74M -t randread -L 2000
> [4] sudo ./thr-lat-with-interference.sh -b t -n 4 -w 12M -W 12M -t randread -L 2000
>
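P.S. In case it helps with the blktrace run suggested above, a pipeline along
the lines of the sketch below (the device name is only a placeholder) should
surface the throtl trace messages, which is where the upgrade/downgrade
decisions get logged:

  # Trace the benchmarked device live while the test runs and keep only
  # the blk-throttle trace messages:
  sudo blktrace -d /dev/nvme0n1 -o - | blkparse -i - | grep throtl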