Re: Mixed seq+random workload thread

Hi,
We have run multiple instances with StorScore, and yes, there is a
reason MSFT does it this way; almost every big cloud vendor caps the
drive's maximum throughput.
I am completely in agreement with your breakdown of the code and there
is no confusion there, as that is how I started to model the behavior
in fio.
They are certainly capping the reads. When they do this, e.g. 10%
SEQ_WR / 90% RND_RD (mixed workload), you can see that the bandwidth
of the 90% read portion comes close to that of 100% pure RND_RD, but
it is still not an apples-to-apples comparison.
Still, following the same process with the MSFT recipe, in StorScore I
can clearly see that the 90% RND_RD bandwidth is not capped in the way
the SEQ_WR throughput is.
e.g. StorScore output:

    perfmon_physicaldisk_1_d:_disk_reads/sec  : 6,173.32
        <-- RND_RD is not capped, though it is still impacted because
            the workload is no longer pure RND_RD
    perfmon_physicaldisk_1_d:_disk_writes/sec : 18.49
        <-- SEQ_WR is capped at (write_pct / 100) * read_throughput
    Read Bytes %  : 91.25%   <-- tells me ~90% RND_RD
    Write Bytes % : 8.75%    <-- tells me ~10% SEQ_WR
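
To sanity-check those percentages (this assumes the 4 KiB read and
128 KiB write block sizes from the fio job quoted further down in the
thread, since the StorScore output above does not show block sizes):

    read bytes/s  = 6,173.32 IOPS * 4 KiB   ~= 24,693 KiB/s
    write bytes/s =    18.49 IOPS * 128 KiB ~=  2,367 KiB/s
    Read Bytes %  = 24,693 / (24,693 + 2,367) ~= 91.25%
    Write Bytes % =  2,367 / (24,693 + 2,367) ~=  8.75%

So the write stream really is being held to roughly 10% of the read
stream's byte rate while the reads run essentially free.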

In the case of fio, the RND_RD also gets impacted by the rate capping:
with rate= set on both jobs and flow= tying them together, the reads
end up throttled well below their standalone bandwidth.

My simple question is: is it possible to model the above MSFT recipe
for a mixed workload, with this kind of throughput control, in fio? I
tried what I could figure out from the documentation and your
feedback, and it looks like fio has a limitation on this front.
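
For reference, the closest approximation I could come up with is the
two-step procedure sketched below as two fio job files. This is only a
sketch: the device path, iodepth and runtime are placeholders, and the
4m write cap simply assumes the ~40 MB/s read baseline used as an
example later in this thread.

    # --- step 1: baseline.fio - measure pure RND_RD on its own ---
    [global]
    ioengine=libaio
    direct=1
    # placeholder device
    filename=/dev/sdX
    runtime=60
    time_based

    [baseline_rnd_rd]
    rw=randread
    bs=4k
    iodepth=32

    # --- step 2: mixed.fio - re-run with a rate-capped write
    # aggressor. Only the writer gets rate=; the reader has no rate=
    # and no flow=, so it should run as fast as the device allows.
    [global]
    ioengine=libaio
    direct=1
    # placeholder device
    filename=/dev/sdX
    runtime=60
    time_based

    [seq_wr_aggressor]
    rw=write
    bs=128k
    # 10% of the baseline measured in step 1 (assumed ~40 MB/s here)
    rate=4m

    [rnd_rd_uncapped]
    rw=randread
    bs=4k
    iodepth=32

Even with this I am not sure it reproduces StorScore exactly, which is
why I am asking whether fio can do better here.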


On Thu, Jul 13, 2017 at 11:20 PM, Sitsofe Wheeler <sitsofe@xxxxxxxxx> wrote:
> Hi,
>
> Over on https://github.com/Microsoft/StorScore/blob/master/recipes/write_impact_check.rcp
> the reads *must* be being capped otherwise what purpose does
> $read_tput_KBps have? See the following:
>
>         # Run the workload once undisturbed as a baseline.
>         my %baseline =
>             test( %workload, description => "$desc Baseline" );
>
>         my $read_tput_KBps = $baseline{'MB/sec Total'} * 1000;
> [...]
>
>         # Run again, this time with an aggressor process injecting
>         # writes in the background.  The aggressor is rate-limited
>         # to a fraction of the baseline read throughput.
>         foreach my $write_workload ( @write_workloads )
>         {
> [...]
>
>             my $write_tput_KBps =
>                 int ( ( $write_pct / 100 ) * $read_tput_KBps );
>
> The comment explains where the read "cap" came from. Briefly, the
> StorScore program seems to be doing this:
> 1. Run DISKSPD briefly to find the random read throughput of the disk
> and put this value into the prev_rand_read_tput variable.
> 2. seq_write_tput = prev_rand_read_tput * 10 / 100.
> 3. rand_read_tput = prev_rand_read_tput.
> 4. Run a detached DISKSPD at the same time as 5 that does sequential
> writes but with a throughput cap of seq_write_tput.
> 5. Run a detached DISKSPD that does random reads but with a
> throughput cap of rand_read_tput.
>
> This strikes me as different to what you were proposing:
> a) You said: "I want fio to do 10% 'this' / 90% 'that'"
> b) That StorScore program does: "100% of 'that' capped to what I've
> 'previously measured that' can do, and 'this' capped to 10% of the
> 'previously measured that'".
>
> "Previously measured" becomes fixed and won't be changing as the
> subsequent DISKSPD processes are running. Further, the ratio in a)
> isn't between the same things as the ratio in b) and the even the
> ratio itself isn't the same!
>
> For example, in the StorScore program imagine the baseline is
> 100MBytes/s. You start running steps 4 and 5 together and the step 5
> DISKSPD (the reads) immediately falls to 0 (e.g. the disk is weird
> and can't do reads and writes simultaneously). That does not mean the
> step 4 DISKSPD will limit its throughput to 0 - it continues chugging
> away at 10MBytes/s. Thus the ratio between the step 4 DISKSPD and the
> step 5 DISKSPD quickly becomes 10:0.
>
> You would have to talk to Microsoft as to why their StorScore recipe
> was written the way it is, but the comments do suggest it was
> intentional. It's worth noting that DISKSPD itself can't balance
> entire running "jobs" against each other, but as previously mentioned
> that's not what the StorScore program you linked is trying to do
> anyway...
>
> On 13 July 2017 at 21:55, abhishek koundal <akoundal@xxxxxxxxx> wrote:
>> Yes that is correct.
>> So e.g. if you have RND_RD bandwidth of 45 MB/s and a 10% SEQ_WR /
>> 90% RND_RD mix, the write throughput will be:
>> 10/100 * 45 MB/s = ~4.5 MB/s
>>
>> Now if you try to cap the write throughput (rate=5m) to this in
>> Linux, the RND_RD also gets impacted and won't run to its full
>> potential.
>>
>> In StorScore the read thread does not get capped and only the write
>> thread is capped, hence it delivers the expected results.
>>
>> On Thu, Jul 13, 2017 at 1:33 PM, Sitsofe Wheeler <sitsofe@xxxxxxxxx> wrote:
>>> Hey,
>>>
>>> My Perl is a bit rusty but look at how the write throughput is calculated:
>>>
>>>         # Run again, this time with an aggressor process injecting
>>>         # writes in the background.  The aggressor is rate-limited
>>>         # to a fraction of the baseline read throughput.
>>>         foreach my $write_workload ( @write_workloads )
>>>         {
>>>             my ($write_pattern, $write_pct) = @$write_workload;
>>>
>>>             my $write_tput_KBps =
>>>                 int ( ( $write_pct / 100 ) * $read_tput_KBps );
>>>
>>> So it is accounting for the fact that the write throughput will be
>>> limited to a percentage of the read throughput.
>>>
>>> On 13 July 2017 at 20:02, abhishek koundal <akoundal@xxxxxxxxx> wrote:
>>>> Hi,
>>>> The main thing that I am trying to do is to model StorScore in a
>>>> Linux environment
>>>> (https://github.com/Microsoft/StorScore/blob/master/recipes/write_impact_check.rcp).
>>>> There you can control the rate of the writes as per the % that is
>>>> required. While running that on Windows I can see that I can have
>>>> different block sizes and rates defined for the runs, achieving the
>>>> expected output.
>>>>
>>>> In the above-mentioned case the RND_RD will be unconstrained (i.e.
>>>> the full bandwidth it can achieve) but the writes will be slowed
>>>> to x% of RND_RD. In fio, when e.g. SEQ_WR is 10% and RND_RD is
>>>> 90%, with the SEQ_WR rate capped at 10% of the RND_RD rate (if
>>>> RND_RD was 40m):
>>>> [SEQ_WR]
>>>> rate=4m
>>>> bs=128K
>>>> rw=write
>>>> flow=9
>>>> [RND_RD]
>>>> rate=40m
>>>> bs=4k
>>>> rw=randread
>>>> flow=-1
>>>> When I run this, the output that comes back shows that the SEQ_WR
>>>> bandwidth is capped at ~4m, but the RND_RD also gets severely hit
>>>> on IOPS and bandwidth as fio tries to keep the total bandwidth
>>>> under ~45m. I want the RND_RD to not be capped and to run as close
>>>> to its full potential as possible, like I see in StorScore.
>>>> I really appreciate the support and the help in understanding the
>>>> limitations.
>>>
>>> --
>>> Sitsofe | http://sucs.org/~sits/
>
> --
> Sitsofe | http://sucs.org/~sits/



-- 
Life is too short for silly things so invest your time in some
productive outputs!!


