Re: Mixed seq+random workload thread

Hi,

On 14 July 2017 at 17:25, abhishek koundal <akoundal@xxxxxxxxx> wrote:
> Hi,
> We have run multiple instances with StorScore and yes, there is a
> reason MSFT is doing it - almost every big cloud vendor caps the
> drive's max throughput.
> I am completely in alignment with your breakdown of the code and there
> is no confusion there, as that is how I started to model the behavior
> in fio.
> They are surely capping the reads when they do this, e.g. 10% SEQ_WR,
> 90% RND_RD (mixed workload): you can see that your default bandwidth
> for the 90% will be close enough to 100% pure RND_RD but still not
> apples to apples.
> Still, following the same process using the MSFT recipe, in StorScore
> I can clearly see that the 90% RND_RD bandwidth is not capped the way
> the SEQ_WR throughput is.
> e.g. StorScore output:
> perfmon_physicaldisk_1_d:_disk_reads/sec : 6,173.32
> <------- RND_RD is not capped but will surely be impacted as it's
> not pure RND_RD
> perfmon_physicaldisk_1_d:_disk_writes/sec : 18.49
> <------- SEQ_WR is capped at %/100 * rd_throughput.
> Read Bytes %: 91.25%
> <------- Tells me ~90% RND_RD
> Write Bytes %: 8.75%
> <------- Tells me ~10% SEQ_WR

I disagree that the StorScore recipe will always give you that split -
there's nothing in it that will cap the reads to 90% of all possible
I/Os if the writes turn out to do less than 10% of the I/Os. That's
the crux of my complaint and why I keep saying "you've asked for a
10/90 split between SEQ_WR and RND_RD but that's not what the
StorScore recipe does". I think you have to let go of the idea that
the jobs are in any way balanced against each other, even if it often
looks like they are. Of course it's up to you whether you feel
that's close enough for your purposes :-)

> In the case of fio the RND_RD also gets impacted by the rate capping.

Yes, because fio really is maintaining the ratio between both jobs
*while they are running* - if one side slows down, the other will slow
down too so the ratio continues to be maintained.

> My simple question is: is the above MSFT recipe for modelling the
> mixed workload while controlling the throughput possible in fio? I
> tried what I could figure out from the documentation and your
> feedback, and it looks like fio has a limitation on this front.

Sure, but you have to program it exactly the same way the recipe does:
i.e. write a program that runs only the read part using fio, grabs
the bandwidth achieved and then does the same StorScore recipe
calculations to set the rate of each of the fio jobs independently,
e.g.:

Run fio job that does
[RND_RD]
bs=4k
rw=randread

Grab the bandwidth that was achieved (e.g. by parsing the JSON
output). Let's say I find that my disk had a read throughput of
100MBytes/s; the write cap is then:

10 / 100 * 100MBytes/s = 10MBytes/s

OK, I now have my program generate an fio job file (or command line)
that does this:

[SEQ_WR]
rate=10M
bs=128K
rw=write
[RND_RD]
rate=100M
bs=4k
rw=randread

There's no flow= option here and I've calculated the rate caps the
same way the StorScore recipe does. I have to do this because the
recipe is NOT actively preserving a ratio between SEQ_WR and RND_RD.
The only ratios it maintains are SEQ_WR against the *original* read
throughput and RND_RD against the *original* read throughput.
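If it helps, the two-phase approach above can be wrapped in a small
script. This is only a sketch under assumptions: it assumes you run
the baseline job with fio's --output-format=json and feed that output
in, and the helper names (baseline_read_bw, make_mixed_jobfile) are
mine, not anything StorScore or fio provides:

```python
import json

def baseline_read_bw(fio_json_text):
    """Extract the baseline random-read bandwidth (bytes/s) from
    fio's --output-format=json output (per-job 'bw' is in KiB/s)."""
    data = json.loads(fio_json_text)
    return data["jobs"][0]["read"]["bw"] * 1024

def make_mixed_jobfile(read_bw_bytes, write_pct=10):
    """Emit a two-job fio file with StorScore-style independent caps:
    reads capped at the measured baseline, writes at write_pct% of it.
    Deliberately no flow= - the two caps never adjust to each other."""
    write_bw = int(read_bw_bytes * write_pct / 100)
    return (
        "[SEQ_WR]\n"
        f"rate={write_bw}\n"
        "bs=128K\n"
        "rw=write\n"
        "[RND_RD]\n"
        f"rate={read_bw_bytes}\n"
        "bs=4k\n"
        "rw=randread\n"
    )
```

You would run something like `fio --output-format=json baseline.fio`,
pass its stdout to baseline_read_bw(), write the result of
make_mixed_jobfile() to a file and run fio a second time on that.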

I'm afraid I don't know a better way to describe my thoughts on this
but perhaps someone else has some insight?

> On Thu, Jul 13, 2017 at 11:20 PM, Sitsofe Wheeler <sitsofe@xxxxxxxxx> wrote:
>> Hi,
>>
>> Over on https://github.com/Microsoft/StorScore/blob/master/recipes/write_impact_check.rcp
>> the reads *must* be being capped otherwise what purpose does
>> $read_tput_KBps have? See the following:
>>
>>         # Run the workload once undisturbed as a baseline.
>>         my %baseline =
>>             test( %workload, description => "$desc Baseline" );
>>
>>         my $read_tput_KBps = $baseline{'MB/sec Total'} * 1000;
>> [...]
>>
>>         # Run again, this time with an aggressor process injecting
>>         # writes in the background.  The aggressor is rate-limited
>>         # to a fraction of the baseline read throughput.
>>         foreach my $write_workload ( @write_workloads )
>>         {
>> [...]
>>
>>             my $write_tput_KBps =
>>                 int ( ( $write_pct / 100 ) * $read_tput_KBps );
>>
>> The comment explains where the read "cap" came from. Briefly, the
>> StorScore program seems to be doing this:
>> 1. Run DISKSPD briefly to find the random read throughput of the disk
>> and put this value into the prev_rand_read_tput variable.
>> 2. seq_write_tput = prev_rand_read_tput * 10 / 100.
>> 3. rand_read_tput = prev_rand_read_tput.
>> 4. Run a detached DISKSPD that does sequential writes but with a
>> throughput cap of seq_write_tput.
>> 5. At the same time, run a detached DISKSPD that does random reads
>> but with a throughput cap of rand_read_tput.
>>
>> This strikes me as different to what you were proposing:
>> a) You said: "I want fio to do 10% 'this' / 90% 'that'"
>> b) That StorScore program does: "100% of 'that' capped to what I've
>> 'previously measured that' can do, and 'this' capped to 10% of the
>> 'previously measured that'".
>>
>> "Previously measured" becomes fixed and won't be changing as the
>> subsequent DISKSPD processes are running. Further, the ratio in a)
>> isn't between the same things as the ratio in b) and the even the
>> ratio itself isn't the same!
>>
>> For example, in the StorScore program imagine the baseline is
>> 100MBytes/s. You start running 4. and 5. together and the DISKSPD
>> process at 5. immediately falls to 0 (e.g. the disk is weird and
>> can't do reads and writes simultaneously). That does not mean the
>> step 4. DISKSPD will limit its throughput to 0 - it continues
>> chugging away at 10MBytes/s. Thus the ratio between the step 4
>> DISKSPD and the step 5 DISKSPD quickly becomes 10:0.
>>
>> You would have to ask Microsoft why their StorScore recipe
>> was written the way it is, but the comments do suggest it was
>> intentional. It's worth noting DISKSPD itself can't balance entire
>> running "jobs" against each other but, as previously mentioned,
>> that's not what the StorScore program you linked is trying to do
>> anyway...
>>
>> On 13 July 2017 at 21:55, abhishek koundal <akoundal@xxxxxxxxx> wrote:
>>> Yes, that is correct.
>>> So e.g. if you have RND_RD bandwidth of 45MBytes/s and want 10%
>>> SEQ_WR / 90% RND_RD, the write throughput will be:
>>> 10 / 100 * 45MBytes/s = ~4.5MBytes/s
>>>
>>> Now if you try to cap the write throughput (rate=5m) like this in
>>> Linux, the RND_RD also gets impacted and won't run to its full
>>> potential.
>>>
>>> In StorScore the reads thread will not get capped and only the
>>> writes thread will be capped, hence delivering the expected results.
>>>
>>> On Thu, Jul 13, 2017 at 1:33 PM, Sitsofe Wheeler <sitsofe@xxxxxxxxx> wrote:
>>>> Hey,
>>>>
>>>> My Perl is a bit rusty but look at how the write throughput is calculated:
>>>>
>>>>         # Run again, this time with an aggressor process injecting
>>>>         # writes in the background.  The aggressor is rate-limited
>>>>         # to a fraction of the baseline read throughput.
>>>>         foreach my $write_workload ( @write_workloads )
>>>>         {
>>>>             my ($write_pattern, $write_pct) = @$write_workload;
>>>>
>>>>             my $write_tput_KBps =
>>>>                 int ( ( $write_pct / 100 ) * $read_tput_KBps );
>>>>
>>>> So it is accounting for the fact that the write throughput will be
>>>> limited to a percentage of the read throughput.
>>>>
>>>> On 13 July 2017 at 20:02, abhishek koundal <akoundal@xxxxxxxxx> wrote:
>>>>> Hi,
>>>>> The main thing that I am trying to do is to model StorScore in a
>>>>> Linux environment
>>>>> (https://github.com/Microsoft/StorScore/blob/master/recipes/write_impact_check.rcp).
>>>>> There you can control the rate of the writes as per the % that is
>>>>> required. While running that on Windows I can see that I can have
>>>>> different block sizes and rates defined for the runs, achieving
>>>>> the expected output.
>>>>>
>>>>> In the above mentioned case the RND_RD will be unconstrained (i.e.
>>>>> achieving the full bandwidth it can) but writes will be slowed to
>>>>> x% of RND_RD. In fio, when e.g.
>>>>> SEQ_WR is 10% and RND_RD is 90%, with seq_wr rate capped at 10% of
>>>>> rnd_rd (if rnd_rd was 40m):
>>>>> [SEQ_WR]
>>>>> rate=4m
>>>>> bs=128K
>>>>> rw=write
>>>>> flow=9
>>>>> [RND_RD]
>>>>> rate=40m
>>>>> bs=4k
>>>>> rw=randread
>>>>> flow=-1
>>>>> When I run this, the output that comes back shows that SEQ_WR
>>>>> bandwidth is capped at ~4m but RND_RD also gets severely hit on
>>>>> IOPS and bandwidth as fio tries to keep the total bandwidth under
>>>>> <45m>. I want the RND_RD not to be capped; it should run to its
>>>>> full potential like I see in StorScore.
>>>>> Really appreciate the support and help understanding the
>>>>> limitations.

-- 
Sitsofe | http://sucs.org/~sits/
