Re: Mixed seq+random workload thread

Hi,
IMHO you are mixing absolute and relative caps. Here is my take on what
StorScore does in this recipe:

1. Run an undisturbed RND_RD test and determine $read_tput_KBps from it.

2. Calculate an absolute cap for (RND|SEQ)_WR as 10% of the above:
       $write_tput_KBps = int ( ( $write_pct / 100 ) * $read_tput_KBps );

3. Run the (RND|SEQ)_WR test in the background with this _absolute_ cap.

4. Run the RND_RD test _without_ any cap.

Please try the following fio job:

[write_impact_check]
; reads uncapped, writes capped at an absolute 4MB/s
rate=,4m
; 4k random reads, 128k sequential writes
bs=4k,128K
; reads 100% random, writes 0% random (i.e. sequential)
percentage_random=100,0
rw=randrw
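
To actually run it you would still need the usual target and runtime
options; a placeholder global section along these lines could sit above
the job in the same file (device path, engine, depth and runtime below
are only examples, adjust them to your setup):

[global]
filename=/dev/sdX
direct=1
ioengine=libaio
iodepth=32
time_based
runtime=60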


Regards, Mikhail


On 07/14/17 12:25, abhishek koundal wrote:
Hi,
We have run multiple instances with StorScore, and yes, there is a
reason MSFT is doing it: almost every big cloud vendor caps the
drive's max throughput.
I am completely in alignment with your breakdown of the code, and there
is no confusion there, as that is how I started to model the behavior
in fio.
They surely are capping the reads when they do this, e.g. 10% SEQ_WR /
90% RND_RD (mixed WL): you can see that the default b/w for the 90%
portion will be close enough to 100% pure RND_RD, but it is still not
apples to apples.
Still, following the same process using the MSFT recipe, in StorScore I
can clearly see that the 90% RND_RD b/w is not capped in the way the
SEQ_WR throughput is.
e.g. StorScore output:

perfmon_physicaldisk_1_d:_disk_reads/sec  : 6,173.32
    <- RND_RD is not rate-capped, but it will surely get impacted since
       it is no longer pure RND_RD
perfmon_physicaldisk_1_d:_disk_writes/sec : 18.49
    <- SEQ_WR is capped based on (%/100) * rd_throughput
Read Bytes %  : 91.25%   <- tells me ~90% RND_RD
Write Bytes % : 8.75%    <- tells me ~10% SEQ_WR
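
For what it's worth, those numbers are consistent with the reported
byte split if you plug in the recipe's block sizes (4k random reads,
128k sequential writes):
read bytes  = 6,173.32 IOPS * 4KiB   = ~24.1 MiB/s
write bytes = 18.49 IOPS    * 128KiB = ~2.3 MiB/s
=> writes are 2.3 / (24.1 + 2.3) = ~8.75% of total bytes, matching the
Write Bytes % above.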

In the case of fio, the RND_RD also gets impacted by the rate capping.

My simple question is: is it possible to model the above MSFT recipe
for a mixed workload, with this kind of throughput control, in fio? I
tried what I could figure out from the documentation and your feedback,
and it looks like fio has a limitation on this front.


On Thu, Jul 13, 2017 at 11:20 PM, Sitsofe Wheeler <sitsofe@xxxxxxxxx> wrote:
Hi,

Over on https://github.com/Microsoft/StorScore/blob/master/recipes/write_impact_check.rcp
the reads *must* be being capped otherwise what purpose does
$read_tput_KBps have? See the following:

         # Run the workload once undisturbed as a baseline.
         my %baseline =
             test( %workload, description => "$desc Baseline" );

         my $read_tput_KBps = $baseline{'MB/sec Total'} * 1000;
[...]

         # Run again, this time with an aggressor process injecting
         # writes in the background.  The aggressor is rate-limited
         # to a fraction of the baseline read throughput.
         foreach my $write_workload ( @write_workloads )
         {
[...]

             my $write_tput_KBps =
                 int ( ( $write_pct / 100 ) * $read_tput_KBps );

The comment explains where the read "cap" came from. Briefly, the
StorScore program seems to be doing this:
1. Run DISKSPD briefly to find the random read throughput of the disk
and put this value into the prev_rand_read_tput variable.
2. seq_write_tput = prev_rand_read_tput * 10 / 100.
3. rand_read_tput = prev_rand_read_tput.
4. Run a detached DISKSPD, at the same time as 5, that does sequential
writes but with a throughput cap of seq_write_tput.
5. Run a DISKSPD that does random reads but with a throughput cap of
rand_read_tput.
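
In fio terms, a rough equivalent of steps 4 and 5 would be two
simultaneous jobs, each carrying its own absolute rate cap. This is
only a sketch: the 100MByte/s baseline figure is made up, and the
usual filename/direct/runtime options would still be needed:

[seq_wr_aggressor]
; step 4: seq_write_tput = 10% of the measured baseline
rw=write
bs=128k
rate=10m

[rnd_rd]
; step 5: rand_read_tput = the measured baseline itself
rw=randread
bs=4k
rate=100m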

This strikes me as different to what you were proposing:
a) You said: "I want fio to do 10% 'this' / 90% 'that'"
b) That StorScore program does: "100% of 'that' capped to what I've
'previously measured that' can do, and 'this' capped to 10% of the
'previously measured that'".

"Previously measured" becomes fixed and won't be changing as the
subsequent DISKSPD processes are running. Further, the ratio in a)
isn't between the same things as the ratio in b) and the even the
ratio itself isn't the same!

For example, in the StorScore program imagine the baseline is
100MBytes/s. You start running 4. and 5. together and the step 5
DISKSPD (the reads) immediately falls to 0 (e.g. the disk is weird and
can't do reads and writes simultaneously). That does not mean the step
4 DISKSPD will limit its throughput to 0; it continues chugging away
at 10MBytes/s. Thus the ratio between the step 4 DISKSPD and the step
5 DISKSPD quickly becomes 10:0.

You would have to ask Microsoft why their StorScore recipe was
written the way it is, but the comments do suggest it was
intentional. It's worth noting DISKSPD itself can't balance entire
running "jobs" against each other but as previously mentioned that's
not what the StorScore program you linked is trying to do anyway...

On 13 July 2017 at 21:55, abhishek koundal <akoundal@xxxxxxxxx> wrote:
Yes, that is correct.
So e.g. if you have a RND_RD b/w of 45MB/s and a 10% SEQ_WR / 90%
RND_RD mix, the write throughput will be:
= 10/100 * 45MB/s = ~4.5MB/s

Now if you try to cap the write throughput to this in Linux
(rate=5m), the RND_RD also gets impacted and won't run to its full
potential.

In StorScore the read thread does not get capped and only the write
thread is capped, hence it delivers the expected results.

On Thu, Jul 13, 2017 at 1:33 PM, Sitsofe Wheeler <sitsofe@xxxxxxxxx> wrote:
Hey,

My Perl is a bit rusty but look at how the write throughput is calculated:

         # Run again, this time with an aggressor process injecting
         # writes in the background.  The aggressor is rate-limited
         # to a fraction of the baseline read throughput.
         foreach my $write_workload ( @write_workloads )
         {
             my ($write_pattern, $write_pct) = @$write_workload;

             my $write_tput_KBps =
                 int ( ( $write_pct / 100 ) * $read_tput_KBps );

So it is accounting for the fact that the write throughput will be
limited to a percentage of the read throughput.

On 13 July 2017 at 20:02, abhishek koundal <akoundal@xxxxxxxxx> wrote:
Hi,
The main thing that I am trying to do is to model StorScore in a
Linux environment
(https://github.com/Microsoft/StorScore/blob/master/recipes/write_impact_check.rcp).
There you can control the rate of the writes as per the % that is
required. While running that on Windows I can see that I can have
different block sizes and rates defined for the runs, achieving the
expected output.

In the above-mentioned case the RND_RD will be unconstrained (i.e.
the full BW it can achieve) but the writes will be slowed to x% of the
RND_RD. In fio, with e.g. SEQ_WR 10% and RND_RD 90%, and the SEQ_WR
rate capped at 10% of the RND_RD (if the RND_RD was 40m):

[SEQ_WR]
rate=4m
bs=128K
rw=write
; flow weights intended to hold roughly 1 write per 9 reads
flow=9

[RND_RD]
rate=40m
bs=4k
rw=randread
flow=-1
When I run this, the output that comes back shows that the SEQ_WR b/w
is capped at ~4m, but the RND_RD also gets severely hit on both IOPS
and BW, as it tries to keep the total BW under <45m>. I want the
RND_RD not to be capped and to run as close to its full potential as
possible, like I see in StorScore.
I really appreciate the support and the help in understanding the
limitations.
--
Sitsofe | http://sucs.org/~sits/





