Hi,

On 19 April 2018 at 04:08, shadow_lin <shadow_lin@xxxxxxx> wrote:
> Hi list,
> I am using the rate parameter to limit each job's speed.
> The fio setting is:
>
> [4m]
> description="4m-seq-write"
> direct=1
> ioengine=libaio
> directory=data
> numjobs=60
> iodepth=4
> group_reporting
> rw=write
> bs=4M
> size=150G
> rate=10M
>
> The result is:
>
> 4m: (groupid=0, jobs=60): err= 0: pid=6421: Mon Apr 16 03:01:29 2018
>   Description  : ["ceph rbd 4m-seq-write"]
>   write: io=9000.0GB, bw=614407KB/s, iops=150, runt=15359834msec
>     slat (usec): min=118, max=2339.4K, avg=736.32, stdev=13419.04
>     clat (msec): min=84, max=20170, avg=258.32, stdev=133.97
>      lat (msec): min=84, max=20171, avg=259.06, stdev=135.98
>     clat percentiles (msec):
>      |  1.00th=[  122],  5.00th=[  167], 10.00th=[  176], 20.00th=[  190],
>      | 30.00th=[  206], 40.00th=[  223], 50.00th=[  239], 60.00th=[  258],
>      | 70.00th=[  281], 80.00th=[  314], 90.00th=[  367], 95.00th=[  416],
>      | 99.00th=[  529], 99.50th=[  586], 99.90th=[  791], 99.95th=[ 1004],
>      | 99.99th=[ 1958]
>     bw (KB  /s): min=  204, max=72624, per=1.79%, avg=10979.20, stdev=5690.79
>     lat (msec) : 100=0.11%, 250=56.13%, 500=42.24%, 750=1.40%, 1000=0.07%
>     lat (msec) : 2000=0.04%, >=2000=0.01%
>   cpu          : usr=0.08%, sys=0.03%, ctx=4263613, majf=0, minf=570
>   IO depths    : 1=84.9%, 2=9.7%, 4=5.4%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
>      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>      complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>      issued    : total=r=0/w=2304000/d=0, short=r=0/w=0/d=0
>
> Run status group 0 (all jobs):
>   WRITE: io=9000.0GB, aggrb=614406KB/s, minb=614406KB/s, maxb=614406KB/s, mint=15359834msec, maxt=15359834msec
>
> Disk stats (read/write):
>   rbd0: ios=0/2305605, merge=0/6913029, ticks=0/594137632, in_queue=594308692, util=100.00%
>
> Then I did the same test without the rate parameter:
>
> [4m]
> description="4m-seq-write"
> direct=1
> ioengine=libaio
> directory=data
> numjobs=60
> iodepth=4
> group_reporting
> rw=write
> bs=4M
> size=150G
>
> The result is:
>
> 4m: (groupid=0, jobs=60): err= 0: pid=30154: Tue Apr 17 03:13:55 2018
>   Description  : ["ceph rbd 4m-seq-write"]
>   write: io=9000.0GB, bw=1048.1MB/s, iops=262, runt=8785724msec
>     slat (usec): min=113, max=16389K, avg=159607.76, stdev=284760.52
>     clat (msec): min=139, max=33403, avg=748.97, stdev=548.35
>      lat (msec): min=148, max=33842, avg=908.58, stdev=671.56
>     clat percentiles (msec):
>      |  1.00th=[  212],  5.00th=[  265], 10.00th=[  306], 20.00th=[  371],
>      | 30.00th=[  429], 40.00th=[  498], 50.00th=[  578], 60.00th=[  685],
>      | 70.00th=[  840], 80.00th=[ 1057], 90.00th=[ 1434], 95.00th=[ 1778],
>      | 99.00th=[ 2540], 99.50th=[ 2835], 99.90th=[ 3589], 99.95th=[ 3916],
>      | 99.99th=[ 5997]
>     bw (KB  /s): min=  135, max=103595, per=1.90%, avg=20420.87, stdev=11560.88
>     lat (msec) : 250=3.56%, 500=36.90%, 750=24.35%, 1000=12.77%, 2000=19.23%
>     lat (msec) : >=2000=3.19%
>   cpu          : usr=0.24%, sys=0.08%, ctx=3888021, majf=0, minf=509
>   IO depths    : 1=0.1%, 2=0.1%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
>      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>      complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>      issued    : total=r=0/w=2304000/d=0, short=r=0/w=0/d=0
>
> Run status group 0 (all jobs):
>   WRITE: io=9000.0GB, aggrb=1048.1MB/s, minb=1048.1MB/s, maxb=1048.1MB/s, mint=8785724msec, maxt=8785724msec
>
> Disk stats (read/write):
>   rbd0: ios=0/2304660, merge=0/6912635, ticks=0/1107332792, in_queue=1107640692, util=100.00%
>
> The bandwidth of the second test is much higher than the first one, which is expected,

Yes.

> but the latency info is confusing. I was thinking that with the same job number
> and io depth but higher bandwidth the latency should be much lower.

The relationship between throughput and latency is more complicated (higher
throughput on the exact same system rarely means lower latency). Normally it
is only when the number of things you are sending simultaneously goes above a
threshold that their latency also starts increasing.

Imagine I have a wire four wide - I can put up to four things down it
simultaneously (say I have to send and wait for them to come back). If I now
want to send four things and then ANOTHER four immediately, I have to wait for
the slots to become free before I can send the last four. My throughput may be
higher (I sent eight things in total instead of four) but my latency has gone
up due to contention.

Put another way, you can't calculate latency from throughput when the
throughput is not being maxed out.

> The first test with rate to limit the speed shows the avg latency is 259.06ms.
> The second test without rate to limit the speed shows the avg latency is 908.58ms.
>
> Use 4M/latency*job_number*io_depth to calculate the bandwidth.
> The second test can match the formula, but the first test is way off from the
> formula. I think it is because the method to calculate the latency is different.
>
> How should I understand the latency stats with rate parameters?

When you used rate you were limiting contention. Without it, everything tries
to go as fast as it can, creating contention. Looking at your job, are you sure
your Ceph disk can really sustain a depth of 240 (4 * 60) I/Os simultaneously
all the way? Notice how your disk utilization hit 100%, suggesting the system
was waiting on your Ceph disk to complete I/Os, potentially hinting that the
disk was overloaded.
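To put rough numbers on that, here is a small sketch (Python; purely
illustrative, and it assumes numjobs * iodepth = 240 I/Os are actually in
flight, which your own fio output shows is only true for the unthrottled run).
It applies your bandwidth formula, which is essentially Little's law, to the
average latencies reported above:

# Rough sanity check of bandwidth ~= bs * numjobs * iodepth / avg_lat
# (essentially Little's law), using the averages reported in the two runs.
# ASSUMPTION: numjobs * iodepth = 240 I/Os are actually in flight -- that
# is only the configured maximum, not something fio guarantees.

BS_MIB = 4          # bs=4M
NUMJOBS = 60        # numjobs=60
IODEPTH = 4         # iodepth=4

runs = {
    # name: (avg total latency in seconds, measured bandwidth in MiB/s)
    "rate=10M":    (0.259, 614407 / 1024),  # ~600 MiB/s
    "unthrottled": (0.909, 1048.1),
}

for name, (lat, measured) in runs.items():
    predicted = BS_MIB * NUMJOBS * IODEPTH / lat
    print(f"{name:11s}  predicted {predicted:6.0f} MiB/s  measured {measured:6.0f} MiB/s")

# Prints roughly:
#   rate=10M     predicted   3707 MiB/s  measured    600 MiB/s
#   unthrottled  predicted   1056 MiB/s  measured   1048 MiB/s

The unthrottled run matches the formula because the queue really was full the
whole time (its "IO depths" line shows 4=100.0%). With rate=10M each job is
capped at 10 MiB/s, so far fewer than 240 I/Os are in flight on average (the
first run's "IO depths" line shows depth 1 for 84.9% of the time) and the
measured ~600 MiB/s is simply 60 jobs * 10 MiB/s - the rate cap, not the disk,
set the throughput there, and the lower per-I/O latency reflects the reduced
contention rather than a faster device.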
-- 
Sitsofe | http://sucs.org/~sits/
--
To unsubscribe from this list: send the line "unsubscribe fio" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html