Re: Interpretation Guidance for Slow Requests

Hi,

On 7 Dec 2016, at 14:39, Peter Maloney <peter.maloney@xxxxxxxxxxxxxxxxxxxx> wrote:

On 12/07/16 13:52, Christian Balzer wrote:
On Wed, 7 Dec 2016 12:39:11 +0100 Christian Theune wrote:

| cartman06 ~ # fio --filename=/dev/sdl --direct=1 --sync=1 --rw=write --bs=128k --numjobs=1 --iodepth=1 --runtime=60 --time_based --group_reporting --name=journal-test
| journal-test: (g=0): rw=write, bs=4K-4K/4K-4K/4K-4K, ioengine=sync, iodepth=1
| fio-2.0.14
| Starting 1 process
| Jobs: 1 (f=1): [W] [100.0% done] [0K/88852K/0K /s] [0 /22.3K/0  iops] [eta 00m:00s]
| journal-test: (groupid=0, jobs=1): err= 0: pid=28606: Wed Dec  7 11:59:36 2016
|   write: io=5186.7MB, bw=88517KB/s, iops=22129 , runt= 60001msec
|     clat (usec): min=37 , max=1519 , avg=43.77, stdev=10.89
|      lat (usec): min=37 , max=1519 , avg=43.94, stdev=10.90
|     clat percentiles (usec):
|      |  1.00th=[   39],  5.00th=[   40], 10.00th=[   40], 20.00th=[   41],
|      | 30.00th=[   41], 40.00th=[   42], 50.00th=[   42], 60.00th=[   42],
|      | 70.00th=[   43], 80.00th=[   44], 90.00th=[   47], 95.00th=[   53],
|      | 99.00th=[   71], 99.50th=[   87], 99.90th=[  157], 99.95th=[  201],
|      | 99.99th=[  478]
|     bw (KB/s)  : min=81096, max=91312, per=100.00%, avg=88519.19, stdev=1762.43
|     lat (usec) : 50=92.42%, 100=7.28%, 250=0.27%, 500=0.02%, 750=0.01%
|     lat (usec) : 1000=0.01%
|     lat (msec) : 2=0.01%
|   cpu          : usr=5.43%, sys=14.64%, ctx=1327888, majf=0, minf=6
|   IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
|      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
|      complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
|      issued    : total=r=0/w=1327777/d=0, short=r=0/w=0/d=0
|
| Run status group 0 (all jobs):
|   WRITE: io=5186.7MB, aggrb=88516KB/s, minb=88516KB/s, maxb=88516KB/s, mint=60001msec, maxt=60001msec
|
| Disk stats (read/write):
|   sdl: ios=15/1326283, merge=0/0, ticks=1/47203, in_queue=46970, util=78.29%

That doesn’t look too bad to me, specifically the 99.99th of 478 microseconds seems fine.

The iostat during this run looks OK as well:

Both do look pretty bad to me.

Your SSD with a nominal write speed of 850MB/s is doing 88MB/s at 80%
utilization.
The puny 400GB DC S3610 in my example earlier can do 400MB/s per Intel
specs and was at 70% with 300MB/s (so half of it journal writes!).
My experience with Intel SSDs (as mentioned before in this ML) is that
their stated speeds can actually be achieved within about a 10% margin
when used with Ceph, be it for pure journaling or as OSDs with inline
journals.
I don't see how this makes any sense. Could you correct or explain it so
it does?

- 300MB/s at 4k is like 77k iops. The Intel 400GB DC S3610 spec[1] says
it does 25k. So I think you should be more specific in how you tested it.

- His ssd is rated at 15k random write ops[2], so it's exceeding that by
a bunch (both reported by iostat and fio around 22k) (but they don't
list a sequential rating)
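For reference, the conversion behind the two bullets above can be sketched as follows. This is only a back-of-the-envelope check: it assumes "MB" and "KB" are 1024-based (as older fio versions report them) and that every I/O is exactly one block. The function name is made up for illustration.

```python
KIB = 1024
MIB = 1024 * KIB


def iops_from_bandwidth(bytes_per_sec, block_size_bytes):
    """IOPS implied by a bandwidth if every I/O is exactly one block."""
    return bytes_per_sec / block_size_bytes


# 300MB/s at a 4k block size, the figure questioned above (1024-based units assumed)
claimed = iops_from_bandwidth(300 * MIB, 4 * KIB)
print(claimed)  # 76800.0, i.e. roughly 77k IOPS

# fio reported bw=88517KB/s for the 4k run
measured = iops_from_bandwidth(88517 * KIB, 4 * KIB)
print(measured)  # 22129.25, matching fio's iops=22129
```

With 1000-based units the first figure comes out closer to 73k, so the exact number depends on the unit convention; either way it is about three times the 25k quoted from the spec.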

The sequential tests are lacking. I was able to produce ~520MB/s read with 130k IOPS. From an IOPS perspective this seems fine, but the bandwidth is still lacking (although with 500MB/s we’re slowly getting closer to the 6 Gbit/s limit).
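A rough sketch of how close that is to the link limit, assuming the 6 Gbit/s figure refers to a SATA or SAS link with 8b/10b encoding (the thread does not say which interface is in use) and decimal megabytes:

```python
LINE_RATE_BITS_PER_SEC = 6_000_000_000  # 6 Gbit/s link (assumed SATA/SAS)

# 8b/10b encoding: 10 bits on the wire carry 8 bits of payload
payload_bytes_per_sec = LINE_RATE_BITS_PER_SEC * 8 // 10 // 8
print(payload_bytes_per_sec)  # 600000000, i.e. 600 MB/s before protocol overhead

read_bytes_per_sec = 520_000_000  # ~520MB/s read, decimal MB assumed
read_iops = 130_000

# Share of the theoretical payload rate that the read test reached
print(round(read_bytes_per_sec / payload_bytes_per_sec, 3))  # 0.867

# Average request size implied by the bandwidth and IOPS figures
print(read_bytes_per_sec / read_iops)  # 4000.0 bytes, consistent with ~4k requests
```

Under these assumptions the read result is within about 15% of what the link can carry, which is why the write numbers stand out.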

I haven’t been able to push the drive to higher bandwidth on the write side. I did a full conditioning run, which maxed out around 200MB/s for a while and then dropped to around 100MB/s, where it has stayed since. I’m surprised, because the reviews I found explicitly showed >500MB/s sustained sequential writes. Either I’m getting screwed because this is not _exactly_ the same drive as in the reviews (possible) or something else is off (also possible). I’m contacting Micron now - let’s see what kind of odyssey that will cause. ;)

As another step I’m evaluating whether I have options available to put the drive in a location where I can bypass the RAID controller, just to make sure.

- his command says bs=128k, but the output says 4k ... so he didn't
really run that command for that result, or it's bugged. (is this where
the confusion lies?)

Yikes. That was a copy/paste error from my terminal to my work log. I think I “verschlimmbessert” this (made it worse while trying to fix it) when I thought it was the wrong command line in the first place.
This was a 4k run, the other options being identical - as described by Sebastian.
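The pasted output itself supports the 4k reading. A small consistency check using only the numbers from the fio summary line (bw=88517KB/s, iops=22129), assuming 1024-based units:

```python
bw_kib_per_sec = 88517  # "bw=88517KB/s" from the fio summary
iops = 22129            # "iops=22129" from the same line

# Block size implied by the reported bandwidth and IOPS
implied_block_kib = bw_kib_per_sec / iops
print(round(implied_block_kib, 2))  # 4.0

# Bandwidth the same IOPS would need if the block size really had been 128k
hypothetical_mib_per_sec = iops * 128 / 1024
print(hypothetical_mib_per_sec)  # 2766.125
```

Roughly 2.7GB/s is far beyond what a 6 Gbit/s link can carry, so the 128k in the command line cannot belong to this output.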

- also note he didn't set --ioengine=... so depending on how the default
changes per version, others could be comparing psync or others to his
ioengine=sync, so that should be specifically stated for comparing results.

My version says ioengine=sync as default. Didn’t know this varies over versions and should be specified, sorry.

Cheers,
Christian

-- 
Christian Theune · ct@xxxxxxxxxxxxxxx · +49 345 219401 0
Flying Circus Internet Operations GmbH · http://flyingcircus.io
Forsterstraße 29 · 06112 Halle (Saale) · Deutschland
HR Stendal HRB 21169 · Geschäftsführer: Christian Theune, Christian Zagrodnick
