Re: [EXTERNAL] Ceph performance is too good (impossible..)...

Hi,

You need to flush all caches before starting read tests. With fio you can probably only do this if you keep the files it creates.

As root, run this on all clients and all OSD nodes:

echo 3 > /proc/sys/vm/drop_caches
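
A minimal sketch to do that from one machine in a single pass; the hostnames (client1, osd1 ... osd6) are placeholders for your own nodes, and passwordless root ssh is assumed:

for host in client1 osd1 osd2 osd3 osd4 osd5 osd6; do
    # sync first, then ask the kernel to drop page cache, dentries and inodes
    ssh root@"$host" 'sync; echo 3 > /proc/sys/vm/drop_caches'
done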

But fio is a little problematic for Ceph because of the caches in the clients and the OSD nodes. If you really need to know the read rates, go for large files: write them with dd, flush all caches, and then read the files back with dd. A single-threaded dd read shows less throughput than multiple dd reads running in parallel. The readahead size also matters.
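
A rough sketch of that sequence against a mapped image; the device name /dev/rbd0 and the sizes are only examples, and writing to the raw device destroys whatever is on it:

# fill the image with real data (40 GiB in this example)
dd if=/dev/zero of=/dev/rbd0 bs=4M count=10240 oflag=direct
# flush caches on all clients and all OSD nodes (see above), then read it back
sync; echo 3 > /proc/sys/vm/drop_caches
dd if=/dev/rbd0 of=/dev/null bs=4M count=10240
# readahead of the mapped device can be checked and changed with blockdev
blockdev --getra /dev/rbd0
blockdev --setra 4096 /dev/rbd0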

Good luck testing.

- mike


On 12/13/16 2:37 AM, V Plus wrote:
The same..
see:
A: (g=0): rw=read, bs=5M-5M/5M-5M/5M-5M, ioengine=libaio, iodepth=1
...
fio-2.2.10
Starting 16 processes

A: (groupid=0, jobs=16): err= 0: pid=27579: Mon Dec 12 20:36:10 2016
  mixed: io=122515MB, bw=6120.3MB/s, iops=1224, runt= 20018msec

I think in the end the only way to solve this issue is to write the
image before the read test... as suggested...
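
Something like the following could serve as that prefill pass; it is only a sketch (untested), mirroring the RBD_TEST read job quoted below with rw switched to write so the image gets written through once:

[PREFILL]
ioengine=rbd
clientname=admin
pool=rbd
rbdname=fio_test
direct=1
rw=write
bs=4MB
numjobs=1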

I have no clue why the rbd engine does not work...

On Mon, Dec 12, 2016 at 4:23 PM, Will.Boege <Will.Boege@xxxxxxxxxx> wrote:

    Try adding --ioengine=libaio


    From: V Plus <v.plussharp@xxxxxxxxx>
    Date: Monday, December 12, 2016 at 2:40 PM
    To: "Will.Boege" <Will.Boege@xxxxxxxxxx>
    Subject: Re: [EXTERNAL] Ceph performance is too good (impossible..)...

    Hi Will,

    Thanks very much..

    However, I tried your suggestions.

    Neither is working...

    1. With the FIO rbd engine:
    [RBD_TEST]
    ioengine=rbd
    clientname=admin
    pool=rbd
    rbdname=fio_test
    invalidate=1
    direct=1
    group_reporting=1
    unified_rw_reporting=1
    time_based=1
    rw=read
    bs=4MB
    numjobs=16
    ramp_time=10
    runtime=20

    Then I ran "sudo fio rbd.job" and got:


    RBD_TEST: (g=0): rw=read, bs=4M-4M/4M-4M/4M-4M, ioengine=rbd, iodepth=1
    ...
    fio-2.2.10
    Starting 16 processes
    rbd engine: RBD version: 0.1.10
    rbd engine: RBD version: 0.1.10
    rbd engine: RBD version: 0.1.10
    rbd engine: RBD version: 0.1.10
    rbd engine: RBD version: 0.1.10
    rbd engine: RBD version: 0.1.10
    rbd engine: RBD version: 0.1.10
    rbd engine: RBD version: 0.1.10
    rbd engine: RBD version: 0.1.10
    rbd engine: RBD version: 0.1.10
    rbd engine: RBD version: 0.1.10
    rbd engine: RBD version: 0.1.10
    rbd engine: RBD version: 0.1.10
    rbd engine: RBD version: 0.1.10
    rbd engine: RBD version: 0.1.10
    rbd engine: RBD version: 0.1.10
    Jobs: 12 (f=7): [R(5),_(1),R(4),_(1),R(2),_(2),R(1)] [100.0% done]
    [11253MB/0KB/0KB /s] [2813/0/0 iops] [eta 00m:00s]
    RBD_TEST: (groupid=0, jobs=16): err= 0: pid=17504: Mon Dec 12
    15:32:52 2016
      mixed: io=212312MB, bw=10613MB/s, iops=2653, runt= 20005msec

    2. With blockalign:
    [A]
    direct=1
    group_reporting=1
    unified_rw_reporting=1
    size=100%
    time_based=1
    filename=/dev/rbd0
    rw=read
    bs=5MB
    numjobs=16
    ramp_time=5
    runtime=20
    blockalign=512b

    [B]
    direct=1
    group_reporting=1
    unified_rw_reporting=1
    size=100%
    time_based=1
    filename=/dev/rbd1
    rw=read
    bs=5MB
    numjobs=16
    ramp_time=5
    runtime=20
    blockalign=512b

    sudo fio fioA.job -output a.txt & sudo fio fioB.job -output b.txt & wait

    Then I got:
    A: (groupid=0, jobs=16): err= 0: pid=19320: Mon Dec 12 15:35:32 2016
      mixed: io=88590MB, bw=4424.7MB/s, iops=884, runt= 20022msec

    B: (groupid=0, jobs=16): err= 0: pid=19324: Mon Dec 12 15:35:32 2016
      mixed: io=88020MB, bw=4395.6MB/s, iops=879, runt= 20025msec

    ..............


    On Mon, Dec 12, 2016 at 10:45 AM, Will.Boege <Will.Boege@xxxxxxxxxx> wrote:

        My understanding is that when using direct=1 on a raw block
        device, FIO (i.e. you) has to handle all the sector alignment,
        or the request will get buffered to perform the alignment.

        Try adding the --blockalign=512b option to your jobs, or better
        yet just use the native FIO RBD engine.

        Something like this (untested):

        [A]
        ioengine=rbd
        clientname=admin
        pool=rbd
        rbdname=fio_test
        direct=1
        group_reporting=1
        unified_rw_reporting=1
        time_based=1
        rw=read
        bs=4MB
        numjobs=16
        ramp_time=10
        runtime=20

        From: ceph-users <ceph-users-bounces@xxxxxxxxxxxxxx> on behalf of V Plus <v.plussharp@xxxxxxxxx>
        Date: Sunday, December 11, 2016 at 7:44 PM
        To: "ceph-users@xxxxxxxxxxxxxx" <ceph-users@xxxxxxxxxxxxxx>
        Subject: [EXTERNAL] Ceph performance is too good (impossible..)...

        Hi Guys,

        We have a Ceph cluster with 6 machines (6 OSDs per host).

        1. I created 2 images in Ceph and mapped them to another host A
        (outside the Ceph cluster). On host A, I got /dev/rbd0 and /dev/rbd1.

        2. I started two fio jobs to perform READ tests on rbd0 and rbd1
        (the fio job descriptions can be found below):

        "sudo fio fioA.job -output a.txt & sudo fio fioB.job -output b.txt & wait"

        3. After the test, in a.txt we got bw=1162.7MB/s, and in b.txt
        we got bw=3579.6MB/s.

        The results do NOT make sense, because there is only one NIC on
        host A, and its limit is 10 Gbps (1.25 GB/s).

        I suspect it is because of the cache setting.

        But I am sure that in /etc/ceph/ceph.conf on host A, I already added:

        [client]
        rbd cache = false

        Could anyone give me a hint about what is missing? Why....

        Thank you very much.

        fioA.job:
        [A]
        direct=1
        group_reporting=1
        unified_rw_reporting=1
        size=100%
        time_based=1
        filename=/dev/rbd0
        rw=read
        bs=4MB
        numjobs=16
        ramp_time=10
        runtime=20

        fioB.job:
        [B]
        direct=1
        group_reporting=1
        unified_rw_reporting=1
        size=100%
        time_based=1
        filename=/dev/rbd1
        rw=read
        bs=4MB
        numjobs=16
        ramp_time=10
        runtime=20

        Thanks...




_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




