Re: Performance problems

Mark Nelson <mark.nelson@xxxxxxxxxxx> · Thu, 11 Apr 2013 07:42:57 -0500

With GlusterFS are you using the native RDMA support?

Ceph and Gluster tend to prefer pretty different disk setups too.  AFAIK
RH still recommends RAID6 behind each brick, while we do better with
individual disks behind each OSD.  You might want to watch the OSD admin
socket and see if operations are backing up on any specific OSDs.
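
A minimal sketch of what watching the admin socket could look like, run on each OSD host. It assumes the sockets live in /var/run/ceph and are named ceph-osd.<id>.asok, and that the dump_ops_in_flight admin command is available in the installed version; the function name dump_osd_ops is made up for illustration.

```shell
#!/bin/sh
# Print in-flight operations for every OSD admin socket found on this host.
# Assumption: sockets are in /var/run/ceph and named ceph-osd.<id>.asok.
dump_osd_ops() {
    sock_dir="${1:-/var/run/ceph}"
    for sock in "$sock_dir"/ceph-osd.*.asok; do
        # Skip the unexpanded glob pattern and anything that is not a socket.
        [ -S "$sock" ] || continue
        echo "== $sock =="
        ceph --admin-daemon "$sock" dump_ops_in_flight
    done
}

dump_osd_ops /var/run/ceph
```

Running it a few times during a slow write and comparing the output per OSD should show whether one OSD keeps accumulating operations while the others stay empty.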

Mark

On 04/09/2013 12:54 PM, Ziemowit Pierzycki wrote:
Neither made a difference.  I also have a glusterFS cluster with two
nodes in replicating mode residing on 1TB drives:

[root@triton speed]# dd conv=fdatasync if=/dev/zero of=/mnt/speed/test.out bs=512k count=10000
10000+0 records in
10000+0 records out
5242880000 bytes (5.2 GB) copied, 43.573 s, 120 MB/s

... and Ceph:

[root@triton temp]# dd conv=fdatasync if=/dev/zero of=/mnt/temp/test.out bs=512k count=10000
10000+0 records in
10000+0 records out
5242880000 bytes (5.2 GB) copied, 366.911 s, 14.3 MB/s
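
As a cross-check, the rates dd reports can be recomputed from the byte counts and elapsed times in the two runs above. This is only a sketch; the helper name mbps is made up for illustration, and it uses decimal megabytes (10^6 bytes), which is what dd prints.

```shell
#!/bin/sh
# Compute throughput in MB/s (10^6 bytes per second) from bytes and seconds.
mbps() {
    awk -v b="$1" -v s="$2" 'BEGIN { printf "%.1f\n", b / s / 1000000 }'
}

mbps 5242880000 43.573    # GlusterFS run, prints 120.3
mbps 5242880000 366.911   # Ceph run, prints 14.3
```

With the same 5.2 GB written in both cases, the Ceph run took roughly 8.4 times as long as the GlusterFS run.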

On Mon, Apr 8, 2013 at 4:29 PM, Mark Nelson <mark.nelson@xxxxxxxxxxx> wrote:

    On 04/08/2013 04:12 PM, Ziemowit Pierzycki wrote:

        There is one SSD in each node.  IPoIB performance is about 7 Gbps
        between each host.  CephFS is mounted via the kernel client.  The Ceph
        version is ceph-0.56.3-1.  I have a 1GB journal on the same drive as
        the OSD, but on a separate file system split off via LVM.

        Here is output of another test with fdatasync:

        [root@triton temp]# dd conv=fdatasync if=/dev/zero of=/mnt/temp/test.out bs=512k count=10000
        10000+0 records in
        10000+0 records out
        5242880000 bytes (5.2 GB) copied, 359.307 s, 14.6 MB/s
        [root@triton temp]# dd if=/mnt/temp/test.out of=/dev/null bs=512k count=10000
        10000+0 records in
        10000+0 records out
        5242880000 bytes (5.2 GB) copied, 14.0521 s, 373 MB/s

    Definitely seems off!  How many SSDs are involved and how fast are
    they each?  The MTU idea might have merit, but I honestly don't know
    enough about how well IPoIB handles giant MTUs like that.  One thing
    I have noticed on other IPoIB setups is that TCP autotuning can
    cause a ton of problems.  You may want to try disabling it on all of
    the hosts involved:

    echo 0 | tee /proc/sys/net/ipv4/tcp_moderate_rcvbuf

    If that doesn't work, maybe try setting MTU to 9000 or 1500 if possible.
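
Before changing either setting, it can help to record the current values on each host. A minimal sketch, assuming the IPoIB interface is named ib0 (an assumption, adjust as needed); the function name show_net_tuning is made up for illustration. The mode file is only present for IPoIB interfaces.

```shell
#!/bin/sh
# Report the TCP autotuning flag plus the MTU and IPoIB mode of one interface.
# Assumption: the IPoIB interface is called ib0 on this host.
show_net_tuning() {
    iface="${1:-ib0}"
    for f in /proc/sys/net/ipv4/tcp_moderate_rcvbuf \
             "/sys/class/net/$iface/mtu" \
             "/sys/class/net/$iface/mode"; do
        if [ -r "$f" ]; then
            echo "$f = $(cat "$f")"
        else
            echo "$f = (not available)"
        fi
    done
}

show_net_tuning ib0
```

Saving this output from every host before and after a change makes it easier to tell which setting, if any, affected the write speed.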

    Mark

        The network traffic appears to match the transfer speeds shown
        here too.  Writing is very slow.

        On Mon, Apr 8, 2013 at 3:04 PM, Mark Nelson <mark.nelson@xxxxxxxxxxx> wrote:

             Hi,

             How many drives?  Have you tested your IPoIB performance with
             iperf?  Is this CephFS with the kernel client?  What version of
             Ceph?  How are your journals configured? etc.  It's tough to make
             any recommendations without knowing more about what you are doing.

             Also, please use conv=fdatasync when doing buffered IO writes
             with dd.

             Thanks,
             Mark

             On 04/08/2013 03:00 PM, Ziemowit Pierzycki wrote:

                 Hi,

                 The first test wrote a 500 MB file and was clocked at 1.2
                 GB/s.  The second test wrote a 5000 MB file at 17 MB/s.  The
                 third test read the file back at ~400 MB/s.

                 On Mon, Apr 8, 2013 at 2:56 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:

                      More details, please. You ran the same test twice and
                      performance went up from 17.5MB/s to 394MB/s? How many
                      drives in each node, and of what kind?
                      -Greg
                      Software Engineer #42 @ http://inktank.com | http://ceph.com

                      On Mon, Apr 8, 2013 at 12:38 PM, Ziemowit Pierzycki <ziemowit@xxxxxxxxxxxxx> wrote:
                       > Hi,
                       >
                       > I have a 3 node SSD-backed cluster connected over InfiniBand (16K MTU) and
                       > here is the performance I am seeing:
                       >
                       > [root@triton temp]# !dd
                       > dd if=/dev/zero of=/mnt/temp/test.out bs=512k count=1000
                       > 1000+0 records in
                       > 1000+0 records out
                       > 524288000 bytes (524 MB) copied, 0.436249 s, 1.2 GB/s
                       > [root@triton temp]# dd if=/dev/zero of=/mnt/temp/test.out bs=512k count=10000
                       > 10000+0 records in
                       > 10000+0 records out
                       > 5242880000 bytes (5.2 GB) copied, 299.077 s, 17.5 MB/s
                       > [root@triton temp]# dd if=/mnt/temp/test.out of=/dev/null bs=512k count=10000
                       > 10000+0 records in
                       > 10000+0 records out
                       > 5242880000 bytes (5.2 GB) copied, 13.3015 s, 394 MB/s
                       >
                       > Does that look right?  How do I check that this is not a network problem?
                       > I ask because I remember seeing a kernel issue related to large MTUs.
                       >
                       > _______________________________________________
                       > ceph-users mailing list
                       > ceph-users@xxxxxxxxxxxxxx
                       > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com