Re: Performance problems

Ziemowit Pierzycki <ziemowit@xxxxxxxxxxxxx> · Wed, 10 Apr 2013 09:36:11 -0500

When executing ceph -w I see the following warning:
2013-04-09 22:38:07.288948 osd.2 [WRN] slow request 30.180683 seconds old, received at 2013-04-09 22:37:37.108178: osd_op(client.4107.1:9678 10000000002.000001df [write 0~4194304 [6@0]] 0.4e208174 snapc 1=[]) currently waiting for subops from [0]

So what could be causing this?
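
For reference, the fields that matter in a warning like the one above are the reporting OSD, the age of the request, and the OSD IDs it is still waiting on. A minimal sketch of extracting them with sed, assuming the message layout quoted above; parse_slow_request is a hypothetical helper name, not a Ceph tool:

```shell
#!/bin/sh
# Hypothetical helper (not part of Ceph): reads `ceph -w` lines on stdin and,
# for each "slow request ... waiting for subops" warning, prints the reporting
# OSD, the request age in seconds, and the OSD IDs it is waiting on.
# Assumes the message layout shown in the warning quoted in this thread.
parse_slow_request() {
    sed -n -E 's/^.* (osd\.[0-9]+) \[WRN\] slow request ([0-9.]+) seconds old.*waiting for subops from \[([0-9,]+)\].*$/reporting=\1 age_s=\2 waiting_on=\3/p'
}

# The warning from this message, passed through the helper.
line='2013-04-09 22:38:07.288948 osd.2 [WRN] slow request 30.180683 seconds old, received at 2013-04-09 22:37:37.108178: osd_op(client.4107.1:9678 10000000002.000001df [write 0~4194304 [6@0]] 0.4e208174 snapc 1=[]) currently waiting for subops from [0]'
printf '%s\n' "$line" | parse_slow_request
# prints: reporting=osd.2 age_s=30.180683 waiting_on=0
```

Read this way, osd.2 has accepted the 4 MB write and is waiting for osd.0 to acknowledge its replica copy, so osd.0's disk and journal, and the network path between the two OSDs, would be the first places to look.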

On Tue, Apr 9, 2013 at 12:54 PM, Ziemowit Pierzycki <ziemowit@xxxxxxxxxxxxx> wrote:

Neither change made a difference.  For comparison, I also have a GlusterFS cluster with two nodes in replicating mode residing on 1TB drives:

[root@triton speed]# dd conv=fdatasync if=/dev/zero of=/mnt/speed/test.out bs=512k count=10000
10000+0 records in
10000+0 records out
5242880000 bytes (5.2 GB) copied, 43.573 s, 120 MB/s

... and Ceph:

[root@triton temp]# dd conv=fdatasync if=/dev/zero of=/mnt/temp/test.out bs=512k count=10000
10000+0 records in
10000+0 records out
5242880000 bytes (5.2 GB) copied, 366.911 s, 14.3 MB/s
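
To put the two results side by side: dd reports decimal megabytes (10^6 bytes), so the rates can be recomputed from the byte counts and elapsed times. A small sketch with awk; mbps is a helper name made up for this example:

```shell
#!/bin/sh
# Hypothetical helper: throughput in MB/s, with 1 MB = 10^6 bytes as dd reports it.
# Usage: mbps <bytes> <seconds>
mbps() {
    awk -v bytes="$1" -v secs="$2" 'BEGIN { printf "%.1f\n", bytes / secs / 1000000 }'
}

mbps 5242880000 43.573     # GlusterFS run, prints 120.3
mbps 5242880000 366.911    # CephFS run, prints 14.3

# How many times longer the CephFS run took for the same 5.2 GB; prints 8.4
awk 'BEGIN { printf "%.1f\n", 366.911 / 43.573 }'
```

Both runs used conv=fdatasync, so the elapsed times include flushing the data to storage and the roughly 8x gap is not a page-cache artifact.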

On Mon, Apr 8, 2013 at 4:29 PM, Mark Nelson <mark.nelson@xxxxxxxxxxx> wrote:

On 04/08/2013 04:12 PM, Ziemowit Pierzycki wrote:

There is one SSD in each node.  IPoIB performance is about 7 Gbps between each host.  CephFS is mounted via the kernel client.  The Ceph version is ceph-0.56.3-1.  I have a 1GB journal on the same drive as the OSD, but on a separate file system split off via LVM.

Here is the output of another test with fdatasync:

[root@triton temp]# dd conv=fdatasync if=/dev/zero of=/mnt/temp/test.out bs=512k count=10000
10000+0 records in
10000+0 records out
5242880000 bytes (5.2 GB) copied, 359.307 s, 14.6 MB/s
[root@triton temp]# dd if=/mnt/temp/test.out of=/dev/null bs=512k count=10000
10000+0 records in
10000+0 records out
5242880000 bytes (5.2 GB) copied, 14.0521 s, 373 MB/s

Definitely seems off!  How many SSDs are involved and how fast are they each?  The MTU idea might have merit, but I honestly don't know enough about how well IPoIB handles giant MTUs like that.  One thing I have noticed on other IPoIB setups is that TCP autotuning can cause a ton of problems.  You may want to try disabling it on all of the hosts involved:

echo 0 | tee /proc/sys/net/ipv4/tcp_moderate_rcvbuf

If that doesn't work, maybe try setting MTU to 9000 or 1500 if possible.

Mark
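
Before changing either setting, it can help to record what each host currently uses. A minimal sketch that reads the autotuning switch and the interface MTU from procfs and sysfs; show_setting is a hypothetical helper, and the interface name ib0 is an assumption (substitute the actual IPoIB interface):

```shell
#!/bin/sh
# Hypothetical helper: print "name=value" from a one-line kernel settings file,
# or "name=unavailable" if the file cannot be read on this host.
show_setting() {
    name=$1
    path=$2
    if [ -r "$path" ]; then
        printf '%s=%s\n' "$name" "$(cat "$path")"
    else
        printf '%s=unavailable\n' "$name"
    fi
}

# Assumption: the IPoIB interface is called ib0; override with IFACE=<name>.
IFACE=${IFACE:-ib0}

# 1 means receive-buffer autotuning is on, 0 means it is off.
show_setting tcp_moderate_rcvbuf /proc/sys/net/ipv4/tcp_moderate_rcvbuf
# Current MTU of the interface, in bytes.
show_setting "${IFACE}_mtu" "/sys/class/net/$IFACE/mtu"
```

Running this on every host before and after the change makes it easy to confirm that all nodes end up with the same values. Note that writing to /proc, as in the echo command above, only lasts until the next reboot.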

The network traffic appears to match the transfer speeds shown here too.  Writing is very slow.

On Mon, Apr 8, 2013 at 3:04 PM, Mark Nelson <mark.nelson@xxxxxxxxxxx> wrote:

    Hi,

    How many drives?  Have you tested your IPoIB performance with iperf?  Is this CephFS with the kernel client?  What version of Ceph?  How are your journals configured? etc.  It's tough to make any recommendations without knowing more about what you are doing.

    Also, please use conv=fdatasync when doing buffered IO writes with dd.

    Thanks,
    Mark

    On 04/08/2013 03:00 PM, Ziemowit Pierzycki wrote:

        Hi,

        The first test wrote a 500 MB file and was clocked at 1.2 GB/s.  The second test wrote a 5000 MB file at 17 MB/s.  The third test read the file back at ~400 MB/s.

        On Mon, Apr 8, 2013 at 2:56 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:

             More details, please. You ran the same test twice and performance went up from 17.5MB/s to 394MB/s? How many drives in each node, and of what kind?

             -Greg
             Software Engineer #42 @ http://inktank.com | http://ceph.com

             On Mon, Apr 8, 2013 at 12:38 PM, Ziemowit Pierzycki <ziemowit@xxxxxxxxxxxxx> wrote:

              > Hi,
              >
              > I have a 3 node SSD-backed cluster connected over InfiniBand (16K MTU) and
              > here is the performance I am seeing:
              >
              > [root@triton temp]# !dd
              > dd if=/dev/zero of=/mnt/temp/test.out bs=512k count=1000
              > 1000+0 records in
              > 1000+0 records out
              > 524288000 bytes (524 MB) copied, 0.436249 s, 1.2 GB/s
              > [root@triton temp]# dd if=/dev/zero of=/mnt/temp/test.out bs=512k count=10000
              > 10000+0 records in
              > 10000+0 records out
              > 5242880000 bytes (5.2 GB) copied, 299.077 s, 17.5 MB/s
              > [root@triton temp]# dd if=/mnt/temp/test.out of=/dev/null bs=512k count=10000
              > 10000+0 records in
              > 10000+0 records out
              > 5242880000 bytes (5.2 GB) copied, 13.3015 s, 394 MB/s
              >
              > Does that look right?  How do I check this is not a network problem, because
              > I remember seeing a kernel issue related to large MTU.
              >

              > _______________________________________________
              > ceph-users mailing list
              > ceph-users@xxxxxxxxxxxxxx
              > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
