Ceph and Infiniband

On 07/23/2014 03:54 AM, Andrei Mikhailovsky wrote:
> Ricardo,
>
> Thought to share my testing results.
>
> I've been using IPoIB with ceph for quite some time now. I've got QDR
> osd/mon/client servers serving rbd images to kvm hypervisors. I've done
> some performance testing using both rados and guest vm benchmarks while
> running the last three stable versions of ceph.
>
> My conclusion was that ceph itself needs to mature and/or be optimised
> in order to utilise the capabilities of the infiniband link. In my
> experience, I was not able to reach the limits of the network speeds
> reported to me by the network performance monitoring tools. I was
> struggling to push data throughput beyond 1.5GB/s while using between 2
> and 64 concurrent tests. This was the case even when the benchmarks were
> reusing the same data over and over, so it was cached on the osd servers
> and served straight from the servers' ram without any access to the osds
> themselves.
>
> My ipoib network performance tests were showing 2.5-3GB/s on average,
> with peaks reaching 3.3GB/s. It would be nice to see how ceph performs
> over rdma ))).
>
> Having said this, perhaps my test gear is somewhat limited or my ceph
> optimisation was not done correctly. I had 2 osd servers with 8 osds
> each and three clients running guest vms and rados benchmarks. None of
> the benchmarks were able to fully utilise the server resources. My osd
> servers were running at about 50% utilisation during the tests.
>
> So, I had to conclude that unless you are running a large cluster with
> some specific data sets that utilise multithreading, you will probably
> not need an infiniband link. Single-thread performance for cold data
> will be limited to about half the speed of a single osd device. So, if
> your osds are doing 150MB/s, do not expect a single thread to be faster
> than 70-80MB/s.
>
> On the other hand, if you utilise high performance gear, like cache
> cards capable of several gigabytes per second, an infiniband link might
> be of use. Not sure if the ceph-osd process is capable of "spitting" out
> this amount of data though. You might hit a CPU bottleneck instead.
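
For reference, the raw IPoIB numbers Andrei quotes can be reproduced with a
plain iperf run between two hosts on the IPoIB subnet; a rough sketch (the
address and stream count below are just placeholders):

    # on one host, start an iperf server listening on its IPoIB address
    iperf -s

    # on the other host, push 8 parallel TCP streams at the server's IPoIB address
    iperf -c 192.168.100.1 -P 8

The aggregate bandwidth iperf reports is the ceiling that any Ceph traffic
over that link has to live under.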

FWIW, when we were testing Ceph with QDR IB at ORNL, we topped out at 
around 2GB/s per server node with IPoIB.  This was with a rather 
unconventional setup, though: a DDN SFA10K and RAID5 LUNs with lots 
of disks per OSD.  On my (more conventional) high performance test box, 
I can hit 2GB/s with 24 disks, 8 ssds, and 4 SAS2308 controllers, at 
least when streaming 4MB objects in and out of rados.  I suspect for 
most people 10GbE will be fast enough for many workloads (though QDR IB 
might be cheaper if you know how to implement it!)
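
For anyone who wants to repeat that kind of streaming test, rados bench with
its default 4MB object size is enough; something along these lines (the pool
name, duration and thread count here are only examples):

    # write 4MB objects for 60 seconds with 16 ops in flight, keep the objects
    rados bench -p testpool 60 write -t 16 --no-cleanup

    # read the same objects back sequentially
    rados bench -p testpool 60 seq -t 16

Run several of these in parallel from different clients if a single process
can't saturate the link.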

>
> Andrei
>
>
> ------------------------------------------------------------------------
> From: "Sage Weil" <sweil at redhat.com>
> To: "Riccardo Murri" <riccardo.murri at uzh.ch>
> Cc: ceph-users at lists.ceph.com
> Sent: Tuesday, 22 July, 2014 9:42:56 PM
> Subject: Re: [ceph-users] Ceph and Infiniband
>
> On Tue, 22 Jul 2014, Riccardo Murri wrote:
>  > Hello,
>  >
>  > a few questions on Ceph's current support for Infiniband
>  >
>  > (A) Can Ceph use Infiniband's native protocol stack, or must it use
>  > IP-over-IB?  Google finds a couple of entries in the Ceph wiki related
>  > to native IB support (see [1], [2]), but none of them seems finished
>  > and there is no timeline.
>  >
>  > [1]:
> https://wiki.ceph.com/Planning/Blueprints/Emperor/msgr%3A_implement_infiniband_support_via_rsockets
>  > [2]:
> http://wiki.ceph.com/Planning/Blueprints/Giant/Accelio_RDMA_Messenger
>
> This is work in progress.  We hope to get basic support into the tree
> in the next couple of months.
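
In the meantime, running over IPoIB needs nothing special on the Ceph side:
the daemons just see another IP network. A minimal ceph.conf sketch, with a
placeholder subnet standing in for the IPoIB interfaces:

    [global]
    # put all Ceph traffic on the IPoIB subnet (example addresses)
    public network = 192.168.100.0/24
    cluster network = 192.168.100.0/24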
>
>  > (B) Can we connect to the same Ceph cluster from Infiniband *and*
>  > Ethernet?  Some clients do only have Ethernet and will not be
>  > upgraded, some others would have QDR Infiniband -- we would like both
>  > sets to access the same storage cluster.
>
> This is further out.  Very early refactoring to make this work is
> happening in wip-addr.
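
Until that lands, one common compromise is to keep the client-facing network
on Ethernet and use the IB fabric only for OSD replication, so both sets of
clients can reach the cluster while the IB links still get used; again only
a sketch with placeholder subnets:

    [global]
    # all clients (Ethernet-only and IB hosts alike) talk to mons/osds over Ethernet
    public network = 10.0.0.0/24
    # OSD-to-OSD replication and recovery traffic goes over the IPoIB subnet
    cluster network = 192.168.100.0/24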
>
>  > (C) I found this old thread about Ceph's performance on 10GbE and
>  > Infiniband: are the issues reported there still current?
>  >
>  > http://comments.gmane.org/gmane.comp.file-systems.ceph.devel/6816
>
> No idea!  :)
>
> sage
>
>  >
>  >
>  > Thanks for any hint!
>  >
>  > Riccardo
>  >
>  > --
>  > Riccardo Murri
>  > http://www.s3it.uzh.ch/about/team/
>  >
>  > S3IT: Services and Support for Science IT
>  > University of Zurich
>  > Winterthurerstrasse 190, CH-8057 Zürich (Switzerland)
>  > Tel: +41 44 635 4222
>  > Fax: +41 44 635 6888
>


