Re: RDMA/Infiniband status

Hello,

What I took from the longish thread on the OFED ML was that certain things
with IPoIB (and more of them than you'd think) happen over multicast, just
not ALL of them.

For the record, my bog-standard QDR IPoIB clusters can do anywhere from
14 to 21 Gb/s with iperf3 and about 20-30% less with NPtcp (netpipe-tcp).
That spread is clearly down to the CPU speeds and/or PCIe generations of
the various servers.
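
For reference, this is roughly how I drive those iperf3 runs (just a
sketch; the server name and flow counts are placeholders, and it assumes
iperf3 with JSON output plus "iperf3 -s" already running on the far end):

#!/usr/bin/env python3
# Sketch: run iperf3 against an IPoIB peer and report throughput in Gb/s.
# Assumes "iperf3 -s" is already running on SERVER (placeholder name).
import json
import subprocess

SERVER = "ib-node01"   # placeholder IPoIB hostname/address
PORT = 5201            # iperf3 default port

def run_iperf3(server, port, flows=1, seconds=10):
    # -J = JSON output, -P = number of parallel flows
    out = subprocess.run(
        ["iperf3", "-c", server, "-p", str(port),
         "-P", str(flows), "-t", str(seconds), "-J"],
        check=True, capture_output=True, text=True).stdout
    bps = json.loads(out)["end"]["sum_received"]["bits_per_second"]
    return bps / 1e9

if __name__ == "__main__":
    for flows in (1, 2, 4):
        print(f"{flows} flow(s): {run_iperf3(SERVER, PORT, flows):.1f} Gb/s")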

It's also somewhat academic in my case, as none of the Ceph storage nodes
can do more than 1 GB/s in sequential writes anyway. ^o^

Again, I shall try to put numbers to this once I have that test cluster.

Christian

On Fri, 10 Jun 2016 06:04:33 -0600 Corey Kovacs wrote:

> InfiniBand uses multicast internally.  It's not something you have a
> choice about.  You won't see it on the local interface any more than
> you'd see the individual drives of a RAID 5.
> 
> I believe it's one of the reasons connection setup latencies are kept
> under the requisite 1.2 usec limits, etc.
> On Jun 10, 2016 4:16 AM, "Daniel Swarbrick" <
> daniel.swarbrick@xxxxxxxxxxxxxxxx> wrote:
> 
> On 10/06/16 02:33, Christian Balzer wrote:
> >
> >
> > This thread brings back memories of this one:
> > http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-April/008792.html
> >
> > According to Robert, IPoIB still uses IB multicast under the hood, even
> > when the traffic would be unicast from an IP perspective.
> 
> I'd be interested to see some concrete proof of that. We run several IB
> fabrics here using Mellanox QDR HCAs, and run a mixture of SRP and IPoIB
> over them. We don't explicitly override the mcast rate, so it's safe to
> assume it is running at the default SDR rate of 10 Gbps.
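>
> If anyone wants to gather that kind of proof on their own fabric, here's
> the sort of thing I'd try (only a rough sketch; it assumes perfquery from
> infiniband-diags and an HCA with extended counters, and it watches the
> whole port rather than a single flow):
>
> # Sketch: snapshot unicast/multicast packet counters before and after a
> # unicast-only IPoIB transfer, to see whether IB multicast gets used.
> # Assumes "perfquery -x" (extended counters) is available on this host.
> import re
> import subprocess
>
> COUNTERS = ("PortUnicastXmitPkts", "PortMulticastXmitPkts")
>
> def read_counters():
>     out = subprocess.run(["perfquery", "-x"], check=True,
>                          capture_output=True, text=True).stdout
>     vals = {}
>     for name in COUNTERS:
>         m = re.search(rf"{name}:\.*(\d+)", out)
>         vals[name] = int(m.group(1)) if m else 0
>     return vals
>
> before = read_counters()
> input("Run your unicast IPoIB test (e.g. iperf3) now, then press Enter...")
> after = read_counters()
> for name in COUNTERS:
>     print(f"{name} delta: {after[name] - before[name]}")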
> 
> Testing with iperf3, I've seen single flow IPoIB (CM) reach about 20
> Gbps, and multiple flows top out at around a combined 25 Gbps.
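>
> (Connected vs. datagram mode makes a big difference to those single-flow
> numbers, since connected mode allows an MTU of up to 65520. A quick sanity
> check, assuming the IPoIB interface is called ib0:)
>
> # Sketch: read the IPoIB mode and MTU from sysfs.
> from pathlib import Path
>
> IFACE = "ib0"  # placeholder, adjust to your IPoIB interface name
> base = Path("/sys/class/net") / IFACE
> print("mode:", (base / "mode").read_text().strip())  # "datagram" or "connected"
> print("mtu: ", (base / "mtu").read_text().strip())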
> 
> On the other hand, testing with ib_write_bw (RDMA, single "flow"), we
> usually get just under 30 Gbps. So there is a fair bit of overhead in
> IPoIB, but I'm skeptical that it uses mcast IB all the time. Nothing in
> the Linux IPoIB kernel modules stands out as looking like "use multicast
> for everything."
> 
> >
> > The biggest issue pointed out in that mail, and in the immensely long
> > and complex thread he mentioned in it, is that you can't change the
> > speed settings on the fly, which means that if you're already in
> > production it's unlikely there will ever be a time to entirely tear
> > down your IB network...
> >
> 
> 
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


-- 
Christian Balzer        Network/Systems Engineer                
chibi@xxxxxxx   	Global OnLine Japan/Rakuten Communications
http://www.gol.com/
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


