On Nov 8, 2012, at 3:22 AM, Gandalf Corvotempesta <gandalf.corvotempesta@xxxxxxxxx> wrote:

> 2012/11/8 Mark Nelson <mark.nelson@xxxxxxxxxxx>:
>> I haven't done much with IPoIB (just RDMA), but my understanding is that it
>> tends to top out at like 15Gb/s. Some others on this mailing list can
>> probably speak more authoritatively. Even with RDMA you are going to top
>> out at around 3.1-3.2GB/s.
>
> 15Gb/s is still faster than 10Gbe
> But this speed limit seems to be kernel-related and should be the same
> even in a 10Gbe environment, or not?

We have a test cluster with Mellanox QDR HCAs (i.e. NICs). When using Verbs (the native IB API), I see ~27 Gb/s between two hosts. When running Sockets over these devices using IPoIB, I see 13-22 Gb/s depending on whether I use interrupt affinity and process binding.

For our Ceph testing, we will set the affinity of two of the mlx4 interrupt handlers to cores 0 and 1 and we will not use process binding. For single-stream Netperf, we do use process binding, binding it to the same core as one of the interrupt handlers (i.e. core 0), and we see ~22 Gb/s. For multiple, concurrent Netperf runs, we do not use process binding but we still see ~22 Gb/s.

We used all of the Mellanox tuning recommendations for IPoIB available in their tuning PDF:

http://www.mellanox.com/related-docs/prod_software/Performance_Tuning_Guide_for_Mellanox_Network_Adapters.pdf

We looked at their interrupt affinity setting scripts and then wrote our own.

Our testing is with IPoIB in "connected" mode, not "datagram" mode. Connected mode is less scalable, but currently I only get ~3 Gb/s with datagram mode. Mellanox claims that we should get identical performance with both modes and we are looking into it.

We are getting a new test cluster with FDR HCAs and I will look into those as well.

Scott
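
As an illustration of the interrupt affinity and process binding described in the message, here is a minimal Python sketch. It is not the script mentioned in the message and not one of Mellanox's. It assumes a Linux host where the HCA's interrupt handlers show up in /proc/interrupts with "mlx4" in their name, and where /proc/irq/<n>/smp_affinity_list is writable by root. All function names are hypothetical, chosen for this example only.

```python
import os
from pathlib import Path


def parse_mlx4_irqs(interrupts_text, driver="mlx4"):
    """Return the IRQ numbers whose /proc/interrupts line mentions `driver`.

    Assumption: the handler names contain the substring "mlx4".
    """
    irqs = []
    for line in interrupts_text.splitlines():
        head, sep, rest = line.partition(":")
        head = head.strip()
        # Numbered IRQ lines look like " 64:  <counts>  <type>  <name>";
        # the header and named lines such as "NMI:" are skipped.
        if sep and head.isdigit() and driver in rest:
            irqs.append(int(head))
    return irqs


def plan_affinity(irqs, cores):
    """Pair IRQs with cores one-to-one; extra IRQs are left untouched."""
    return dict(zip(irqs, cores))


def apply_affinity(plan, proc_root="/proc"):
    """Write each core number to /proc/irq/<n>/smp_affinity_list (needs root)."""
    for irq, core in plan.items():
        path = Path(proc_root, "irq", str(irq), "smp_affinity_list")
        path.write_text(f"{core}\n")


def bind_current_process(core):
    """Pin the calling process to a single core (Linux only)."""
    os.sched_setaffinity(0, {core})
```

To mirror the Ceph setup in the message, one would read /proc/interrupts, pass the text to parse_mlx4_irqs, call plan_affinity with cores [0, 1], inspect the resulting plan, and only then call apply_affinity as root. bind_current_process(0) corresponds to the single-stream Netperf case. If an IRQ balancing daemon such as irqbalance is running, it may later overwrite these settings.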