Hello,

On Wed, 05 Oct 2016 10:18:19 +0200 Denny Fuchs wrote:

> Hi and good morning,
> 
> On 04.10.2016 at 17:19, Burkhard Linke wrote:
> 
> >> * Storage NIC: 1 x Infiniband MCX314A-BCCT
> >> ** I read that the ConnectX-3 Pro is better supported than the X-4,
> >> and it's a bit cheaper
> >> ** Switch: 2 x Mellanox SX6012 (56Gb/s)
> >> ** Active FC cables
> >> ** Maybe VPI is nice to have, but unsure.
> > 
> > The Infiniband support in Ceph is experimental and not recommended
> > for production use. You'll have to fall back to IPoIB for the moment.
> 
> Oha, that is a bit new for me, but we are expecting to use IPoIB anyway,
> and the SX6012 supports it out of the box, as I read.
> 
The switch has nothing to do with IPoIB; as the name implies, it's
entirely native Infiniband with IP encoded onto it. Thus it benefits
from fast CPUs.

Those switches also support actual Ethernet protocols via VPI, but
that's not the same thing.

I take it that you have no real experience with Infiniband, or at least
IPoIB?

> > ConnectX-3 has configurable ports and also supports 40GbE, so Ethernet
> > switches might be an alternative for your setup (some of the Mellanox
> > switches support both Infiniband and Ethernet).
> 
> We calculated what a full 10Gb network would cost and came to the
> conclusion that IB makes more sense for us (we already have 10Gb
> switches, but with only 4 ports per switch).

Have you looked at other 10Gb/s switches, like Arctica/Penguin and all
the similar white boxes?

> All other kinds of high-speed network equipment aren't standardized in
> a way we would trust ;-) Think of how many ways there are to connect
> 10Gb ...
> So IB is, from our side, a proven network (I thought). One other reason
> was that we can split the network 100%, for security/policy reasons.
> 
Security doesn't really factor into this; the cluster network in Ceph is
only used for replication.
Policy maybe, but it's not really an issue unless you could overwhelm
your network, which we established you can't with the current design.

> [...]
> > data distribution over all SSDs, they might also be failing within a
> > very short time span...
> 
> Worst case: many SSDs fail within a very short time slot .... :-(
> 
> > The journal SSD is OK (we have the same model), but according to
> > tests it is only capable of writing about 1 GB/s as a journal SSD
> > (obligatory blog link:
> > http://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/).
> 
> I added our benchmarks to that site too.
> 
> > With 2x 10GbE public network links, the SSD might become the
> > bottleneck in large scale write operations.
> 
> Here we are a bit unsure ... at this time we thought of 2 x 1Gb/s
> (LACP) or 10Gb/s with Intel 520 ... The thing is that we already bought
> 4 HP 2920 switches as a replacement for older switches, but now we
> have/want to use them for our new project. So we have 10 physical
> servers (6 OSD nodes, 4 hypervisor nodes) but only 16 10Gb (SFP+) ports.
> Not the best choice ... but bought without asking the right persons. :-/
> 
1Gb/s will be painful and have significantly higher latency than the
10Gb/s links; don't go there.
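To put some rough numbers on it, here is a quick back-of-the-envelope
sketch (plain Python, protocol overhead ignored; the ~1 GB/s journal
write figure is from the blog test above, and the 400 MB/s per-SSD read
figure is Burkhard's assumption quoted below, not something I measured):

#!/usr/bin/env python3
# Throughput ceilings from the figures in this thread, ignoring protocol
# overhead (real numbers will be lower, noticeably so for IPoIB).

def gbit_to_mb(gbit):
    # link speed in Gbit/s -> rough MB/s
    return gbit * 1000 / 8

# Note: LACP hashes per flow, so a single TCP stream never exceeds one link.
links = {
    "2 x 1GbE (LACP)":     gbit_to_mb(2),
    "2 x 10GbE":           gbit_to_mb(20),
    "56Gb IPoIB (1 port)": gbit_to_mb(56),   # raw signalling rate
}

journal_write  = 1000       # MB/s, journal SSD limit per the blog test
aggregate_read = 24 * 400   # MB/s, 24 OSD SSDs at an assumed 400 MB/s each

for name, mb in sorted(links.items(), key=lambda kv: kv[1]):
    print("%-19s ~%5d MB/s | write ceiling ~%4d MB/s | read ceiling ~%4d MB/s"
          % (name, mb, min(mb, journal_write), min(mb, aggregate_read)))

With 2 x 1Gb/s the network caps everything; with 2 x 10GbE the journal
SSD becomes the write bottleneck and the network the read bottleneck,
which matches what Burkhard wrote.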
Christian

> > For read operations with 24 SSDs (assuming 400MB/s per SSD -> 9.6
> > GB/s) the network will definitely become the bottleneck. You also
> > might want to check whether the I/O subsystem is able to drive 24
> > SSDs (SAS-3 has 12 Gbit/s, expanders are usually connected with 4
> > channels -> 6 GB/s).
> 
> Our chassis has 12Gb/s without an expander, and all drives are
> connected directly via a 24-port HBA :-)
> 
> Thanks a lot for the suggestions.
> 
> cu denny

-- 
Christian Balzer        Network/Systems Engineer
chibi@xxxxxxx           Global OnLine Japan/Rakuten Communications
http://www.gol.com/