Re: Switches and latency

Hi,

aside from the question of the coolness factor of InfiniBand, you should
always also consider the question of replacing parts and extending the
cluster.

A 10G network environment is current technology and will stay that way for
some more years. You can easily get equipment for it, and prices keep
dropping. You can also use that network for other purposes (if needed),
which keeps you flexible.

With the IB gear, you can only use it for one purpose, and you have a
(very) limited choice of options when you need new parts.

So, from the point of view of flexibility and the cost/benefit ratio, I
don't see where IB will do a good job for you in the long run.

-- 
Mit freundlichen Gruessen / Best regards

Oliver Dzombic
IP-Interactive

mailto:info@xxxxxxxxxxxxxxxxx

Address:

IP Interactive UG (haftungsbeschraenkt)
Zum Sonnenberg 1-3
63571 Gelnhausen

HRB 93402, Hanau Local Court (Amtsgericht Hanau)
Managing Director: Oliver Dzombic

Tax No.: 35 236 3622 1
VAT ID: DE274086107


On 16.06.2016 at 12:44, Gandalf Corvotempesta wrote:
> 2016-06-16 3:53 GMT+02:00 Christian Balzer <chibi@xxxxxxx>:
>> Gandalf, first read:
>> https://www.mail-archive.com/ceph-users@xxxxxxxxxxxxxx/msg29546.html
>>
>> And this thread by Nick:
>> https://www.mail-archive.com/ceph-users@xxxxxxxxxxxxxx/msg29708.html
> 
> Interesting reading. Thanks.
> 
>> Overly optimistic.
>> In an idle cluster with synthetic tests you might get sequential reads
>> that are around 150MB/s per HDD.
>> As for writes, think 80MB/s, again in an idle cluster.
>>
>> Any realistic, random I/O and you're looking at 50MB/s at most either way.
>>
>> So your storage nodes can't really saturate even a single 10Gb/s link in
>> real life situations.
> 
> Ok.
> 
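As a rough sanity check of that claim (assuming, for illustration, 12 OSD
HDDs per node; the actual disk count isn't stated in this thread):

  12 HDDs x 80 MB/s (idle, sequential writes)  =  960 MB/s
  12 HDDs x 50 MB/s (realistic, random I/O)    =  600 MB/s
  one 10Gb/s link                              ~ 1250 MB/s line rate

Even the optimistic sequential figure stays below a single 10Gb/s link,
and realistic random I/O leaves plenty of headroom.
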
>> Journal SSDs can improve on things, but that's mostly for IOPS.
>> In fact they easily become the bottleneck bandwidth wise and are so on
>> most of my storage nodes.
>> Because you'd need at least 2x 400GB DC S3710 SSDs to get around 1GB/s
>> writes, or one link's worth.
> 
> I plan to use 1 or 2 journal SSDs (probably 1 SSD for every 6 spinning disks)
> 
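To illustrate why the journal easily becomes the bandwidth ceiling at that
ratio (the ~470 MB/s figure is the spec'd sequential write speed of the
400GB DC S3710 mentioned above; your SSD model may differ):

  6 HDDs x 80 MB/s backing writes     =  480 MB/s
  1x 400GB DC S3710 sequential write  ~  470 MB/s

Every filestore write hits the journal before the data disk, so one such
SSD per 6 HDDs is already at its limit before the disks are. The journal
is placed on the SSD when the OSD is created, e.g. (device names made up):

  ceph-disk prepare /dev/sdb /dev/sdg   # data on the HDD, journal partition on the SSD
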
>> Splitting things in cluster and public networks ONLY makes sense when your
>> storage node can saturate ALL the network bandwidth, which usually is only
>> the case when it comes to very expensive SSD/NVMe only nodes.
> 
> This is not my case.
> 
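In that case a single flat network also keeps the configuration trivial;
a minimal ceph.conf sketch (the subnet is just a placeholder):

  [global]
  public network = 192.168.0.0/24
  # no "cluster network" line: replication and recovery traffic simply
  # share the public network, which is fine as long as the HDD nodes
  # cannot saturate a single 10Gb/s link anyway
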
>> Going back to your original post, with a split network the latency in both
>> networks counts the same, as a client write will NOT be acknowledged until
>> it has reached the journal of all replicas, so having a higher latency
>> cluster network is counterproductive.
> 
> Ok.
> 
>> Or if you can start with a clean slate (including the clients), look at
>> Infiniband.
>> All my production clusters are running entirely IB (IPoIB currently) and
>> I'm very happy with the performance, latency and cost.
> 
> Yes, I'll start with a brand new network.
> Actually I'm testing with some old IB switches (DDR) and I'm not very
> happy, as IPoIB doesn't go over 8-9 Gbit/s on DDR. Additionally, the CX4
> cables used by DDR are... HUGE and very "hard" to bend in the rack.
> I don't know if QDR cables are thinner.
> 
> Are you using QDR? I've seen a couple of used Mellanox switches on eBay
> that seem to be OK for me. 36 QDR ports would be awesome, but I don't
> have any IB knowledge.
> Could I keep the IB fabric unconfigured and use only IPoIB?
> I can create a bonded (failover) IPoIB device on each node and add 2 or more
> IB cables between both switches. In a normal Ethernet network, these 2 cables
> must be joined in a LAG to avoid loops. Is InfiniBand able to manage this on
> its own? I've never found a way to aggregate multiple ports.
> 
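On the IPoIB questions: the fabric is never completely "unconfigured", a
subnet manager has to run somewhere (opensm on one host, or the embedded
SM of a managed switch), but beyond that you can use it purely for IPoIB.
Bonding over IPoIB only supports active-backup (mode 1), not LACP, and the
inter-switch links need no LAG, because the subnet manager handles the
multiple paths at the IB layer. A minimal Debian-style sketch, with
interface names and addresses as placeholders:

  # /etc/network/interfaces
  auto bond0
  iface bond0 inet static
      address 10.0.0.11
      netmask 255.255.255.0
      bond-slaves ib0 ib1
      bond-mode active-backup   # the only bonding mode that works over IPoIB
      bond-miimon 100
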
> The real drawback with IB is that I have to add IB cards to each compute node,
> whereas my current compute nodes have 2 10GBase-T ports onboard.
> 
> This adds some cost...
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



