Re: Switches and latency

2016-06-16 3:53 GMT+02:00 Christian Balzer <chibi@xxxxxxx>:
> Gandalf, first read:
> https://www.mail-archive.com/ceph-users@xxxxxxxxxxxxxx/msg29546.html
>
> And this thread by Nick:
> https://www.mail-archive.com/ceph-users@xxxxxxxxxxxxxx/msg29708.html

Interesting reading. Thanks.

> Overly optimistic.
> In an idle cluster with synthetic tests you might get sequential reads
> that are around 150MB/s per HDD.
> As for writes, think 80MB/s, again in an idle cluster.
>
> Any realistic, random I/O and you're looking at 50MB/s at most either way.
>
> So your storage nodes can't really saturate even a single 10Gb/s link in
> real life situations.

Ok.
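
As a quick sanity check on my own numbers (assuming 12 HDD OSDs per
storage node, which is just my working assumption, not something you
said):

  12 HDDs x 50 MB/s (realistic random I/O)   ~= 600 MB/s ~= 4.8 Gbit/s
  12 HDDs x 80 MB/s (idle sequential writes) ~= 960 MB/s ~= 7.7 Gbit/s

So even the optimistic case stays below a single 10 Gbit/s link.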

> Journal SSDs can improve on things, but that's mostly for IOPS.
> In fact they easily become the bottleneck bandwidth wise and are so on
> most of my storage nodes.
> Because you'd need at least 2 400GB DC S3710 SSDs to get around 1GB/s
> writes, or one link worth.

I plan to use 1 or 2 journal SSDs (probably 1 SSD for every 6 spinning disks).
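
If I read the Intel spec sheet correctly, the 400GB DC S3710 does about
470 MB/s sequential writes, which seems to match your numbers:

  2 SSDs x ~470 MB/s ~= 940 MB/s, roughly one 10 Gbit/s link's worth
  1 SSD per 6 HDDs:   6 x 80 MB/s = 480 MB/s of journal writes

So on paper one such SSD per 6 spinning disks should be roughly matched
to the disks behind it.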

> Splitting things in cluster and public networks ONLY makes sense when your
> storage node can saturate ALL the network bandwidth, which usually is only
> the case when it comes to very expensive SSD/NVMe only nodes.

This is not my case.
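
So I'll simply not configure a cluster network. If I understand the docs
correctly, that's just one line in ceph.conf (placeholder subnet):

  [global]
      public network = 192.168.0.0/24

and splitting later would only mean adding a "cluster network" line
pointing at a second subnet.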

> Going back to your original post, with a split network the latency in both
> networks counts the same, as a client write will NOT be acknowledged until
> it has reached the journal of all replicas, so having a higher latency
> cluster network is counterproductive.

Ok.
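
If I follow you, the client-visible latency of a replicated write is
roughly additive (made-up round-trip numbers, just to illustrate):

  client -> primary OSD, public net        ~0.1 ms
  primary -> replicas, cluster net         ~0.1 ms
  slowest replica journal write + ack back ~0.5 ms

so any extra latency on the cluster network ends up more or less 1:1 in
what the client sees.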

> Or if you can start with a clean slate (including the clients), look at
> Infiniband.
> All my production clusters are running entirely IB (IPoIB currently) and
> I'm very happy with the performance, latency and cost.

Yes, I'll start with a brand new network.
Actually I'm testing with some old IB switches (DDR) and I'm not very
happy, as IPoIB doesn't go above 8-9 Gbit/s on DDR. Additionally, the CX4
cables used by DDR are... HUGE and very hard to bend in the rack.
I don't know if QDR cables are thinner.
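
One thing I still want to try before blaming the DDR gear entirely: DDR
4x only gives ~16 Gbit/s of usable data rate anyway (20 Gbit/s signalling
with 8b/10b encoding), and I'm not sure my test interfaces were in IPoIB
connected mode with a large MTU. As far as I understand, the usual tweak
is something like this (ib0 being my test interface):

  cat /sys/class/net/ib0/mode          # datagram or connected
  echo connected > /sys/class/net/ib0/mode
  ip link set ib0 mtu 65520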

Are you using QDR? I've seen a couple of used Mellanox switches on eBay
that seem to fit my needs. 36 QDR ports would be awesome, but I don't
have any IB knowledge.
Could I keep the IB fabric unconfigured and use only IPoIB?
I can create a bonded (failover) IPoIB device on each node and add 2 or
more IB cables between the two switches. In a normal Ethernet network
these 2 cables would have to be joined in a LAG to avoid loops. Is
InfiniBand able to manage this on its own? I've never found a way to
aggregate multiple ports.
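
What I've gathered so far, in case it's wrong and someone can correct
me: the subnet manager (opensm on a host, or the switch's embedded SM)
works out the paths across the fabric, so there should be no STP/LAG
equivalent to configure between the two switches, and on the host side
IPoIB apparently only supports active-backup bonding anyway. A failover
bond would then look something like this (Debian-style interfaces file,
with hypothetical ib0/ib1 ports and addressing):

  auto bond0
  iface bond0 inet static
      address 192.168.10.11
      netmask 255.255.255.0
      bond-slaves ib0 ib1
      bond-mode active-backup
      bond-miimon 100
      bond-primary ib0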

The real drawback with IB is that I have to add IB cards to each compute
node, whereas my current compute nodes have 2 10GBaseT ports onboard.

This adds some cost...
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


