Quoting Lars Täuber (taeuber@xxxxxxx):
> > > This is something I was told to do, because a reconstruction of failed
> > > OSDs/disks would have a heavy impact on the backend network.
> >
> > Opinions vary on running "public" only versus "public" / "backend".
> > Having a separate "backend" network might lead to difficult-to-debug
> > issues when the "public" network is working fine, but the "backend" is
> > having issues and OSDs can't peer with each other, while the clients can
> > talk to all OSDs. You will get slow requests and OSDs marking each other
> > down while they are still running, etc.
>
> This I was not aware of.

It's real. I've been bitten by this several times in a PoC cluster while
playing around with networking ... make sure you have proper monitoring
checks on all network interfaces when running this setup.

> > In your case, with only 6 spinners max per server, there is no way you
> > will ever fill the capacity of a 25 Gb/s network: 6 * 250 MB/s (for
> > large spinners) should be just enough to fill a 10 Gb/s link. A
> > redundant 25 Gb/s link would provide 50 Gb/s of bandwidth, enough for
> > both OSD replication traffic and client IO.
>
> The reason for choosing the 25 Gbit network was a remark by someone
> that the latency of this Ethernet is way below that of 10 Gbit. I
> never double checked this.

This is probably true. 25 Gb/s is a single-lane (SerDes) standard, which is
also used in 50 Gb/s / 100 Gb/s / 200 Gb/s connections. It operates at
~ 2.5 times the clock rate of 10 Gb/s / 40 Gb/s. But for clients to fully
benefit from this lower latency, they should be on 25 Gb/s as well. If you
can afford to redesign your cluster (and low latency is important) ...

Then again ... the latency your spinners introduce is a few orders of
magnitude higher than the network latency ... I would then (also) invest
in NVMe drives for (at least) metadata ... and switch to 3 x replication
... but that might be too much to ask.
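For what it's worth, a quick back-of-the-envelope check of the numbers above
(the 250 MB/s per large spinner is the estimate from this thread, not a
measured value):

```python
# Rough check: can 6 spinners per server saturate the links being discussed?
# Assumes ~250 MB/s sequential throughput per large spinner (estimate from above).
DISKS_PER_SERVER = 6
MB_PER_SEC_PER_DISK = 250

total_mb_s = DISKS_PER_SERVER * MB_PER_SEC_PER_DISK   # 1500 MB/s
total_gbit_s = total_mb_s * 8 / 1000                  # 12 Gb/s aggregate

for link_gbit in (10, 25, 50):
    headroom = link_gbit - total_gbit_s
    print(f"{link_gbit:>2} Gb/s link: disks can push {total_gbit_s:.0f} Gb/s, "
          f"headroom {headroom:+.0f} Gb/s")
```

So the disks alone slightly exceed a single 10 Gb/s link, while 25 Gb/s (let
alone a redundant 2 x 25 Gb/s setup) leaves plenty of room for replication
plus client IO.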
TL;DR: when designing clusters, try to think about the "weakest" link
(bottleneck) ... most probably this will be disk speed / Ceph overhead.

Gr. Stefan

--
| BIT BV  http://www.bit.nl/        Kamer van Koophandel 09090351
| GPG: 0xD14839C6                   +31 318 648 688 / info@xxxxxx
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com