Hello,

separate public/private networks make sense for clusters that a) have
vastly more storage bandwidth than a single link can handle, or b) are
extremely read-heavy on the client side, so that replication traffic can
be kept separate from the client reads. I posit that neither is the case
in your scenario.

Your 2 NVMes will give you at best 2 GB/s of writes, so that is the most
you can ever hope for in the write department: roughly 16 Gb/s, well
under half of your 40 Gb/s link. Your HDDs are also unlikely to write
much more than 2 GB/s combined with filestore. Bluestore may be another
story, for another time, in a year or so at the earliest.

As for reads, if your cluster were doing nothing but reads, your
configuration might fill up the 40 Gb/s link. Alas, any and all writes,
especially random I/O and even more so sync writes, will reduce that
significantly, as you already guessed yourself.

All this being said, I'd go for a single network with redundant links
(switches) and be done with it.
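
For what it's worth, here is a minimal back-of-napkin sketch of that
arithmetic in Python, assuming roughly 1 GB/s of sequential writes per
P3700 journal and roughly 200 MB/s of streaming reads per HDD; both
figures are assumptions for illustration, not measurements of your
hardware:

    # Illustrative per-node bandwidth estimate (assumed figures, not measured).
    nvme_write_mb_s = 1000   # assumed sequential write rate per P3700 journal
    hdd_read_mb_s = 200      # assumed streaming read rate per 6TB SATA HDD
    link_gbit_s = 40         # single 40GbE port

    def mb_s_to_gbit_s(mb_s):
        """Convert MB/s (10^6 bytes/s) into Gb/s (10^9 bits/s)."""
        return mb_s * 1e6 * 8 / 1e9

    # Write path: everything funnels through the two NVMe journals first.
    journal_limited_writes = mb_s_to_gbit_s(2 * nvme_write_mb_s)
    # Read path: all 24 HDDs streaming flat out, best case.
    hdd_limited_reads = mb_s_to_gbit_s(24 * hdd_read_mb_s)

    print(f"journal-limited writes: ~{journal_limited_writes:.0f} Gb/s")  # ~16 Gb/s
    print(f"HDD-limited reads:      ~{hdd_limited_reads:.0f} Gb/s")       # ~38 Gb/s
    print(f"link capacity:           {link_gbit_s} Gb/s")

Even the best-case read figure lands just under a single 40 Gb/s port,
and real-world random and mixed I/O will sit well below it.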
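
And in case it helps, the single flat network is also the simplest thing
to express in ceph.conf. A minimal sketch, with a placeholder subnet;
when no cluster network is defined, Ceph carries replication and
heartbeat traffic over the public network as well:

    [global]
        # One flat network for clients, replication and heartbeats.
        # 10.10.10.0/24 is a placeholder - use your 40GbE subnet.
        public network = 10.10.10.0/24
        # Deliberately no "cluster network" line at all.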
Regards,

Christian

On Mon, 23 May 2016 17:21:27 -0500 Brady Deetz wrote:

> To be clear for future responders, separate MDS and mon servers are in
> the design. Everything is the same as the OSD hardware except the
> chassis, and there aren't 24 HDDs in there.
>
> On May 23, 2016 4:27 PM, "Oliver Dzombic" <info@xxxxxxxxxxxxxxxxx> wrote:
>
> > Hi,
> >
> > keeping it simple would, in my opinion, mean dividing different tasks
> > across different servers and networks.
> >
> > The more stuff is running on one device, the higher the chance that
> > the pieces will influence each other, and that makes debugging harder.
> >
> > Our first setups had everything (mon, OSD, MDS) on one server. That
> > ended up being hard to debug, because you don't know whether an MDS
> > kernel dump is caused by the mon, the kernel, or maybe just a
> > combination of kernel + OSD.
> >
> > The same goes for the network. If you have a lot of different traffic
> > flowing there, you have a lot to keep an eye on, with cross-side
> > effects, which does not help you debug things quickly.
> >
> > So if you want to keep things simple, dedicate one device to one task.
> >
> > Of course, there is a natural balance between efficiency and dividing
> > things up like that. I would also not (like to) spend big money on a
> > new switch just because I ran out of ports while there is still so
> > much bandwidth left on it.
> >
> > But on the other hand, in my humble opinion, how easy a setup is to
> > debug and how big the chance of cross-side effects is are big,
> > considerable factors.
> >
> > --
> > Mit freundlichen Gruessen / Best regards
> >
> > Oliver Dzombic
> > IP-Interactive
> >
> > mailto:info@xxxxxxxxxxxxxxxxx
> >
> > Address:
> >
> > IP Interactive UG ( haftungsbeschraenkt )
> > Zum Sonnenberg 1-3
> > 63571 Gelnhausen
> >
> > HRB 93402 at the Amtsgericht Hanau
> > Managing director: Oliver Dzombic
> >
> > Tax no.: 35 236 3622 1
> > VAT ID: DE274086107
> >
> >
> > On 23.05.2016 at 23:19, Wido den Hollander wrote:
> > >
> > >> On 23 May 2016 at 21:53, Brady Deetz <bdeetz@xxxxxxxxx> wrote:
> > >>
> > >>
> > >> TLDR;
> > >> Has anybody deployed a Ceph cluster using a single 40 gig NIC? This
> > >> is discouraged in
> > >> http://docs.ceph.com/docs/master/rados/configuration/network-config-ref/
> > >>
> > >> "One NIC OSD in a Two Network Cluster:
> > >> Generally, we do not recommend deploying an OSD host with a single
> > >> NIC in a cluster with two networks. --- [cut] --- Additionally, the
> > >> public network and cluster network must be able to route traffic to
> > >> each other, which we don’t recommend for security reasons."
> > >>
> > >
> > > I still don't agree with that part of the docs.
> > >
> > > IMHO the public and cluster networks make 99% of setups more
> > > complex. That makes it harder to diagnose problems, while a simple,
> > > single, flat network is easier to work with.
> > >
> > > I like the approach of a single machine with a single NIC. That
> > > machine will be up or down, not in a state where one of the networks
> > > might be failing.
> > >
> > > Keep it simple is my advice.
> > >
> > > Wido
> > >
> > >> --------------------------------------------------------
> > >> Reason for this question:
> > >> My hope is that I can keep capital expenses down for this year,
> > >> then add a second switch and a second 40 gig DAC to each node next
> > >> year.
> > >>
> > >> Thanks for any wisdom you can provide.
> > >> ---------------------------------------------------------
> > >>
> > >> Details:
> > >> Planned configuration - 40 gig interconnect via Brocade VDX 6940
> > >> and 8x OSD nodes configured as follows:
> > >> 2x E5-2660v4
> > >> 8x 16GB ECC DDR4 (128 GB RAM)
> > >> 1x dual port Mellanox ConnectX-3 Pro EN
> > >> 24x 6TB enterprise SATA
> > >> 2x P3700 400GB PCIe NVMe (journals)
> > >> 2x 200GB SSD (OS drive)
> > >>
> > >> 1) From a security perspective, why not keep the networks segmented
> > >> all the way to the node using tagged VLANs or VXLANs, then untag
> > >> them at the node? From a security perspective, that's no different
> > >> than sending 2 networks to the same host on different interfaces.
> > >>
> > >> 2) By using VLANs, I wouldn't have to worry about the special Ceph
> > >> configuration mentioned in the referenced documentation, since the
> > >> untagged VLANs would show up as individual interfaces on the host.
> > >>
> > >> 3) From a performance perspective, has anybody observed a
> > >> significant performance hit from untagging VLANs on the node? This
> > >> is something I can't test, since I don't currently own 40 gig gear.
> > >>
> > >> 3.a) If I used a VXLAN-offloading NIC, wouldn't this remove this
> > >> potential issue?
> > >>
> > >> 3.b) My back-of-napkin estimate shows that total OSD read
> > >> throughput per node could max out around 38 Gb/s (4800 MB/s). But
> > >> in reality, with plenty of random I/O, I'm expecting to see
> > >> something closer to 30 Gb/s. So a single 40 gig connection ought to
> > >> leave plenty of headroom, right?

--
Christian Balzer        Network/Systems Engineer
chibi@xxxxxxx           Global OnLine Japan/Rakuten Communications
http://www.gol.com/

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com