Hello,

separate public/private networks make sense for clusters that a) have
vastly more storage bandwidth than a single link can handle, or b) are
extremely read-heavy on the client side, so that replication traffic can
be kept separate from the client reads. I posit that neither is the case
in your scenario.

Your 2 NVMes will give you at best 2 GB/s of writes, so that is the most
you can ever hope for in the write department: roughly 16 Gb/s, well
under half of your 40 Gb/s link. Your HDDs are also unlikely to write
much more than 2 GB/s combined with filestore. Bluestore may be another
story, for another time, in a year or so at the earliest.

As for reads, if your cluster were doing nothing but reads, your
configuration might fill up the 40 Gb/s link. Alas, any and all writes,
especially random I/O and even more so sync writes, will reduce that
significantly, as you already guessed yourself.

All this being said, I'd go for a single network with redundant links
(switches) and be done with it.
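
For what it's worth, here is a minimal back-of-napkin sketch of that
arithmetic in Python, assuming roughly 1 GB/s of sequential writes per
P3700 journal and roughly 200 MB/s of streaming reads per HDD; both
figures are assumptions for illustration, not measurements of your
hardware:

    # Illustrative per-node bandwidth estimate (assumed figures, not measured).
    nvme_write_mb_s = 1000   # assumed sequential write rate per P3700 journal
    hdd_read_mb_s = 200      # assumed streaming read rate per 6TB SATA HDD
    link_gbit_s = 40         # single 40GbE port

    def mb_s_to_gbit_s(mb_s):
        """Convert MB/s (10^6 bytes/s) into Gb/s (10^9 bits/s)."""
        return mb_s * 1e6 * 8 / 1e9

    # Write path: everything funnels through the two NVMe journals first.
    journal_limited_writes = mb_s_to_gbit_s(2 * nvme_write_mb_s)
    # Read path: all 24 HDDs streaming flat out, best case.
    hdd_limited_reads = mb_s_to_gbit_s(24 * hdd_read_mb_s)

    print(f"journal-limited writes: ~{journal_limited_writes:.0f} Gb/s")  # ~16 Gb/s
    print(f"HDD-limited reads:      ~{hdd_limited_reads:.0f} Gb/s")       # ~38 Gb/s
    print(f"link capacity:           {link_gbit_s} Gb/s")

Even the best-case read figure lands just under a single 40 Gb/s port,
and real-world random and mixed I/O will sit well below it.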
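
And in case it helps, the single flat network is also the simplest thing
to express in ceph.conf. A minimal sketch, with a placeholder subnet;
when no cluster network is defined, Ceph carries replication and
heartbeat traffic over the public network as well:

    [global]
        # One flat network for clients, replication and heartbeats.
        # 10.10.10.0/24 is a placeholder - use your 40GbE subnet.
        public network = 10.10.10.0/24
        # Deliberately no "cluster network" line at all.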
Regards,

Christian

On Mon, 23 May 2016 17:21:27 -0500 Brady Deetz wrote:

> To be clear for future responders, separate MDS and mon servers are in
> the design. Everything is the same as the OSD hardware except the
> chassis, and there aren't 24 HDDs in there.
>
> On May 23, 2016 4:27 PM, "Oliver Dzombic" <info@xxxxxxxxxxxxxxxxx> wrote:
>
> > Hi,
> >
> > keeping it simple would, in my opinion, mean dividing different tasks
> > across different servers and networks.
> >
> > The more stuff is running on one device, the higher the chance that
> > the pieces will influence each other, and that makes debugging harder.
> >
> > Our first setups had everything (mon, OSD, MDS) on one server. That
> > ended up being hard to debug, because you don't know whether an MDS
> > kernel dump is caused by the mon, the kernel, or maybe just a
> > combination of kernel + OSD.
> >
> > The same goes for the network. If you have a lot of different traffic
> > flowing there, you have a lot to keep an eye on, with cross-side
> > effects, which does not help you debug things quickly.
> >
> > So if you want to keep things simple, dedicate one device to one task.
> >
> > Of course, there is a natural balance between efficiency and dividing
> > things up like that. I would also not (like to) spend big money on a
> > new switch just because I ran out of ports while there is still so
> > much bandwidth left on it.
> >
> > But on the other hand, in my humble opinion, how easy a setup is to
> > debug and how big the chance of cross-side effects is are big,
> > considerable factors.
> >
> > --
> > Mit freundlichen Gruessen / Best regards
> >
> > Oliver Dzombic
> > IP-Interactive
> >
> > mailto:info@xxxxxxxxxxxxxxxxx
> >
> > Address:
> >
> > IP Interactive UG ( haftungsbeschraenkt )
> > Zum Sonnenberg 1-3
> > 63571 Gelnhausen
> >
> > HRB 93402 at the Amtsgericht Hanau
> > Managing director: Oliver Dzombic
> >
> > Tax no.: 35 236 3622 1
> > VAT ID: DE274086107
> >
> >
> > On 23.05.2016 at 23:19, Wido den Hollander wrote:
> > >
> > >> On 23 May 2016 at 21:53, Brady Deetz <bdeetz@xxxxxxxxx> wrote:
> > >>
> > >>
> > >> TLDR;
> > >> Has anybody deployed a Ceph cluster using a single 40 gig NIC? This
> > >> is discouraged in
> > >> http://docs.ceph.com/docs/master/rados/configuration/network-config-ref/
> > >>
> > >> "One NIC OSD in a Two Network Cluster:
> > >> Generally, we do not recommend deploying an OSD host with a single
> > >> NIC in a cluster with two networks. --- [cut] --- Additionally, the
> > >> public network and cluster network must be able to route traffic to
> > >> each other, which we don’t recommend for security reasons."
> > >>
> > >
> > > I still don't agree with that part of the docs.
> > >
> > > IMHO the public and cluster networks make 99% of setups more
> > > complex. That makes it harder to diagnose problems, while a simple,
> > > single, flat network is easier to work with.
> > >
> > > I like the approach of a single machine with a single NIC. That
> > > machine will be up or down, not in a state where one of the networks
> > > might be failing.
> > >
> > > Keep it simple is my advice.
> > >
> > > Wido
> > >
> > >> --------------------------------------------------------
> > >> Reason for this question:
> > >> My hope is that I can keep capital expenses down for this year,
> > >> then add a second switch and a second 40 gig DAC to each node next
> > >> year.
> > >>
> > >> Thanks for any wisdom you can provide.
> > >> ---------------------------------------------------------
> > >>
> > >> Details:
> > >> Planned configuration - 40 gig interconnect via Brocade VDX 6940
> > >> and 8x OSD nodes configured as follows:
> > >> 2x E5-2660v4
> > >> 8x 16GB ECC DDR4 (128 GB RAM)
> > >> 1x dual port Mellanox ConnectX-3 Pro EN
> > >> 24x 6TB enterprise SATA
> > >> 2x P3700 400GB PCIe NVMe (journals)
> > >> 2x 200GB SSD (OS drive)
> > >>
> > >> 1) From a security perspective, why not keep the networks segmented
> > >> all the way to the node using tagged VLANs or VXLANs, then untag
> > >> them at the node? From a security perspective, that's no different
> > >> than sending 2 networks to the same host on different interfaces.
> > >>
> > >> 2) By using VLANs, I wouldn't have to worry about the special Ceph
> > >> configuration mentioned in the referenced documentation, since the
> > >> untagged VLANs would show up as individual interfaces on the host.
> > >>
> > >> 3) From a performance perspective, has anybody observed a
> > >> significant performance hit from untagging VLANs on the node? This
> > >> is something I can't test, since I don't currently own 40 gig gear.
> > >>
> > >> 3.a) If I used a VXLAN-offloading NIC, wouldn't this remove this
> > >> potential issue?
> > >>
> > >> 3.b) My back-of-napkin estimate shows that total OSD read
> > >> throughput per node could max out around 38 Gb/s (4800 MB/s). But
> > >> in reality, with plenty of random I/O, I'm expecting to see
> > >> something closer to 30 Gb/s. So a single 40 gig connection ought to
> > >> leave plenty of headroom, right?

--
Christian Balzer        Network/Systems Engineer
chibi@xxxxxxx           Global OnLine Japan/Rakuten Communications
http://www.gol.com/

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com