Re: Ceph network topology with redundant switches

On 12/18/2013 09:39 PM, Tim Bishop wrote:
Hi all,

I'm investigating and planning a new Ceph cluster starting with 6
nodes with currently planned growth to 12 nodes over a few years. Each
node will probably contain 4 OSDs, maybe 6.

The area I'm currently investigating is how to configure the
networking. To avoid a SPOF I'd like to have redundant switches for
both the public network and the internal network, most likely running
at 10Gb. I'm considering splitting the nodes into two separate racks
and connecting each half to its own switch, and then trunk the
switches together to allow the two halves of the cluster to see each
other. The idea being that if a single switch fails I'd only lose half
of the cluster.


Why not use three switches in total and separate the public and cluster traffic with VLANs on the switches?

This way you can configure the CRUSH map to have one replica go to each "switch", so that when you lose a switch you still have two replicas available.

Saves you a lot of switches and makes the network simpler.
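
Just as a sketch of what I mean (the bucket names, IDs and weights below are made up, and I'm reusing the default "rack" bucket type for the switches), the CRUSH side could look roughly like this:

rack switch1 {
    id -10
    alg straw
    hash 0
    item node1 weight 4.000
    item node2 weight 4.000
}
rack switch2 {
    id -11
    alg straw
    hash 0
    item node3 weight 4.000
    item node4 weight 4.000
}
rack switch3 {
    id -12
    alg straw
    hash 0
    item node5 weight 4.000
    item node6 weight 4.000
}
root default {
    id -1
    alg straw
    hash 0
    item switch1 weight 8.000
    item switch2 weight 8.000
    item switch3 weight 8.000
}

# Place each replica under a different switch bucket
rule replicated_per_switch {
    ruleset 1
    type replicated
    min_size 1
    max_size 10
    step take default
    step chooseleaf firstn 0 type rack
    step emit
}

With size = 3 on the pool, each replica then ends up behind a different switch.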

(I'm not touching on the required third MON in a separate location and
the CRUSH rules to make sure data is correctly replicated - I'm happy
with the setup there)

To allow consumers of Ceph to see the full cluster they'd be directly
connected to both switches. I could have another layer of switches for
them and interlinks between them, but I'm not sure it's worth it on
this sort of scale.

My question is about configuring the public network. If it's all one
subnet then the clients consuming the Ceph resources can't have both
links active, so they'd be configured in an active/standby role. But
this results in quite heavy usage of the trunk between the two
switches when a client accesses nodes on a different switch from the one
they're actively connected to.


Why can't the clients have both links active? You could use LACP; some switches support MLAG to span an LACP trunk across two switches.

Or use some intelligent bonding mode in the Linux kernel.
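
For example, a minimal bonding sketch (Debian-style /etc/network/interfaces; the interface names and addresses are only placeholders). 802.3ad needs MLAG/stacking support on the switch pair; without that, active-backup works with no switch-side configuration at all:

# LACP bond over two NICs, one cabled to each switch (requires MLAG)
auto bond0
iface bond0 inet static
    address 192.168.10.21
    netmask 255.255.255.0
    bond-slaves eth2 eth3
    bond-mode 802.3ad
    bond-miimon 100
    bond-lacp-rate fast
    bond-xmit-hash-policy layer3+4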

So, can I configure multiple public networks? I think so, based on the
documentation, but I'm not completely sure. Can I have one half of the
cluster on one subnet, and the other half on another? And then the
client machine can have interfaces in different subnets and "do the
right thing" with both interfaces to talk to all the nodes. This seems
like a fairly simple solution that avoids a SPOF in Ceph or the network
layer.

There is no restriction on the IPs of the OSDs. All they need is a Layer 3 route to the WHOLE cluster and monitors.

It doesn't have to be a single Layer 2 network; everything can simply be Layer 3. You just have to make sure all the nodes can reach each other.
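
To make that concrete, a hypothetical ceph.conf fragment (the subnets are placeholders; as far as I know both options accept a comma-separated list of subnets):

[global]
    # Clients and OSDs on either subnet can reach each other as long
    # as the subnets are routed together.
    public network  = 192.168.10.0/24, 192.168.11.0/24
    cluster network = 192.168.20.0/24, 192.168.21.0/24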


Or maybe I'm missing an alternative that would be better? I'm aiming
for something that keeps things as simple as possible while meeting
the redundancy requirements.

              client
                |
                |
          core switch
       /        |     \
      /         |      \
     /          |       \
    /           |        \
   /            |         \
switch1      switch2     switch3
   |            |          |
  OSD          OSD       OSD


You could build something like that. That would be fairly simple.

Keep in mind that you can always lose a switch and still keep I/O going.
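
For example (the pool name is just a placeholder), with the usual 3 replicas and min_size 2, losing everything behind one switch still leaves two copies and client I/O keeps flowing:

$ ceph osd pool set rbd size 3
$ ceph osd pool set rbd min_size 2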

Wido

As an aside, there's a similar issue on the cluster network side with
heavy traffic on the trunk between the two cluster switches. But I
can't see how that's avoidable, and presumably it's something people just
have to deal with in larger Ceph installations?

Finally, this is all theoretical planning to try and avoid designing
in bottlenecks at the outset. I don't have any concrete idea of the
load, so in practice none of it may be an issue.

Thanks for your thoughts.

Tim.



--
Wido den Hollander
42on B.V.

Phone: +31 (0)20 700 9902
Skype: contact42on



