Re: Have 2 different public networks

Thought I'd share details of my setup, as I am effectively achieving this (i.e. making monitors accessible over multiple interfaces) with IP routing, as follows:

My Ceph hosts each have a /32 IP address on a loopback interface, and that is the IP address that all their Ceph daemons are bound to. In ceph.conf I do that by setting all of the following values to the host's loopback IP: "mon addr" in the [mon.x] sections; "cluster addr" and "public addr" in the [mds.x] sections; and "cluster addr", "public addr", and "osd heartbeat addr" in the [osd.x] sections. I then use IP routing to ensure that the hosts can all reach each other's loopback IPs, and that clients can reach them too, over the relevant networks. This also allows inter-host traffic to fail over to alternate paths if the normal path is down for some reason.
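
For illustration, a minimal ceph.conf sketch for one host might look something like the following - the hostname, daemon IDs and the 10.255.0.1 loopback address are placeholders I've made up here, not my real values:

    # node1's loopback IP, added with something like:
    #   ip addr add 10.255.0.1/32 dev lo

    [mon.node1]
        host     = node1
        mon addr = 10.255.0.1

    [mds.node1]
        host         = node1
        public addr  = 10.255.0.1
        cluster addr = 10.255.0.1

    [osd.0]
        host               = node1
        public addr        = 10.255.0.1
        cluster addr       = 10.255.0.1
        osd heartbeat addr = 10.255.0.1

With every daemon bound to the loopback address like this, which physical interface the traffic actually uses becomes purely a routing decision.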

To put this in context, my cluster is just a small 3-node cluster, and I have a pair of layer 3 switches, with networking arranged like this (a hypothetical addressing sketch for node1 follows the list):

node1:

has an 8 Gbps point-to-point routed link to node2 (and uses this link to communicate with node2 under normal circumstances)
has an 8 Gbps point-to-point routed link to node3 (and uses this link to communicate with node3 under normal circumstances)
has a 2 Gbps routed link to the primary layer 3 switch (clients which are not other nodes in the cluster use this link to communicate with node1 under normal circumstances, and traffic to nodes 2 and 3 will switch to using this if their respective point-to-point links go down)
has a 1 Gbps routed link to the secondary layer 3 switch (this is not used under normal circumstances - only if the 2 Gbps link goes down)

node2:

has an 8 Gbps point-to-point routed link to node3 (and uses this link to communicate with node3 under normal circumstances)
has an 8 Gbps point-to-point routed link to node1 (and uses this link to communicate with node1 under normal circumstances)
has a 2 Gbps routed link to the primary layer 3 switch (clients which are not other nodes in the cluster use this link to communicate with node2 under normal circumstances, and traffic to nodes 3 and 1 will switch to using this if their respective point-to-point links go down)
has a 1 Gbps routed link to the secondary layer 3 switch (this is not used under normal circumstances - only if the 2 Gbps link goes down)

node3:

has an 8 Gbps point-to-point routed link to node1 (and uses this link to communicate with node1 under normal circumstances)
has an 8 Gbps point-to-point routed link to node2 (and uses this link to communicate with node2 under normal circumstances)
has a 2 Gbps routed link to the primary layer 3 switch (clients which are not other nodes in the cluster use this link to communicate with node3 under normal circumstances, and traffic to nodes 1 and 2 will switch to using this if their respective point-to-point links go down)
has a 1 Gbps routed link to the secondary layer 3 switch (this is not used under normal circumstances - only if the 2 Gbps link goes down)
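
To make that concrete, a hypothetical addressing plan for node1 could look like this (interface names and subnets invented purely for illustration):

    lo      10.255.0.1/32      loopback IP all of node1's Ceph daemons bind to
    eth1    192.168.12.1/30    8 Gbps point-to-point link to node2
    eth2    192.168.13.1/30    8 Gbps point-to-point link to node3
    eth3    192.168.101.1/24   2 Gbps uplink to the primary layer 3 switch
    eth4    192.168.102.1/24   1 Gbps uplink to the secondary layer 3 switch

node2 and node3 are laid out the same way, each with their own /32 loopback IP (say 10.255.0.2 and 10.255.0.3).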

I have avoided using a proper routing protocol for this, as the failover still works automatically when links go down, even with static routes (see the sketch below). I do also have scripts running on the hosts that detect when the device at the other end of a link is not pingable even though the link is up, and dynamically remove/insert the routes as necessary in that situation. But adapting this approach to a larger cluster, where point-to-point links between all hosts aren't viable, might well warrant the use of a routing protocol.
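
As a rough sketch of what the static routes on node1 then look like, continuing with the made-up addresses above (192.168.12.2 and 192.168.13.2 being the far ends of the point-to-point links, and 192.168.101.254 the primary switch):

    # preferred paths to node2's and node3's loopback IPs, over the point-to-point links
    ip route add 10.255.0.2/32 via 192.168.12.2 metric 10
    ip route add 10.255.0.3/32 via 192.168.13.2 metric 10
    # fallback paths via the primary layer 3 switch, with a worse metric
    ip route add 10.255.0.2/32 via 192.168.101.254 metric 100
    ip route add 10.255.0.3/32 via 192.168.101.254 metric 100

While a point-to-point link is usable, the lower-metric route wins; when it goes down (or the ping scripts mentioned above remove the route because the far end has stopped responding), traffic falls back to the higher-metric route via the switch.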

The end result is that I have more control over where the different traffic goes, and I can mess around with the networking without any effect on the cluster.

Alex

On 20/12/2014 5:23 AM, Craig Lewis wrote:


On Fri, Dec 19, 2014 at 6:19 PM, Francois Lafont <flafdivers@xxxxxxx> wrote:

So, indeed, I have to use routing *or* maybe create 2 monitors
per server, like this:

[mon.node1-public1]
    host     = ceph-node1
    mon addr = 10.0.1.1

[mon.node1-public2]
    host     = ceph-node1
    mon addr = 10.0.2.1

# etc...

But, in this case, the working directories of mon.node1-public1
and mon.node1-public2 will be on the same disk (I have no
choice). Is that a problem? Are monitors big consumers of disk I/O?


Interesting idea.  While you will have an even number of monitors, you'll still have an odd number of failure domains.  I'm not sure if it'll work though... make sure you test having the leader on both networks.  It might cause problems if the leader is on the 10.0.1.0/24 network?

Monitors can be big consumers of disk IO if there is a lot of cluster activity. Monitors record all of the cluster changes in LevelDB, and send copies to all of the daemons. There have been posts to the ML about people running out of disk IOps on the monitors, and the problems it causes. The bigger the cluster, the more IOps. As long as you monitor and alert on your monitor disk IOps, I don't think it would be a problem.


