Re: Gluster and bonding

Jorick Astrego <jorick@xxxxxxxxxxx> · Mon, 25 Feb 2019 15:24:09 +0100

    Hi,
    We have not measured it, as we have been running this way for years
      now and haven't experienced any problems with "transport endpoint
      is not connected" with this setup.
    We used the default options "BONDING_OPTS='mode=6 miimon=100'".

         miimon=time_in_milliseconds  

           Specifies (in milliseconds) how often MII
            link monitoring occurs. This is useful if high availability
            is required because MII is used to verify that the NIC is
            active. 
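
For reference, a minimal sketch of where these options would typically live on a RHEL/CentOS-style system. The device name bond0 and the keys other than BONDING_OPTS are assumptions for illustration; only the BONDING_OPTS value is taken from the message above.

```shell
# ifcfg-bond0 fragment (device name and surrounding keys are assumptions)
DEVICE=bond0
TYPE=Bond
BONDING_MASTER=yes
BOOTPROTO=none
ONBOOT=yes
# mode=6 is balance-alb; miimon=100 polls the MII link state every 100 ms
BONDING_OPTS='mode=6 miimon=100'
```

With miimon=100 a dead link should be noticed by the bonding driver within roughly a tenth of a second; how quickly peers follow depends on the ARP behaviour discussed further down the thread.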

    On 2/25/19 2:22 PM, Martin Toth wrote:

      How long does it take for your devices (using mode 5 or 6; ALB is
      preferred for GlusterFS) to take over the MAC? This can result in
      your error, "transport endpoint is not connected", as there are
      some timeouts within gluster set by default.
      I am using LACP and it works without any problem.
        Can you share your mode 5 / 6 configuration?

      Thanks.
      Martin

              On 25 Feb 2019, at 13:44, Jorick Astrego <jorick@xxxxxxxxxxx> wrote:

                  Hi,
                  Well no, mode 5 and mode 6 also have fault
                    tolerance and don't need any special switch config.
                  Quick google search:
                  https://serverfault.com/questions/734246/does-balance-alb-and-balance-tlb-support-fault-tolerance

                    Bonding Mode 5 (balance-tlb) works by
                      looking at all the devices in the bond, and
                      sending out the slave with the least current
                      traffic load. Traffic is only received by one
                      slave (the "primary slave"). If a slave is lost,
                      that slave is not considered for transmission, so
                      this mode is fault-tolerant.
                    Bonding Mode 6 (balance-alb) works as
                      above, except incoming ARP requests are
                      intercepted by the bonding driver, and the bonding
                      driver generates ARP replies so that external
                      hosts are tricked into sending their traffic into
                      one of the other bonding slaves instead of the
                      primary slave. If many hosts in the same broadcast
                      domain contact the bond, then traffic should
                      balance roughly evenly into all slaves.
                    If a slave is lost in Mode 6, then it
                      may take some time for a remote host to time out
                      its ARP table entry and send a new ARP request. A
                      TCP or SCTP retransmission tends to lead to an ARP
                      request fairly quickly, but a UDP datagram does
                      not, and will rely on the usual ARP table refresh.
                      So Mode 6 is fault tolerant,
                      but convergence on slave loss may take some time
                      depending on the Layer 4 protocol used.
                    If you are worried about fast fault
                      tolerance, then consider using Mode 4 (802.3ad aka
                      LACP) which negotiates link aggregation between
                      the bond and the switch, and constantly updates
                      the link status between the aggregation partners.
                      Mode 4 also has configurable load balance hashing
                      so is better for in-order delivery of TCP streams
                      compared to Mode 5 or Mode 6.
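
A minimal sketch of what a mode 4 configuration along those lines could look like in an ifcfg file. The device name and the lacp_rate and xmit_hash_policy values are illustrative assumptions, not settings taken from this thread, and the switch ports would need to be configured as a matching LACP group.

```shell
# ifcfg-bond0 fragment (device name and option values are assumptions)
DEVICE=bond0
TYPE=Bond
BONDING_MASTER=yes
# mode=4 is 802.3ad (LACP); lacp_rate=fast asks the partner for LACPDUs
# every second; xmit_hash_policy=layer3+4 hashes on IP addresses and ports.
BONDING_OPTS='mode=4 miimon=100 lacp_rate=fast xmit_hash_policy=layer3+4'
```

With two physical switches, 802.3ad generally requires that they present themselves to the host as one logical switch (for example via stacking or MLAG), since all ports of an aggregate must terminate on the same LACP partner.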

                  https://wiki.linuxfoundation.org/networking/bonding

                        balance-tlb or 5

                        Adaptive transmit load balancing: channel
                        bonding that does not require any special switch
                        support. The outgoing traffic is distributed
                        according to the current load (computed relative
                        to the speed) on each slave. Incoming traffic is
                        received by the current slave. If
                          the receiving slave fails, another slave takes
                          over the MAC address of the failed receiving
                          slave.

                           Prerequisite:

                               Ethtool support in the
                                base drivers for retrieving the speed of
                                each slave.

                        balance-alb or 6

                        Adaptive load balancing: includes
                          balance-tlb plus receive load balancing
                        (rlb) for IPv4 traffic, and does not require any
                        special switch support. The receive load
                        balancing is achieved by ARP negotiation.

                           The bonding driver intercepts
                            the ARP Replies sent by the local system on
                            their way out and overwrites the source
                            hardware address with the unique hardware
                            address of one of the slaves in the bond
                            such that different peers use different
                            hardware addresses for the server.

                           Receive traffic from
                            connections created by the server is also
                            balanced. When the local system sends an ARP
                            Request the bonding driver copies and saves
                            the peer's IP information from the ARP
                            packet.

                           When the ARP Reply arrives
                            from the peer, its hardware address is
                            retrieved and the bonding driver initiates
                            an ARP reply to this peer assigning it to
                            one of the slaves in the bond.

                           A problematic outcome of
                            using ARP negotiation for balancing is that
                            each time that an ARP request is broadcast
                            it uses the hardware address of the bond.
                            Hence, peers learn the hardware address of
                            the bond and the balancing of receive
                            traffic collapses to the current slave. This
                            is handled by sending updates (ARP Replies)
                            to all the peers with their individually
                            assigned hardware address such that the
                            traffic is redistributed. Receive traffic is
                            also redistributed when a new slave is added
                            to the bond and when an inactive slave is
                            re-activated. The receive load is
                            distributed sequentially (round robin) among
                            the group of highest speed slaves in the
                            bond.

                           When a link is reconnected or
                            a new slave joins the bond the receive
                            traffic is redistributed among all active
                            slaves in the bond by initiating ARP Replies
                            with the selected MAC address to each of the
                            clients. The updelay parameter (detailed
                            below) must be set to a value equal to or
                            greater than the switch's forwarding delay
                            so that the ARP Replies sent to the peers
                            will not be blocked by the switch.
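
As an illustration of the updelay advice above, a hedged sketch of the option string. The 30000 ms figure is an assumption based on a classic spanning-tree port that spends two 15-second phases (listening and learning) before forwarding; the real value depends on the switch.

```shell
# ifcfg-bond0 fragment (values are assumptions, adjust to the switch)
# updelay is in milliseconds and should be a multiple of miimon.
# 30000 ms assumes a spanning-tree port that spends 2 x 15 s in
# listening and learning before it forwards traffic.
BONDING_OPTS='mode=6 miimon=100 updelay=30000'
```

On ports configured as edge ports (portfast or similar) the forwarding delay is much shorter, so a far smaller updelay may be enough.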

                  On 2/25/19 1:16 PM, Martin Toth wrote:

                    Hi Alex,

                    you have to use bond mode 4 (LACP -
                      802.3ad) in order to achieve redundancy of
                      cables/ports/switches. I suppose this is what you
                      want.

                    BR,
                    Martin

                            On 25 Feb 2019, at 11:43, Alex K <rightkicktech@xxxxxxxxx> wrote:

                                Hi All, 

                                I was asking whether it is
                                  possible to have the two separate
                                  cables connected to two different
                                  physical switches. When I tried mode 6
                                  or mode 1 in this setup, gluster
                                  refused to start the volumes, giving
                                  me "transport endpoint is not
                                  connected".

                                server1: cable1 ---------------- switch1 --------------------- server2: cable1
                                                                    |
                                server1: cable2 ---------------- switch2 --------------------- server2: cable2

                                Both switches are
                                  connected with each other also. This
                                  is done to achieve redundancy for the
                                  switches. 

                                When I disconnected cable2
                                  from both servers, gluster was
                                  happy.

                                What could be the problem?
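
One way to narrow this down is to look at what the bonding driver itself reports on each server. The following is a minimal sketch, assuming the bond is called bond0 and that the kernel exposes its status under /proc/net/bonding; bond_summary is a hypothetical helper name, not an existing tool.

```shell
#!/bin/sh
# Summarise a bonding status file: mode, active slave and per-slave MII status.
# The path argument defaults to /proc/net/bonding/bond0 (bond0 is an assumed name).
bond_summary() {
    status_file="${1:-/proc/net/bonding/bond0}"
    awk -F': ' '
        /^Bonding Mode:/              { print "mode=" $2 }
        /^Currently Active Slave:/    { print "active=" $2 }
        /^Slave Interface:/           { slave = $2 }
        # The first "MII Status" line belongs to the bond itself, so it is
        # only reported once a "Slave Interface" line has been seen.
        /^MII Status:/ && slave != "" { print "slave=" slave " mii=" $2 }
    ' "$status_file"
}
```

Running `bond_summary` on every server with both cables plugged in, and again with cable2 removed, shows whether each node agrees on the mode and whether both slaves are reported as up.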

                                Thanx,
                                Alex

                                On Mon, Feb 25, 2019 at 11:32 AM Jorick Astrego <jorick@xxxxxxxxxxx> wrote:

                                    Hi,
                                    We use bonding mode 6
                                      (balance-alb) for GlusterFS
                                      traffic
                                    https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.4/html/administration_guide/network4

                                      Preferred bonding mode
                                        for Red Hat Gluster Storage
                                        client is mode 6 (balance-alb),
                                        this allows client to transmit
                                        writes in parallel on separate
                                        NICs much of the time. 

                                    Regards,
                                    Jorick Astrego

                                    On 2/25/19 5:41 AM, Dmitry Melekhov wrote:

                                      23.02.2019 19:54, Alex K wrote:

                                          Hi all, 

                                          I have a replica
                                            3 setup where each server
                                            was configured with dual
                                            interfaces in mode 6
                                            bonding. All cables were
                                            connected to one common
                                            network switch.

                                          To add
                                            redundancy to the switch,
                                            and avoid it being a single
                                            point of failure, I
                                            connected the second cable
                                            of each server to a second
                                            switch. This did not work:
                                            gluster refused to start
                                            the volume, logging
                                            "transport endpoint is
                                            disconnected", although
                                            all nodes were able to reach
                                            each other (ping) in the
                                            storage network. I switched
                                            the mode to mode 1
                                            (active/passive) and
                                            initially it worked, but
                                            after a reboot of the whole
                                            cluster the same issue
                                            appeared. Gluster is not
                                            starting the volumes.

                                          Isn't
                                            active/passive supposed to
                                            work like that? Can one have
                                            such a redundant network
                                            setup, or are there any other
                                            recommended approaches?

                                      Yes, we use LACP, which I
                                        guess is mode 4 (we use
                                        teamd). It is, no doubt, the
                                        best way.

                                          Thanx, 

                                          Alex

                                        _______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users

                                    Met vriendelijke groet, With kind regards,

                                    Jorick Astrego

                                    Netbulae Virtualization Experts
                                    Tel: 053 20 30 270     info@xxxxxxxxxxx     Staalsteden 4-3A     KvK 08198180
                                    Fax: 053 20 30 271     www.netbulae.eu     7547 TA Enschede     BTW NL821234584B01
