How long does it take for your devices (using mode 5 or 6; ALB is preferred for GlusterFS) to take over the MAC? This can cause the error you are seeing - "transport endpoint is not connected" - since there are some timeouts within gluster set by default. I am using LACP and it works without any problem. Can you share your mode 5 / 6 configuration?
Thanks. Martin
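The gluster-side timeout mentioned above can be checked and tuned per volume. A minimal sketch (the volume name "gv0" is a placeholder, not from this thread):

    # show the current client ping timeout (the default is 42 seconds)
    gluster volume get gv0 network.ping-timeout

    # keep it at or above the time your bond needs to fail over
    gluster volume set gv0 network.ping-timeout 42

If the ARP-based fail-over of mode 6 takes longer than this timeout, clients can drop the connection and report exactly the "transport endpoint is not connected" error.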
Hi,
Well no, mode 5 and mode 6 also have fault tolerance and don't need any special switch config. Quick google search: https://serverfault.com/questions/734246/does-balance-alb-and-balance-tlb-support-fault-tolerance
Bonding Mode 5 (balance-tlb) works by looking at all the devices in the bond, and sending out the slave with the least current traffic load. Traffic is only received by one slave (the "primary slave"). If a slave is lost, that slave is not considered for transmission, so this mode is fault-tolerant.

Bonding Mode 6 (balance-alb) works as above, except incoming ARP requests are intercepted by the bonding driver, and the bonding driver generates ARP replies so that external hosts are tricked into sending their traffic into one of the other bonding slaves instead of the primary slave. If many hosts in the same broadcast domain contact the bond, then traffic should balance roughly evenly into all slaves.

If a slave is lost in Mode 6, then it may take some time for a remote host to time out its ARP table entry and send a new ARP request. A TCP or SCTP retransmission tends to lead to an ARP request fairly quickly, but a UDP datagram does not, and will rely on the usual ARP table refresh. So Mode 6 is fault tolerant, but convergence on slave loss may take some time depending on the Layer 4 protocol used.

If you are worried about fast fault tolerance, then consider using Mode 4 (802.3ad aka LACP), which negotiates link aggregation between the bond and the switch, and constantly updates the link status between the aggregation partners. Mode 4 also has configurable load balance hashing, so it is better for in-order delivery of TCP streams compared to Mode 5 or Mode 6.
https://wiki.linuxfoundation.org/networking/bonding
balance-tlb or 5

    Adaptive transmit load balancing: channel bonding that does not require any special switch support. The outgoing traffic is distributed according to the current load (computed relative to the speed) on each slave. Incoming traffic is received by the current slave. If the receiving slave fails, another slave takes over the MAC address of the failed receiving slave.

    Prerequisite: Ethtool support in the base drivers for retrieving the speed of each slave.

balance-alb or 6

    Adaptive load balancing: includes balance-tlb plus receive load balancing (rlb) for IPV4 traffic, and does not require any special switch support. The receive load balancing is achieved by ARP negotiation. The bonding driver intercepts the ARP Replies sent by the local system on their way out and overwrites the source hardware address with the unique hardware address of one of the slaves in the bond such that different peers use different hardware addresses for the server.

    Receive traffic from connections created by the server is also balanced. When the local system sends an ARP Request the bonding driver copies and saves the peer's IP information from the ARP packet. When the ARP Reply arrives from the peer, its hardware address is retrieved and the bonding driver initiates an ARP reply to this peer assigning it to one of the slaves in the bond.

    A problematic outcome of using ARP negotiation for balancing is that each time that an ARP request is broadcast it uses the hardware address of the bond. Hence, peers learn the hardware address of the bond and the balancing of receive traffic collapses to the current slave. This is handled by sending updates (ARP Replies) to all the peers with their individually assigned hardware address such that the traffic is redistributed. Receive traffic is also redistributed when a new slave is added to the bond and when an inactive slave is re-activated. The receive load is distributed sequentially (round robin) among the group of highest speed slaves in the bond.

    When a link is reconnected or a new slave joins the bond the receive traffic is redistributed among all active slaves in the bond by initiating ARP Replies with the selected MAC address to each of the clients. The updelay parameter (detailed below) must be set to a value equal or greater than the switch's forwarding delay so that the ARP Replies sent to the peers will not be blocked by the switch.
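As a minimal sketch of what such a mode 6 configuration can look like on a RHEL/CentOS-style host (device names, the address and the updelay value are illustrative placeholders, not taken from this thread):

    # /etc/sysconfig/network-scripts/ifcfg-bond0
    DEVICE=bond0
    TYPE=Bond
    BONDING_MASTER=yes
    ONBOOT=yes
    BOOTPROTO=none
    IPADDR=10.10.10.11
    PREFIX=24
    # mode 6 with MII link monitoring; per the text above, updelay (in ms)
    # should be equal to or greater than the switch's forwarding delay
    BONDING_OPTS="mode=balance-alb miimon=100 updelay=30000"

    # /etc/sysconfig/network-scripts/ifcfg-enp3s0  (repeat for the second slave)
    DEVICE=enp3s0
    TYPE=Ethernet
    MASTER=bond0
    SLAVE=yes
    ONBOOT=yes
    BOOTPROTO=none

The same BONDING_OPTS string with mode=balance-tlb would give mode 5 instead.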
On 2/25/19 1:16 PM, Martin Toth wrote:
Hi Alex,
you have to use bond mode 4 (LACP - 802.3ad) in
order to achieve redundancy of cables/ports/switches. I suppose
this is what you want.
BR,
Martin
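For comparison, the mode 4 setup recommended here is, on the bond side, mostly a different options line; the switch ports also have to be configured as an LACP aggregation group. The hash policy and LACP rate below are illustrative choices, not taken from this thread:

    # same ifcfg-bond0 layout as the mode 6 sketch above, but negotiating 802.3ad
    BONDING_OPTS="mode=802.3ad miimon=100 lacp_rate=fast xmit_hash_policy=layer3+4"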
Hi All,
I was asking if it is possible to have the two separate cables connected to two different physical switches. When trying mode 6 or mode 1 in this setup, gluster was refusing to start the volumes, giving me "transport endpoint is not connected".
server1: cable1 ---------------- switch1 --------------------- server2: cable1
                                    |
server1: cable2 ---------------- switch2 --------------------- server2: cable2
Both switches are also connected to each other. This is done to achieve redundancy for the switches.
When disconnecting cable2 from both servers, gluster was happy.
What could be the problem?
Thanx,
Alex
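A generic way to narrow down a failure like the one described above (not from the original thread; the bond name and address are placeholders) is to check what the bond actually negotiated and how long fail-over really takes:

    # shows the bonding mode, MII status and which slave is currently active
    cat /proc/net/bonding/bond0

    # from the other node, watch for the gap while pulling one of the cables
    ping 10.10.10.11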
Hi,
We use bonding mode 6 (balance-alb) for GlusterFS traffic:
https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.4/html/administration_guide/network4
"Preferred bonding mode for Red Hat Gluster Storage client is mode 6 (balance-alb), this allows client to transmit writes in parallel on separate NICs much of the time."
Regards, Jorick Astrego
On 2/25/19 5:41 AM, Dmitry Melekhov wrote:
On 23.02.2019 19:54, Alex K wrote:
Hi all,
I have a replica 3 setup where each server was configured with dual interfaces in mode 6 bonding. All cables were connected to one common network switch.
To add redundancy and avoid the switch being a single point of failure, I connected the second cable of each server to a second switch. This turned out not to function: gluster was refusing to start the volume, logging "transport endpoint is disconnected", although all nodes were able to reach each other (ping) on the storage network. I switched the mode to mode 1 (active/passive) and initially it worked, but following a reboot of the whole cluster the same issue appeared. Gluster is not starting the volumes.
Isn't active/passive supposed to handle that? Can one have such a redundant network setup, or are there other recommended approaches?
Yes, we use LACP. I guess this is mode 4 (we use teamd); it is, no doubt, the best way.
Met vriendelijke groet, With kind regards,
Jorick Astrego
Netbulae Virtualization Experts