On Sat, Nov 10, 2012 at 9:52 AM, Digimer <lists@xxxxxxxxxx> wrote:
> On 11/09/2012 11:12 PM, Zama Ques wrote:
>>> Need help resolving an issue related to implementing High
>>> Availability at the network level. I understand that this is not the
>>> right forum for this question, but since it is related to HA and
>>> Linux, I am asking here, and I feel somebody here will have an answer
>>> to the issues I am facing.
>>>
>>> I am trying to implement Ethernet bonding. The two interfaces in my
>>> server are connected to two different network switches.
>>>
>>> My configuration is as follows:
>>>
>>> ========
>>> # cat /proc/net/bonding/bond0
>>>
>>> Ethernet Channel Bonding Driver: v3.6.0 (September 26, 2009)
>>>
>>> Bonding Mode: adaptive load balancing
>>> Primary Slave: None
>>> Currently Active Slave: eth0
>>> MII Status: up
>>> MII Polling Interval (ms): 0
>>> Up Delay (ms): 0
>>> Down Delay (ms): 0
>>>
>>> Slave Interface: eth0
>>> MII Status: up
>>> Speed: 1000 Mbps
>>> Duplex: full
>>> Link Failure Count: 0
>>> Permanent HW addr: e4:e1:5b:d0:11:10
>>> Slave queue ID: 0
>>>
>>> Slave Interface: eth1
>>> MII Status: up
>>> Speed: 1000 Mbps
>>> Duplex: full
>>> Link Failure Count: 0
>>> Permanent HW addr: e4:e1:5b:d0:11:14
>>> Slave queue ID: 0
>>> ------------
>>> # cat /sys/class/net/bond0/bonding/mode
>>>
>>> balance-alb 6
>>>
>>> # cat /sys/class/net/bond0/bonding/miimon
>>> 0
>>>
>>> ============
>>>
>>> The issue for me is that I am seeing packet loss after configuring
>>> bonding. I tried connecting both interfaces to the same switch, but
>>> still see the packet loss. I also tried changing the miimon value to
>>> 100, but still see the packet loss.
>>>
>>> What am I missing in the configuration? Any help in resolving the
>>> problem will be highly appreciated.
>>>
>>> Thanks
>>> Zaman
>>
>>> You didn't share any details on your configuration, but I will assume
>>> you are using corosync.
>>
>>> The only supported bonding mode is Active/Passive (mode=1).
>>> I've personally tried all modes, out of curiosity, and all had
>>> problems. The short of it is that if you need more than 1 Gbit of
>>> performance, buy faster cards.
>>
>>> If you are interested in what I use, it's documented here:
>>
>>> https://alteeve.ca/w/2-Node_Red_Hat_KVM_Cluster_Tutorial#Network
>>
>>> I've used this setup in several production clusters and have tested
>>> failure and recovery extensively. It's proven very stable. :)
>>
>> Thanks, Digimer, for the quick response and for pointing me to the
>> link. I have yet to reach cluster configuration; I am initially trying
>> to understand Ethernet bonding before going into cluster
>> configuration. So the only option for me in a clustered environment is
>> the Active/Passive bonding mode.
>>
>> A few more clarifications needed: can we use other bonding modes in a
>> non-clustered environment? I am seeing packet loss in other modes.
>> Also, is the support for only mode=1 in a cluster environment a
>> restriction of the RHEL Cluster Suite, or is it by design?
>>
>> It would be great if you could clarify these queries.
>>
>> Thanks in advance
>> Zaman
>
> Corosync is the only actively developed/supported (HA) cluster
> communications and membership tool. It's used on all modern distros for
> clustering and the requirement for mode=1 is with it. As such, it
> doesn't matter which OS you are on, it's the only mode that will work
> (reliably).
>
> The problem is that corosync needs to detect state changes quickly. It
> does this using the totem protocol (which serves other purposes), which
> passes a token around the nodes in the cluster. If a node is sent a
> token and the token is not returned within a time-out period, it is
> declared lost and a new token is dispatched. Once too many failures
> occur in a row, the node is declared lost and it is ejected from the
> cluster. This process is detailed in the link above under the "Concept;
> Fencing" section.
>
> With all modes other than mode=1, the failure recovery and/or the
> restoration of a link in the bond causes a sufficient disruption to
> cause a node to be declared lost. As I mentioned, this matches my
> experience in testing the other modes. It isn't an arbitrary rule.
>
> As for non-clustered traffic; the usefulness of other bond modes depends
> entirely on the traffic you are pushing over it. Personally, I am
> focused on HA in clusters, so I only use mode=1, regardless of the
> traffic designed for it.
>
> digimer

I was dealing with an issue where network performance had to be improved
in a high availability cluster, and while going through the archives I
saw this thread.

Would this condition of the bonding mode being 1 (active-backup) also
apply when we have different interfaces for the cluster communication
and service networks? In such a scenario, can't we use bonding mode 1
for the cluster communication network interfaces and mode 0 or 5 (or any
other suitable mode) for the interfaces on the service network?

Thanks,
--
Manish

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
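
For readers following the archive, the token-loss rule Digimer describes
in the quoted message above (a token that is not returned within a
time-out counts as lost, and too many losses in a row get the node
ejected) can be illustrated with a toy model. This is only a sketch
under stated assumptions, not corosync's implementation: the names
TokenMonitor and node_survives, the 1000 ms time-out, and the limit of
4 consecutive losses are all hypothetical values chosen for the example
and are not taken from any corosync configuration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class TokenMonitor:
    """Toy model of the token-loss rule described in the thread."""

    timeout_ms: float             # hypothetical time-out for one token round trip
    max_consecutive_losses: int   # hypothetical limit before the node is ejected
    consecutive_losses: int = 0
    ejected: bool = False

    def observe(self, round_trip_ms: Optional[float]) -> None:
        """Record one token round trip; None means the token never came back."""
        if self.ejected:
            return  # an ejected node stays ejected in this simplified model
        lost = round_trip_ms is None or round_trip_ms > self.timeout_ms
        if lost:
            self.consecutive_losses += 1
            if self.consecutive_losses >= self.max_consecutive_losses:
                self.ejected = True
        else:
            # A token returned in time resets the run of failures.
            self.consecutive_losses = 0


def node_survives(
    round_trips_ms: Iterable[Optional[float]],
    timeout_ms: float = 1000.0,        # assumed value, for illustration only
    max_consecutive_losses: int = 4,   # assumed value, for illustration only
) -> bool:
    """Return True if the node is still a cluster member after all round trips."""
    monitor = TokenMonitor(timeout_ms, max_consecutive_losses)
    for round_trip in round_trips_ms:
        monitor.observe(round_trip)
    return not monitor.ejected


if __name__ == "__main__":
    # One lost token followed by recovery: the node stays in the cluster.
    print(node_survives([5, 7, None, 6, 5]))
    # Four losses in a row, e.g. a long network disruption: the node is ejected.
    print(node_survives([5, None, None, None, None]))
```

In this model a single dropped token is harmless, while a disruption
long enough to cover several token time-outs in a row is not, which is
the kind of disruption Digimer reports seeing with modes other than
mode=1.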