On 01/05/2013 02:23 AM, Manish Kathuria wrote:
> On Sat, Nov 10, 2012 at 9:52 AM, Digimer <lists@xxxxxxxxxx> wrote:
>> On 11/09/2012 11:12 PM, Zama Ques wrote:
>>>> I need help resolving an issue related to implementing high
>>>> availability at the network level. I understand that this may not
>>>> be the right forum for the question, but since it is related to HA
>>>> and Linux, I am asking here in the hope that somebody has an answer
>>>> to the problems I am facing.
>>>>
>>>> I am trying to implement Ethernet bonding. Both interfaces in my
>>>> server are connected to two different network switches.
>>>>
>>>> My configuration is as follows:
>>>>
>>>> ========
>>>> # cat /proc/net/bonding/bond0
>>>>
>>>> Ethernet Channel Bonding Driver: v3.6.0 (September 26, 2009)
>>>>
>>>> Bonding Mode: adaptive load balancing
>>>> Primary Slave: None
>>>> Currently Active Slave: eth0
>>>> MII Status: up
>>>> MII Polling Interval (ms): 0
>>>> Up Delay (ms): 0
>>>> Down Delay (ms): 0
>>>>
>>>> Slave Interface: eth0
>>>> MII Status: up
>>>> Speed: 1000 Mbps
>>>> Duplex: full
>>>> Link Failure Count: 0
>>>> Permanent HW addr: e4:e1:5b:d0:11:10
>>>> Slave queue ID: 0
>>>>
>>>> Slave Interface: eth1
>>>> MII Status: up
>>>> Speed: 1000 Mbps
>>>> Duplex: full
>>>> Link Failure Count: 0
>>>> Permanent HW addr: e4:e1:5b:d0:11:14
>>>> Slave queue ID: 0
>>>> ------------
>>>> # cat /sys/class/net/bond0/bonding/mode
>>>> balance-alb 6
>>>>
>>>> # cat /sys/class/net/bond0/bonding/miimon
>>>> 0
>>>> ============
>>>>
>>>> The issue for me is that I am seeing packet loss after configuring
>>>> bonding. I tried connecting both interfaces to the same switch, but
>>>> I still see the packet loss. I also tried changing the miimon value
>>>> to 100, but the packet loss remains.
>>>>
>>>> What am I missing in the configuration? Any help in resolving the
>>>> problem will be highly appreciated.
>>>>
>>>> Thanks
>>>> Zaman
>>>
>>>> You didn't share any details on your configuration, but I will
>>>> assume you are using corosync.
>>>>
>>>> The only supported bonding mode is Active/Passive (mode=1). I've
>>>> personally tried all modes, out of curiosity, and all had problems.
>>>> The short of it is that if you need more than 1 Gbit of
>>>> performance, buy faster cards.
>>>>
>>>> If you are interested in what I use, it's documented here:
>>>>
>>>> https://alteeve.ca/w/2-Node_Red_Hat_KVM_Cluster_Tutorial#Network
>>>>
>>>> I've used this setup in several production clusters and have tested
>>>> failure and recovery extensively. It's proven very stable. :)
>>>
>>> Thanks, Digimer, for the quick response and for pointing me to the
>>> link. I have not yet reached cluster configuration; I am trying to
>>> understand Ethernet bonding before going into the cluster setup. So
>>> in a clustered environment my only option is the Active/Passive
>>> bonding mode.
>>>
>>> A few more clarifications are needed. Can we use the other bonding
>>> modes in a non-clustered environment? I am seeing packet loss in the
>>> other modes. Also, is the restriction to mode=1 in a cluster
>>> environment a limitation of the RHEL Cluster Suite, or is it by
>>> design?
>>>
>>> It would be great if you could clarify these queries.
>>>
>>> Thanks in advance
>>> Zaman
>>
>> Corosync is the only actively developed/supported (HA) cluster
>> communications and membership tool. It's used on all modern distros
>> for clustering, and the requirement for mode=1 comes from it. As
>> such, it doesn't matter which OS you are on; it's the only mode that
>> will work (reliably).
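For illustration, a minimal active-backup bond of the kind Digimer
describes might look like the sketch below, assuming RHEL-style network
scripts. The interface names, address, and option values are
placeholders for illustration, not taken from this thread.

========
# /etc/sysconfig/network-scripts/ifcfg-bond0
DEVICE=bond0
# mode=1 is active-backup; miimon=100 polls link state every 100 ms.
# primary=eth0 makes eth0 the preferred active slave (placeholder name).
BONDING_OPTS="mode=1 miimon=100 primary=eth0"
BOOTPROTO=none
ONBOOT=yes
IPADDR=10.20.0.1
NETMASK=255.255.0.0

# /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
MASTER=bond0
SLAVE=yes
BOOTPROTO=none
ONBOOT=yes

# /etc/sysconfig/network-scripts/ifcfg-eth1
DEVICE=eth1
MASTER=bond0
SLAVE=yes
BOOTPROTO=none
ONBOOT=yes
========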
>> The problem is that corosync needs to detect state changes quickly.
>> It does this using the totem protocol (which serves other purposes
>> as well), which passes a token around the nodes in the cluster. If a
>> node is sent a token and the token is not returned within a time-out
>> period, the token is declared lost and a new one is dispatched. Once
>> too many failures occur in a row, the node is declared lost and is
>> ejected from the cluster. This process is detailed in the link above
>> under the "Concept; Fencing" section.
>>
>> With all modes other than mode=1, failure recovery and/or the
>> restoration of a link in the bond causes enough disruption for a
>> node to be declared lost. As I mentioned, this matches my experience
>> from testing the other modes. It isn't an arbitrary rule.
>>
>> As for non-clustered traffic: the usefulness of the other bond modes
>> depends entirely on the traffic you are pushing over them.
>> Personally, I am focused on HA in clusters, so I only use mode=1,
>> regardless of the traffic destined for it.
>>
>> digimer
>
> I was dealing with an issue where network performance had to be
> improved in a high-availability cluster, and while going through the
> archives I saw this thread.
>
> Would this condition of the bonding mode being 1 (active-backup) also
> apply when we have different interfaces for the cluster communication
> and service networks? In such a scenario, can't we use bonding mode 1
> for the cluster communication interfaces and mode 0 or 5 (or any
> other suitable mode) for the interfaces on the service network?
>
> Thanks,
> --
> Manish

That should be fine. Note, though, that if you use your other network
as a backup totem ring and for some reason corosync fails over to that
ring, it will fail back again if a member of the non-mode=1 bond
hiccups or fails. I've not tested this, though, so of course there
might be a gotcha I don't know about.

-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?
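For reference, a redundant-ring totem setup of the kind Digimer
mentions might look like the following corosync.conf sketch. The
network addresses are placeholders, not taken from this thread, and
the choice of passive rrp_mode is an assumption: in passive mode
corosync uses ring 0 and only moves to ring 1 when ring 0 faults.

========
# /etc/corosync/corosync.conf (fragment)
totem {
    version: 2
    # Passive redundant ring protocol: run on ring 0, fall back to
    # ring 1 if ring 0 fails.
    rrp_mode: passive

    interface {
        # Ring 0: the mode=1 bonded cluster-communication network
        # (placeholder subnet).
        ringnumber: 0
        bindnetaddr: 10.20.0.0
        mcastaddr: 239.192.100.1
        mcastport: 5405
    }

    interface {
        # Ring 1: backup ring on the service network, which may sit on
        # a non-mode=1 bond (placeholder subnet).
        ringnumber: 1
        bindnetaddr: 10.10.0.0
        mcastaddr: 239.192.100.2
        mcastport: 5405
    }
}
========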