On 01/09/2012 12:12 AM, SATHYA - IT wrote:
> Hi,
>
> Thanks for your mail. I herewith attaching the bonding and eth
> configuration files. And on the /var/log/messages during the fence
> operation we can get the logs updated related to network only in the
> node which fences the other.

What IPs do the node names resolve to? I'm assuming bond1, but I would
like you to confirm.

> Server 1 Bond1: (Heartbeat)

I'm still not sure what you mean by heartbeat. Do you mean the channel
corosync is using?

> On the log messages,
>
> Jan 3 14:46:07 filesrv2 kernel: bnx2 0000:04:00.0: eth4: NIC Copper
> Link is Down
> Jan 3 14:46:07 filesrv2 kernel: bnx2 0000:03:00.1: eth3: NIC Copper
> Link is Down

This tells me both links dropped at the same time. These messages are
coming from below the cluster, though.

> Jan 3 14:46:07 filesrv2 kernel: bonding: bond1: link status
> definitely down for interface eth3, disabling it
> Jan 3 14:46:07 filesrv2 kernel: bonding: bond1: now running without
> any active interface !
> Jan 3 14:46:07 filesrv2 kernel: bonding: bond1: link status
> definitely down for interface eth4, disabling it

With both of the bond's NICs down, the bond itself is going to drop.

> Jan 3 14:46:10 filesrv2 kernel: bnx2 0000:03:00.1: eth3: NIC Copper
> Link is Up, 1000 Mbps full duplex, receive & transmit flow control ON
> Jan 3 14:46:10 filesrv2 kernel: bond1: link status definitely up for
> interface eth3, 1000 Mbps full duplex.
> Jan 3 14:46:10 filesrv2 kernel: bonding: bond1: making interface eth3
> the new active one.
> Jan 3 14:46:10 filesrv2 kernel: bonding: bond1: first active
> interface up!
> Jan 3 14:46:10 filesrv2 kernel: bnx2 0000:04:00.0: eth4: NIC Copper
> Link is Up, 1000 Mbps full duplex, receive & transmit flow control ON
> Jan 3 14:46:10 filesrv2 kernel: bond1: link status definitely up for
> interface eth4, 1000 Mbps full duplex.

I don't see any messages about the cluster in here, which I assume you
cropped out. In this case it doesn't matter, as the problem is well
below the cluster, but in general, please provide more data, not less.
You never know what might help. :)

Anyway, you need to sort out what is happening here. Bad drivers? A bad
card (assuming dual-port)? Something is taking the NICs down, as though
they were actually unplugged. If you can run them through a switch, it
might help isolate which node is causing the problems, as then you
would only see one node record "NIC Copper Link is Down" and can focus
on just that node.

-- 
Digimer
E-Mail:              digimer@xxxxxxxxxxx
Freenode handle:     digimer
Papers and Projects: http://alteeve.com
Node Assassin:       http://nodeassassin.org
"omg my singularity battery is dead again.
stupid hawking radiation." - epitron

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
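As a rough starting point for the checks suggested above, commands like
the following show what a node name resolves to, the bonding driver's
view of bond1 and its slaves, and the NIC driver details. This is only
a sketch: "filesrv1" is a guess at the peer node's name (only filesrv2
appears in the logs), and the eth3/eth4 interface names are taken from
the logs and may differ on the actual systems.

  # Which address (and so which interface) does the peer's name
  # resolve to? (filesrv1 is a guessed peer name)
  getent hosts filesrv1

  # The bonding driver's view of bond1 and its slave NICs
  cat /proc/net/bonding/bond1

  # Driver and firmware versions for a suspect NIC (bnx2 in the logs)
  ethtool -i eth3

  # Link state as the kernel currently sees it
  ethtool eth3 | grep 'Link detected'

  # Which ring address corosync is actually using for cluster traffic
  corosync-cfgtool -s

If the bond is in active-backup mode, /proc/net/bonding/bond1 also
records link failure counts per slave, which helps tell a flapping NIC
from a one-off event.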