On a cluster with 10 nodes, corosync cpg_mcast_joined() starts to
simply return CPG_ERR_TRY_AGAIN. I can repeat cpg_mcast_joined() (over a
long time), but it never gets back to normal behavior.

The node uses bonding (eth0/eth1), and one network card failed several
times:

# grep 'kernel: igb:' var/log/syslog*
var/log/syslog.2:Aug 6 21:26:32 pve01 kernel: igb: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
var/log/syslog.2:Aug 6 21:27:00 pve01 kernel: igb: eth0 NIC Link is Down
var/log/syslog.2:Aug 6 21:27:02 pve01 kernel: igb: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
var/log/syslog.2:Aug 6 21:28:02 pve01 kernel: igb: eth0 NIC Link is Down
var/log/syslog.2:Aug 6 21:28:08 pve01 kernel: igb: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
var/log/syslog.2:Aug 6 21:29:11 pve01 kernel: igb: eth0 NIC Link is Down

Up to this point there was no problem; corosync worked as expected.

var/log/syslog.2:Aug 6 21:29:13 pve01 kernel: igb: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX

Please note that flow control changed to RX/TX. After that, corosync is
completely unusable.

Should I turn off flow control? Is that a known problem?

- Dietmar
_______________________________________________
discuss mailing list
discuss@xxxxxxxxxxxx
http://lists.corosync.org/mailman/listinfo/discuss