On a cluster with 10 nodes, corosync cpg_mcast_joined() starts to
simply return CPG_ERR_TRY_AGAIN. I can repeat cpg_mcast_joined() (over a
long time), but it never gets back to normal behavior.

The node uses bonding (eth0/eth1), and one network card failed several
times:

# grep 'kernel: igb:' var/log/syslog*
var/log/syslog.2:Aug 6 21:26:32 pve01 kernel: igb: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
var/log/syslog.2:Aug 6 21:27:00 pve01 kernel: igb: eth0 NIC Link is Down
var/log/syslog.2:Aug 6 21:27:02 pve01 kernel: igb: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
var/log/syslog.2:Aug 6 21:28:02 pve01 kernel: igb: eth0 NIC Link is Down
var/log/syslog.2:Aug 6 21:28:08 pve01 kernel: igb: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
var/log/syslog.2:Aug 6 21:29:11 pve01 kernel: igb: eth0 NIC Link is Down

Up to this point there was no problem; corosync worked as expected.

var/log/syslog.2:Aug 6 21:29:13 pve01 kernel: igb: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX

Please note that flow control changed to RX/TX. After that, corosync is
completely unusable.

Should I turn off flow control? Is that a known problem?

- Dietmar
_______________________________________________
discuss mailing list
discuss@xxxxxxxxxxxx
http://lists.corosync.org/mailman/listinfo/discuss