Liu Yuan wrote:
> On Thu, Aug 08, 2013 at 02:23:11AM +0200, Valerio Pachera wrote:
>> Auch! This time things went bad:
>> cluster has stopped. ...
>>
>> [note: 2]
>> Aug 7 21:00:07 sheepdog004 corosync[4365]: [TOTEM ] Retransmit List:
>> 757f 7580 7581 7582 7583 7584 7585 7586 7587 7588 7589 758a 758b
>> 758c 758d 758e 758f 7590 7591 7592
>> Aug 7 21:00:07 sheepdog004 corosync[4365]: [TOTEM ] Retransmit List:
>> 757f 7580 7581 7582 7583 7584 7585 7586 7587 7588 7589 758a 758b
>> 758c 758d 758e 758f 7590 7591 7592 7593 7594 7595 7596 7597 7598 7599
>> 759a 759b 759c
>> ...
>
> Hello corosync guys, is this normal? Sheep daemon detected a network partition.
>

A few packet retransmits are pretty normal because of UDP. On the other
hand, what you've sent doesn't look normal. It is always related to a
networking issue. So:

- How often do you get these messages?
  - Every time a node starts?
    - Then the problem is with multicast/switch/firewall. Just make sure
      multicast works (you can use omping for that).
  - After about two minutes of running?
    - Maybe a known problem in kernel multicast:
      https://bugzilla.redhat.com/show_bug.cgi?id=880035
- Isn't there any heavy IO/CPU load that keeps corosync from being
  scheduled properly?

Regards,
  Honza
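
To put a number on "how often do you get these messages", one option is to count the Retransmit List lines per host and per minute in the syslog. The following is only a minimal sketch: it assumes the syslog line format seen in the quoted log, and the function name `retransmits_per_minute` is made up for illustration and is not part of corosync.

```python
import re
from collections import Counter

# Assumed line format, taken from the quoted log:
#   Aug 7 21:00:07 sheepdog004 corosync[4365]: [TOTEM ] Retransmit List: 757f 7580
LINE_RE = re.compile(
    r"^(?P<month>\w{3})\s+(?P<day>\d{1,2})\s+"
    r"(?P<hour>\d{2}):(?P<minute>\d{2}):\d{2}\s+"
    r"(?P<host>\S+)\s+corosync\[\d+\]:\s+"
    r"\[TOTEM\s*\]\s+Retransmit\s+List:"
)


def retransmits_per_minute(lines):
    """Count 'Retransmit List' messages per (host, minute).

    Hypothetical helper: lines that do not match the assumed
    syslog format are silently skipped.
    """
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            minute = f'{m["month"]} {m["day"]} {m["hour"]}:{m["minute"]}'
            counts[(m["host"], minute)] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage (the log path depends on the distribution):
    #   python count_retransmits.py < /var/log/syslog
    for (host, minute), n in sorted(retransmits_per_minute(sys.stdin).items()):
        print(f"{host}  {minute}  {n}")
```

A burst right after a node starts points toward the multicast/switch/firewall case, while messages that begin only after a couple of minutes of uptime would fit the kernel multicast bug mentioned above.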