On Thu, Apr 8, 2010 at 12:58 AM, Steven Dake <sdake@xxxxxxxxxx> wrote:
> On Wed, 2010-04-07 at 18:52 +0800, Bernard Chew wrote:
>> Hi all,
>>
>> I noticed "openais[XXXX] [TOTEM] Retransmit List: XXXXX" repeated
>> every few hours in /var/log/messages. What does the message mean and
>> is it normal? Will this cause fencing to take place eventually?
>>
> This means your network environment dropped packets and totem is
> recovering them. This is normal operation, and in future versions such
> as corosync no notification is printed when recovery takes place.
>
> There is a bug, however, fixed in revision 2122 where if the last packet
> in the order is lost, and no new packets are unlost after it, the
> processor will enter a failed to receive state and trigger fencing.
>
> Regards
> -steve
>> Thank you in advance.
>>
>> Regards,
>> Bernard Chew
>>
>> --
>> Linux-cluster mailing list
>> Linux-cluster@xxxxxxxxxx
>> https://www.redhat.com/mailman/listinfo/linux-cluster

Thank you for the reply, Steve!

The cluster was running fine until last week, when 3 nodes restarted
suddenly. I suspect fencing took place, since all 3 servers restarted at
the same time, but I couldn't find any fence-related entries in the log.
I am guessing we hit the bug you mentioned? If so, will the log indicate
that fencing has taken place?

Also, I occasionally see the message "kernel: clustat[28328]: segfault at
0000000000000024 rip 0000003b31c75bc0 rsp 00007fff955cb098 error 4". Is
this related to the TOTEM message, or does it indicate a separate
problem?

Regards,
Bernard Chew
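
For anyone wanting to check their own /var/log/messages for the pattern discussed above, here is a minimal sketch in Python. It counts "[TOTEM] Retransmit List:" notices per hour and collects lines that look fence-related. The function name `summarize`, the "fenc" substring match, and the sample lines are all assumptions made for illustration; real log lines on your system may be formatted differently, and a traditional syslog timestamp ("Apr  7 18:52:01") is assumed.

```python
import re
from collections import Counter

# Substring taken from the message quoted in this thread.
RETRANSMIT = "[TOTEM] Retransmit List:"
# Assumption: fence activity appears in lines containing "fenc"
# (matches "fence", "fenced", "fencing"); adjust for your setup.
FENCE = re.compile(r"fenc", re.IGNORECASE)


def summarize(lines):
    """Return (retransmit notices per hour, fence-looking lines)."""
    per_hour = Counter()
    fence_lines = []
    for line in lines:
        if RETRANSMIT in line:
            # First 9 characters of a traditional syslog timestamp
            # cover month, day and hour, e.g. "Apr  7 18".
            per_hour[line[:9]] += 1
        elif FENCE.search(line):
            fence_lines.append(line.rstrip("\n"))
    return per_hour, fence_lines


# Hypothetical sample lines, invented for this sketch only.
SAMPLE = [
    "Apr  7 18:52:01 node1 openais[1234]: [TOTEM] Retransmit List: 1a2b",
    "Apr  7 18:58:40 node1 openais[1234]: [TOTEM] Retransmit List: 1a2c",
    "Apr  7 21:03:12 node1 openais[1234]: [TOTEM] Retransmit List: 2f00",
    "Apr  8 00:10:05 node1 fenced[2345]: fencing node example-node",
]

per_hour, fence_lines = summarize(SAMPLE)
for hour, count in sorted(per_hour.items()):
    print(hour, count)
for line in fence_lines:
    print("possible fence entry:", line)
```

To run it against a real log, pass an open file instead of the sample list, for example `summarize(open("/var/log/messages", errors="replace"))`. A burst of retransmit notices shortly before a restart, with no fence-looking lines nearby, would be worth mentioning when reporting the problem to the list.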