From: Yunkai Zhang

Today I observed one of the reasons corosync runs into the FAILED TO
RECEIVE state.

There were five nodes (A, B, C, D, E) in my test, and I limited the rate
of incoming UDP traffic on node C with iptables:

    iptables -A INPUT -i eth0 -p udp -m limit --limit 10000/s --limit-burst 1 -j ACCEPT
    iptables -A INPUT -i eth0 -p udp -j DROP

After about one hour, node C had missed some MCAST messages. Its state
was as follows:

==state of C node==
my_aru:0x805
my_high_seq_received:0xC2C
my_aru_count:7
=>received MCAST message with seq:806 from B node
=>enter *message_handler_mcast*
=>add this message to regular_sort_queue
...
=>enter *update_aru* function
=> range = (my_high_seq_received - my_aru) = (0xC2C - 0x805) = 1063
=> if range>1024, do nothing and return directly.
==END==

According to this logic, once (my_high_seq_received - my_aru) > 1024,
my_aru is no longer updated, even though corosync can still receive MCAST
messages retransmitted by other nodes. At that time my_aru_count was
only 7, so corosync on node C would stay in this state until my_aru_count
reached fail_to_recv_const (the default value is 2500). That is a long
time for corosync, and it is wasted.

To solve this issue, maybe we can enlarge the range condition in
update_aru? Or we can simply drop the range check. That seems harmless,
because fail_to_recv_const already bounds how long a node can stay in
this situation.
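The stall described above can be reproduced with a small stand-alone
model. This is only a sketch, not the totemsrp code: struct aru_model,
model_receive, model_update_aru, replay_node_c and MODEL_WINDOW are
hypothetical names, the sort queue is replaced by a plain array of
flags, and "seq:806" is assumed to mean 0x806 (the message right after
my_aru).

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_WINDOW 4096 /* hypothetical model size, not a corosync constant */
#define RANGE_GUARD  1024 /* the limit that this patch removes */

/* Hypothetical, simplified stand-in for the totemsrp instance state. */
struct aru_model {
	unsigned int my_aru;
	unsigned int my_high_seq_received;
	bool received[MODEL_WINDOW]; /* received[seq] is true once seq is queued */
};

/* Record a message, as message_handler_mcast does with the sort queue. */
static void model_receive(struct aru_model *m, unsigned int seq)
{
	assert(seq < MODEL_WINDOW);
	m->received[seq] = true;
	if (seq > m->my_high_seq_received) {
		m->my_high_seq_received = seq;
	}
}

/* Advance my_aru up to the first hole, optionally with the old guard. */
static void model_update_aru(struct aru_model *m, bool with_guard)
{
	unsigned int range = m->my_high_seq_received - m->my_aru;
	unsigned int i;

	if (with_guard && range > RANGE_GUARD) {
		return; /* old behaviour: give up without looking at the queue */
	}
	for (i = 1; i <= range; i++) {
		if (!m->received[m->my_aru + i]) {
			break; /* hole: stop advancing */
		}
	}
	m->my_aru += i - 1;
}

/* Replay node C's state and return my_aru after the retransmit arrives. */
static unsigned int replay_node_c(bool with_guard)
{
	struct aru_model m = {0};

	model_receive(&m, 0xC2C); /* highest sequence number seen so far */
	m.my_aru = 0x805;         /* everything up to 0x805 was received */
	model_receive(&m, 0x806); /* retransmitted message (assumed 0x806) */
	model_update_aru(&m, with_guard);
	return m.my_aru;
}
```

In this model replay_node_c(true) leaves my_aru at 0x805, because the
range is 1063 and the guard returns early, while replay_node_c(false)
advances it to 0x806 and then stops at the next hole.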

Signed-off-by: Steven Dake <sdake@xxxxxxxxxx>
Reviewed-by: Jan Friesse <jfriesse@xxxxxxxxxx>
(backported from flatiron commit e48ddf99a67754dea056a54f404f3638cf829b9c)
---
 branches/whitetank/exec/totemsrp.c |    3 ---
 1 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/branches/whitetank/exec/totemsrp.c b/branches/whitetank/exec/totemsrp.c
index c167752..1a07ede 100644
--- a/branches/whitetank/exec/totemsrp.c
+++ b/branches/whitetank/exec/totemsrp.c
@@ -2268,9 +2268,6 @@ static void update_aru (
 	}
 
 	range = instance->my_high_seq_received - instance->my_aru;
-	if (range > 1024) {
-		return;
-	}
 
 	my_aru_saved = instance->my_aru;
 	for (i = 1; i <= range; i++) {
-- 
1.7.4.1