Hi All,

I have encountered a problem: when there is no other activity on the ring except retransmission, and the token is in hold mode, retransmission becomes slow. Moreover, if the retransmission always fails while token rotation works well, it takes quite a long time (fail_to_recv_const * token_hold = 2500 * 180ms = 450sec) for the retransmitting node to meet the "FAILED TO RECEIVE" condition and reconstruct a new ring.

This can be reproduced by the following steps:

1) Create a two-node cluster in udpu transport mode.
2) Wait until there is no other activity on the ring.
3) On one or both nodes, delete the other node from the nodelist in corosync.conf.
4) Run corosync-cfgtool -R. This causes a message retransmission, but I am not sure why.
5) Token rotation still works well, but the retransmission cannot be satisfied because of the node deletion, so only the "FAILED TO RECEIVE" condition can form a new ring. But we need to wait 450 seconds for it to happen.

During this wait, we saw the following logs:

Jul 30 11:21:06 notice [TOTEM ] Retransmit List: e
Jul 30 11:21:06 notice [TOTEM ] Retransmit List: e
Jul 30 11:21:06 notice [TOTEM ] Retransmit List: e
Jul 30 11:21:06 notice [TOTEM ] Retransmit List: e
Jul 30 11:21:06 notice [TOTEM ] Retransmit List: e
...

This problem can be solved by calling token_hold_cancel_send() in both the retransmission request and response paths in orf_token_rtr() to speed up retransmission. I created a patch below, any comments?

Signed-off-by: Jason HU <huzhijiang@xxxxxxxxx>

------------------------------- exec/totemsrp.c -------------------------------
index dcda8d1..c227c44 100644
@@ -2672,6 +2672,7 @@ static int orf_token_rtr (
 	strcpy (retransmit_msg, "Retransmit List: ");
 	if (orf_token->rtr_list_entries) {
+		token_hold_cancel_send(instance);
 		log_printf (instance->totemsrp_log_level_debug,
 			"Retransmit List %d", orf_token->rtr_list_entries);
 		for (i = 0; i < orf_token->rtr_list_entries; i++) {
@@ -2726,6 +2727,10 @@ static int orf_token_rtr (
 	range = orf_token->seq - instance->my_aru;
 	assert (range < QUEUE_RTR_ITEMS_SIZE_MAX);

+	if (range >= 1) {
+		token_hold_cancel_send(instance);
+	}
+
 	for (i = 1; (orf_token->rtr_list_entries < RETRANSMIT_ENTRIES_MAX) &&
 		(i <= range); i++) {

--
Yours,
Jason
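
P.S. For anyone who wants to check the 450 second figure, the arithmetic can be sketched as a small C helper. This is only an illustration and not part of the patch: worst_case_wait_ms() is a hypothetical name, and the model simply assumes that every token rotation costs the full hold time while the ring is otherwise idle.

```c
/* Values quoted in the report above. */
#define FAIL_TO_RECV_CONST 2500UL /* rotations without progress before "FAILED TO RECEIVE" */
#define TOKEN_HOLD_MS      180UL  /* time the token is held per rotation on an idle ring */

/*
 * Hypothetical helper, not a corosync function: worst-case wait in
 * milliseconds when each of `rotations` token rotations takes
 * `per_rotation_ms`.
 */
static unsigned long worst_case_wait_ms(unsigned long rotations,
					unsigned long per_rotation_ms)
{
	return rotations * per_rotation_ms;
}
```

Calling worst_case_wait_ms(FAIL_TO_RECV_CONST, TOKEN_HOLD_MS) gives 450000 ms, i.e. the 450 seconds seen above. If cancelling the hold brings the per-rotation cost down, the wait shrinks in the same proportion, which is what the patch relies on.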