Hi all,
Please take a look at this patch. Thanks!
---------- Forwarded message ----------
From: "Jason" <huzhijiang@xxxxxxxxx>
Date: November 10, 2014, 9:26 PM
Subject: [PATCH] [totemrrp] Reset timer_problem_decrementer to zero in active_timer_problem_decrementer_cancel()
To: "Jason" <discuss@xxxxxxxxxxxx>
Cc:
After a heartbeat link is marked FAULTY and then automatically re-enabled,
active_instance->timer_problem_decrementer is not reset to zero. As a result,
in the next timer_function_active_token_expired() round,
active_timer_problem_decrementer_start() is not called, so
active_instance->counter_problems for this link can no longer be
decreased. RRP thereby loses its ability to tolerate network
fluctuation.
This problem can be reproduced with the following sequence:
1) Set RRP to active mode and configure at least 2 heartbeat links.
2) Unplug one link until corosync-cfgtool -s shows it as FAULTY.
3) Re-plug the link; corosync-cfgtool -s then shows it as active with no
faults.
4) Unplug the link again, but quickly re-plug it before it becomes FAULTY.
5) corosync-cfgtool -s now shows the link in the "Incrementing
problem counter" state even though it is physically healthy.
The fix is to reset timer_problem_decrementer to zero
in active_timer_problem_decrementer_cancel().
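To illustrate the mechanism, here is a minimal standalone model of the stale-handle problem. It is a sketch, not corosync code: every model_* name and the timer_armed field are hypothetical, and the "start only when the handle is zero" guard is assumed from the description above rather than copied from totemrrp.c.

```c
/* Hypothetical model of the RRP active instance; only the two fields
 * named in the patch are kept, plus a flag standing in for the real
 * timer registered with the event loop. */
struct model_instance {
	unsigned long timer_problem_decrementer; /* 0 means "no timer registered" */
	int counter_problems;
	int timer_armed;                         /* hypothetical: is a timer pending? */
};

static void model_decrementer_start(struct model_instance *inst)
{
	inst->timer_problem_decrementer = 1; /* any non-zero handle value */
	inst->timer_armed = 1;
}

/* reset_handle == 0 models the code before the patch, 1 the code after. */
static void model_decrementer_cancel(struct model_instance *inst, int reset_handle)
{
	inst->timer_armed = 0; /* stands in for qb_loop_timer_del() */
	if (reset_handle) {
		inst->timer_problem_decrementer = 0;
	}
}

/* Token expired on the link: count a problem and start the decrementer
 * only if no handle is recorded (assumed guard). */
static void model_token_expired(struct model_instance *inst)
{
	inst->counter_problems += 1;
	if (inst->timer_problem_decrementer == 0) {
		model_decrementer_start(inst);
	}
}

/* The decrementer fires only if a timer is actually pending. */
static void model_decrementer_tick(struct model_instance *inst)
{
	if (inst->timer_armed && inst->counter_problems > 0) {
		inst->counter_problems -= 1;
	}
}

/* Replays the reproduction sequence and returns the final problem count. */
int model_simulate(int reset_handle)
{
	struct model_instance inst = { 0, 0, 0 };

	model_token_expired(&inst);                    /* first outage begins */
	model_decrementer_cancel(&inst, reset_handle); /* link declared FAULTY */
	inst.counter_problems = 0;                     /* auto re-enable */

	model_token_expired(&inst);                    /* short second outage */
	model_decrementer_tick(&inst);                 /* link healthy again */

	return inst.counter_problems;
}
```

In this model, model_simulate(0) returns 1 (the counter never drains because the stale handle blocks a restart), while model_simulate(1) returns 0.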
Signed-off-by: Jason <huzhijiang@xxxxxxxxx>
---
exec/totemrrp.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/exec/totemrrp.c b/exec/totemrrp.c
index 95a789e..a798bba 100644
--- a/exec/totemrrp.c
+++ b/exec/totemrrp.c
@@ -1542,6 +1542,7 @@ static void active_timer_problem_decrementer_cancel (
qb_loop_timer_del (
active_instance->rrp_instance->poll_handle,
active_instance->timer_problem_decrementer);
+ active_instance->timer_problem_decrementer = 0;
}
--
1.9.4.msysgit.2
_______________________________________________ discuss mailing list discuss@xxxxxxxxxxxx http://lists.corosync.org/mailman/listinfo/discuss