Re: totemsrp: Cancel token holding while in retransmition

Sorry, the previous patch is wrong. Here is the correction.

On Aug 5, 2014 10:18 PM, "Christine Caulfield" <ccaulfie@xxxxxxxxxx> wrote:
Hi Jason,

Thanks for testing that - and the extra info. I'll have another think then. If I can't come up with anything more we might go with your patch.

Chrissie

On 05/08/14 13:01, jason wrote:
Hi Christine,
I have tested your patch, but it does not solve my problem. By adding
printf calls, I found that whether or not retransmission occurred in my
test case, the retrans_message_queue is always empty. It seems that
the retrans_message_queue is used only in the recovery state?

On Aug 5, 2014 3:50 PM, "Christine Caulfield" <ccaulfie@xxxxxxxxxx> wrote:

    On 01/08/14 10:50, Christine Caulfield wrote:

        On 01/08/14 10:42, Jan Friesse wrote:

            Jason,


                Hi All,

                I have encountered a problem: when there is no activity
                on the ring other than retransmission, and the token is
                in hold mode, retransmission becomes slow. Moreover, if
                retransmission always fails but token


            Yes

                rotation works well, then it takes quite a long time
                (fail_to_recv_const * token_hold = 2500 * 180ms =
                450sec) for the retransmitting node to meet the "FAILED
                TO RECEIVE" condition and re-construct a new ring. This
                can be reproduced by the following steps:

                      1) Create a two-node cluster in udpu transport mode.
                      2) Wait until there is no other activity on the ring.
                      3) On one or both nodes, delete the other node from
                         the nodelist in corosync.conf.
                      4) Run corosync-cfgtool -R. This can cause a message
                         retransmission, but I am not sure why.
                      5) Token rotation still works well, but the
                         retransmission cannot be satisfied because of the
                         node deletion, so only the "FAILED TO RECEIVE"
                         condition can form a new ring. But we need to
                         wait 450 seconds for that to happen. During this
                         wait, we saw the following logs:


            This is a really weird case.

                      Jul 30 11:21:06 notice  [TOTEM ] Retransmit List: e
                      Jul 30 11:21:06 notice  [TOTEM ] Retransmit List: e
                      Jul 30 11:21:06 notice  [TOTEM ] Retransmit List: e
                      Jul 30 11:21:06 notice  [TOTEM ] Retransmit List: e
                      Jul 30 11:21:06 notice  [TOTEM ] Retransmit List: e
                      ...
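
The 450-second estimate quoted above is simply the product of the two values given in the report. A minimal sketch in C, assuming (as the report does) that each of the fail_to_recv_const failed attempts costs one token_hold period; the helper name is hypothetical and not part of corosync:

```c
#include <stdint.h>

/* Hypothetical helper, not a corosync function: estimates the wait before
 * the "FAILED TO RECEIVE" condition is met, assuming every failed attempt
 * costs one full token hold period. */
uint64_t failed_to_receive_wait_ms(uint32_t fail_to_recv_const,
                                   uint32_t token_hold_ms)
{
    /* Widen before multiplying so large settings cannot overflow. */
    return (uint64_t)fail_to_recv_const * token_hold_ms;
}
```

With the values from the report, failed_to_receive_wait_ms(2500, 180) gives 450000 ms, i.e. the 450 seconds mentioned above.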


                This problem can be solved by adding
                token_hold_cancel_send() to both the retransmission
                request and response conditions in orf_token_rtr() to
                speed up retransmission. I created a patch below; any
                comments?


            Ok. The patch looks fine, but during review I had another
            idea. What about prohibiting the start of hold mode when
            there are messages to retransmit? Such a solution may be
            cleaner, mightn't it?

            Anyway, this is a change in a very critical part of the
            code, so Chrissie, can you please take a look at the patch
            and give your opinion?



        I was looking it over yesterday. It's a problem I have
        definitely seen myself on some VM systems, so it's certainly
        not an isolated case. I think Honza is right that there might
        be a better way of fixing it, so I'll have a look.

        Chrissie



    Annoyingly my common reproducer seems not to be working and I can't
    get yours to make it happen either. If you can still reproduce it
    could you try this patch for me please?

    Chrissie


    _______________________________________________
    discuss mailing list
    discuss@xxxxxxxxxxxx
    http://lists.corosync.org/mailman/listinfo/discuss


From dc2d0c2bc75492cada193909c2cd66fba4367d67 Mon Sep 17 00:00:00 2001
From: Jason HU <huzhijiang@xxxxxxxxx>
Date: Wed, 6 Aug 2014 00:10:56 +0800
Subject: [PATCH] [totemsrp] Cancel token holding while in retransmition

---
 exec/totemsrp.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/exec/totemsrp.c b/exec/totemsrp.c
index dcda8d1..b603ef5 100644
--- a/exec/totemsrp.c
+++ b/exec/totemsrp.c
@@ -3650,6 +3650,12 @@ static int message_handler_orf_token (
 		transmits_allowed = fcc_calculate (instance, token);
 		mcasted_retransmit = orf_token_rtr (instance, token, &transmits_allowed);
 
+		if (instance->my_token_held == 1 &&
+			(token->rtr_list_entries > 0 || mcasted_retransmit > 0)) {
+			instance->my_token_held = 0;
+			forward_token = 1;
+		}
+
 		fcc_rtr_limit (instance, token, &transmits_allowed);
 		mcasted_regular = orf_token_mcast (instance, token, transmits_allowed);
 /*
-- 
1.9.4.msysgit.0
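
The condition added by the patch above can be read in isolation as a small predicate. The following is only a sketch: struct hold_state and should_cancel_hold() are hypothetical names that stand in for the few values the hunk reads, and the field comments reflect an assumed reading of those values rather than the definitions in totemsrp.c.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the values the hunk reads; the real corosync
 * structures contain many more members. */
struct hold_state {
    int my_token_held;             /* 1 while this node holds the token */
    unsigned int rtr_list_entries; /* retransmit requests on the token */
    int mcasted_retransmit;        /* value returned by orf_token_rtr(),
                                      assumed to count retransmitted messages */
};

/* True when the hold should be dropped and the token forwarded at once,
 * mirroring the if-condition in the patch. */
bool should_cancel_hold(const struct hold_state *s)
{
    return s->my_token_held == 1 &&
           (s->rtr_list_entries > 0 || s->mcasted_retransmit > 0);
}
```

When the predicate is true, the patch clears my_token_held and sets forward_token, so the token keeps circulating while retransmission is still pending instead of waiting out the hold period.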
