On 10/25/2011 05:32 AM, Yunkai Zhang wrote:
> Hi Steven Dake,
>
> We tested your patch for a long time last week, but I am sorry to
> tell you that we could not duplicate the issue again---it never ran
> into the "FAIL" state. Corosync ran solidly, beyond all our
> expectations.
>

So the issue could not be duplicated with the patch? But without the
patch, you're not sure whether the problem still occurs?

Regards
-steve

> It's both good news and bad news :(
>
> Maybe there is an unknown condition needed to duplicate this issue;
> I will continue to monitor it.
>
> On Wed, Oct 12, 2011 at 8:24 PM, Yunkai Zhang <qiushu.zyk@xxxxxxxxxx> wrote:
>> Hi Steven Dake,
>>
>> Thank you for your reply.
>> I am changing the CC of this discussion to discuss@xxxxxxxxxxxx instead of
>> openais@xxxxxxxxxxxxxx, as we know openais@xxxxxxxxxxxxxx does not work.
>>
>> On Tue, Oct 11, 2011 at 10:54 PM, Steven Dake <sdake@xxxxxxxxxx> wrote:
>>> On 10/10/2011 08:58 PM, qiushu.zyk@xxxxxxxxxx wrote:
>>>> From: Yunkai Zhang <qiushu.zyk@xxxxxxxxxx>
>>>>
>>>> In our 20-node cluster testing (corosync v1.4.1 with sheepdog), an
>>>> abnormal exit would occur when the consensus timeout expired and
>>>> there were no other processors in consensus_list.
>>>>
>>>> = analysis =
>>>> 1. When the consensus timeout expires, corosync enters the
>>>> memb_state_consensus_timeout_expired function.
>>>>
>>>> 2. If its consensus_list contains only my_id, the code executes
>>>> memb_set_merge, which makes my_failed_list equal to my_proc_list.
>>>>
>>>> 3. It calls the memb_state_gather_enter function and multicasts a
>>>> join message whose proc_list and failed_list contain the same
>>>> processor IDs.
>>>>
>>>> 4. The join message is received by the node itself, and
>>>> memb_join_process/memb_consensus_agreed are called.
>>>>
>>>> 5. Because the proc_list equals the failed_list in the join message,
>>>> the assert instruction is reached and an abnormal exit occurs.
>>>>
>>>> = solution =
>>>> This patch tries to resolve the issue by removing my_id from
>>>> my_failed_list before corosync calls memb_state_gather_enter from
>>>> memb_state_consensus_timeout_expired.
>>>>
>>>> When a network partition occurs and the processor cannot communicate
>>>> with any other processor, it will form a single ring containing only
>>>> itself rather than exiting abnormally.
>>>>
>>>
>>> Yunkai,
>>>
>>> Thank you for the patch. I am really hesitant to make any changes to
>>> totemsrp that I haven't thought long and hard about. Your solution is
>>> clever and well thought out, but totem has thousands of details - to
>>> the point that I always want to get to the root cause of the issue
>>> when fixing problems.
>>>
>>> I think you're running into a "FAILED TO RECV" state. There is a patch
>>> outstanding for this issue but we have been unable to find anyone to
>>> test it. Our environments don't demonstrate a failed-to-receive
>>> scenario.
>>>
>>> Can you verify whether you have FAILED TO RECV? (It should be in the
>>> fplay data.)
>>>
>>
>> Yes, corosync will run into a "FAILED TO RECV" state in the
>> message_handler_orf_token function.
>> We can see the last three lines of logging messages from
>> /var/log/cluster/corosync.log as follows:
>>
>> Aug 25 13:42:25 corosync [TOTEM ] FAILED TO RECEIVE
>> Aug 25 13:42:25 corosync [TOTEM ] entering GATHER state from 6.
>> Aug 25 13:42:27 corosync [TOTEM ] entering GATHER state from 0.
>>
>> According to our testing, we can duplicate this issue quite easily
>> under two conditions:
>> 1). more than 20 nodes (not exact);
>> 2). increased network load, which causes broadcast
>> messages (regular msg/join msg) to fail frequently.
>>
>>> If so, can you run with the patch:
>>> https://bugzilla.redhat.com/show_bug.cgi?id=636583 comment #8
>>
>> Thanks for your patch.
>> I plan to test it next week, as I don't have enough nodes to test
>> with now.
>>
>>>
>>> Thanks!
>>> -steve
>>>
>>>
>>>> Signed-off-by: Yunkai Zhang <qiushu.zyk@xxxxxxxxxx>
>>>> ---
>>>>  exec/totemsrp.c |   34 ++++++++++++++++++++++++++++++++++
>>>>  1 files changed, 34 insertions(+), 0 deletions(-)
>>>>
>>>> diff --git a/exec/totemsrp.c b/exec/totemsrp.c
>>>> index 0778d55..653f801 100644
>>>> --- a/exec/totemsrp.c
>>>> +++ b/exec/totemsrp.c
>>>> @@ -1324,6 +1324,34 @@ static void memb_set_merge (
>>>>  	return;
>>>>  }
>>>>
>>>> +/*
>>>> + * remove subset from fullset
>>>> + */
>>>> +static void memb_set_remove (
>>>> +	const struct srp_addr *subset, int subset_entries,
>>>> +	struct srp_addr *fullset, int *fullset_entries)
>>>> +{
>>>> +	int found;
>>>> +	int i;
>>>> +	int j;
>>>>
>>>> +	for (i = 0; i < subset_entries; i++) {
>>>> +		found = 0;
>>>> +		for (j = 0; j < *fullset_entries; j++) {
>>>> +			if (srp_addr_equal (&fullset[j], &subset[i])) {
>>>> +				found = 1;
>>>> +				break;
>>>> +			}
>>>> +		}
>>>> +		if (found == 1) {
>>>> +			for (; j < (*fullset_entries - 1); j++) {
>>>> +				srp_addr_copy (&fullset[j], &fullset[j + 1]);
>>>> +			}
>>>> +			*fullset_entries = *fullset_entries - 1;
>>>> +		}
>>>> +	}
>>>> +}
>>>> +
>>>>  static void memb_set_and_with_ring_id (
>>>>  	struct srp_addr *set1,
>>>>  	struct memb_ring_id *set1_ring_ids,
>>>> @@ -1541,6 +1569,12 @@ static void memb_state_consensus_timeout_expired (
>>>>
>>>>  	memb_set_merge (no_consensus_list, no_consensus_list_entries,
>>>>  		instance->my_failed_list, &instance->my_failed_list_entries);
>>>> +
>>>> +	if (instance->my_proc_list_entries == instance->my_failed_list_entries) {
>>>> +		memb_set_remove (&instance->my_id, 1,
>>>> +			instance->my_failed_list, &instance->my_failed_list_entries);
>>>> +	}
>>>> +
>>>>  	memb_state_gather_enter (instance, 0);
>>>>  }
>>>> }
>>>
>>>
>>
>>
>>
>> --
>> Yunkai Zhang
>> work at taobao.com
>>
>
>

_______________________________________________
discuss mailing list
discuss@xxxxxxxxxxxx
http://lists.corosync.org/mailman/listinfo/discuss
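
[Editor's note: for readers following the logic of the patch above, here
is a minimal standalone sketch of the set-subtraction semantics that
memb_set_remove() introduces and of the guard added in
memb_state_consensus_timeout_expired. The node_id type and the
node_equal()/node_copy() helpers are simplified stand-ins for totemsrp's
struct srp_addr and its srp_addr_equal()/srp_addr_copy() -- illustrative
assumptions, not the real totemsrp types.]

#include <assert.h>
#include <stdio.h>

/* simplified stand-in for struct srp_addr */
typedef struct { unsigned int id; } node_id;

static int node_equal (const node_id *a, const node_id *b)
{
	return (a->id == b->id);
}

static void node_copy (node_id *dst, const node_id *src)
{
	dst->id = src->id;
}

/*
 * Remove every entry of subset from fullset, compacting fullset in
 * place. Mirrors the patch: linear search for each subset entry, then
 * shift the tail of fullset down by one slot.
 */
static void set_remove (
	const node_id *subset, int subset_entries,
	node_id *fullset, int *fullset_entries)
{
	int i, j;

	for (i = 0; i < subset_entries; i++) {
		int found = 0;

		for (j = 0; j < *fullset_entries; j++) {
			if (node_equal (&fullset[j], &subset[i])) {
				found = 1;
				break;
			}
		}
		if (found == 1) {
			for (; j < (*fullset_entries - 1); j++) {
				node_copy (&fullset[j], &fullset[j + 1]);
			}
			*fullset_entries -= 1;
		}
	}
}

int main (void)
{
	/* proc_list == failed_list: the state that trips the assert */
	node_id my_id = { 3 };
	node_id failed_list[] = { {1}, {2}, {3} };
	int failed_entries = 3;
	int proc_entries = 3;

	/* the patched path: drop my_id before rebroadcasting the join */
	if (proc_entries == failed_entries) {
		set_remove (&my_id, 1, failed_list, &failed_entries);
	}

	/* failed_list is now a strict subset of proc_list again */
	assert (failed_entries < proc_entries);
	printf ("failed_list entries after removal: %d\n", failed_entries);
	return (0);
}

Compiled and run, this prints "failed_list entries after removal: 2":
once my_id is dropped, failed_list can no longer equal proc_list, which
is exactly the condition that reached the assert via
memb_join_process/memb_consensus_agreed in the analysis above.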