Jason,
thanks for the patch. I've made some small improvements to the commit
message and fixed some indentation (the code is indented with tabs), and I
am now running the test suite. If it passes, I will commit the patch to the
main branch.

Regards,
  Honza

jason wrote:
> According to the paper found at
> http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.52.4028&rep=rep1&type=pdf,
> if a processor receives a join message in the operational state and the
> receiver's identifier is in the join message's fail list, then it ignores
> the join message.
>
> By applying this validation to join messages, we can avoid an unnecessary
> switch from the operational state to the gather state (which can even
> leave the rings unable to merge), as in the following scenario:
>
> 1. Initially, there is only one ring containing three nodes, say
>    ring(A,B,C).
> 2. The network between A and B partitions and, at the same time, C goes
>    down.
> 3. Node A sends a join message with proclist:A,B,C. faillist:NULL.
>    Node B sends a join message with proclist:A,B,C. faillist:NULL.
> 4. Both A and B hit the consensus timeout due to the network partition.
> 5. The network between A and B is remerged.
> 6. Node A sends a join message with proclist:A,B,C. faillist:B,C. and
>    creates ring(A).
>    Node B sends a join message with proclist:A,B,C. faillist:A,C. and
>    creates ring(B).
> 7. Say the join message with proclist:A,B,C. faillist:A,C. sent by node B
>    is received by node A because the network has remerged.
> 8. Node A shifts to the gather state and sends out a modified join message
>    with proclist:A,B,C. faillist:B. Such a join message prevents A and B
>    from merging.
> 9. Node A hits the consensus timeout (caused by waiting for node C) and
>    again sends a join message with proclist:A,B,C. faillist:B,C.
>
> The same thing happens on node B, so A and B loop forever through steps
> 7, 8 and 9.
>
> As the paper also says: "If a processor receives a join message in the
> operational state and if the sender's identifier is in the receiver's
> my_proclist and the join message's ring_seq is less than the receiver's
> ring sequence number, then it ignores the join message too." So this patch
> applies both of these join message validations together.
>
> Signed-off-by: Jason <huzhijiang@xxxxxxxxx>
> ---
>  exec/totemsrp.c | 34 +++++++++++++++++++++++++++++++++-
>  1 file changed, 33 insertions(+), 1 deletion(-)
>
> diff --git a/exec/totemsrp.c b/exec/totemsrp.c
> index e383f7e..cc10370 100644
> --- a/exec/totemsrp.c
> +++ b/exec/totemsrp.c
> @@ -4274,6 +4274,36 @@ static void memb_merge_detect_endian_convert (
>  	srp_addr_copy_endian_convert (&out->system_from, &in->system_from);
>  }
>
> +static int ignore_join_under_operational (
> +	struct totemsrp_instance *instance,
> +	const struct memb_join *memb_join)
> +{
> +	struct srp_addr *proc_list;
> +	struct srp_addr *failed_list;
> +	unsigned long long ring_seq;
> +
> +	proc_list = (struct srp_addr *)memb_join->end_of_memb_join;
> +	failed_list = proc_list + memb_join->proc_list_entries;
> +	ring_seq = memb_join->ring_seq;
> +
> +	if (memb_set_subset (&instance->my_id, 1,
> +	    failed_list, memb_join->failed_list_entries)) {
> +		return 1;
> +	}
> +
> +	/* In operational state, my_proc_list is exactly the same as
> +	   my_memb_list. */
> +
> +	if ((memb_set_subset (&memb_join->system_from, 1,
> +	    instance->my_memb_list,
> +	    instance->my_memb_entries)) &&
> +	    (ring_seq < instance->my_ring_id.seq)) {
> +		return 1;
> +	}
> +
> +	return 0;
> +}
> +
>  static int message_handler_memb_join (
>  	struct totemsrp_instance *instance,
>  	const void *msg,
> @@ -4304,7 +4334,9 @@ static int message_handler_memb_join (
>  	}
>  	switch (instance->memb_state) {
>  		case MEMB_STATE_OPERATIONAL:
> -			memb_join_process (instance, memb_join);
> +			if (ignore_join_under_operational(instance, memb_join) == 0) {
> +				memb_join_process (instance, memb_join);
> +			}
>  			break;
>
>  		case MEMB_STATE_GATHER:
>
> _______________________________________________
> discuss mailing list
> discuss@xxxxxxxxxxxx
> http://lists.corosync.org/mailman/listinfo/discuss