On 6/13/2013 9:11 AM, Jan Friesse wrote: > Fabio M. Di Nitto napsal(a): >> On 6/13/2013 7:23 AM, Jan Friesse wrote: >>> Fabio M. Di Nitto napsal(a): >>>> On 6/12/2013 5:03 PM, Jan Friesse wrote: >>>>> Signed-off-by: Jan Friesse <jfriesse@xxxxxxxxxx> >>>>> --- >>>>> exec/votequorum.c | 3 +++ >>>>> 1 files changed, 3 insertions(+), 0 deletions(-) >>>>> >>>>> diff --git a/exec/votequorum.c b/exec/votequorum.c >>>>> index b561409..131b734 100644 >>>>> --- a/exec/votequorum.c >>>>> +++ b/exec/votequorum.c >>>>> @@ -1634,6 +1634,9 @@ static void message_handler_req_exec_votequorum_nodeinfo ( >>>>> >>>>> if (nodeid == VOTEQUORUM_QDEVICE_NODEID) { >>>>> struct cluster_node *sender_node = find_node_by_nodeid(sender_nodeid); >>>>> + >>>>> + assert(sender_node != NULL); >>>>> + >>>>> if ((!cluster_is_quorate) && >>>>> (sender_node->flags & NODE_FLAGS_QUORATE)) { >>>>> node->votes = req_exec_quorum_nodeinfo->votes; >>>>> >>>> >>>> I would prefer a proper error checking rather than assert here. >>>> >>> >>> Agree. Actually, coverity found this problem, but I was ensure if it is >>> not problem (this is what I was assuming because of dereference of >>> sender_node) and then assert is proper solution OR it is problem and >>> then this patch starts discussion about proper solution. >>> >>>> The logic is that a node always send its own information (to populate >>>> node for node == sender_nodeid) and then send quorum device info from >>>> that node. >>>> >>>> See static int votequorum_sync_process (void) that´s the one triggering >>>> the population of the node entries. >>>> >>>> In theory, if sync queues the messages as expected, sender_node is never >>>> NULL. If it is, then i´ll prefer at least to see a message that says: >>>> "received qdevice info before sender node info\n" and then quit (unless >>>> we can recover from that status easily). >>>> >>> >>> Actually, because you've wrote that code you are maybe able to see how >>> to recover easily, but when I was reading that code, it really didn't >>> look so. >> >> Ok so the question turns into: >> >> static int votequorum_sync_process (void) >> { >> votequorum_exec_send_nodeinfo(us->node_id); <- message 1 >> votequorum_exec_send_nodeinfo(VOTEQUORUM_QDEVICE_NODEID); <- 2 >> if (strlen(qdevice_name)) { >> >> votequorum_exec_send_qdevice_reg(VOTEQUORUM_QDEVICE_OPERATION_REGISTER, >> qdevice_name); >> } >> return 0; >> } >> >> given this sequence of events, is there a possibility that message 1 can >> be delivered after message 2? >> > > Short answer is NO. Long answer is, if there is such possibility, it's > ultra high priority bug and must be fixed. > >> If there is that possibility, then we need to review the whole >> send_nodeinfo logic as there is the assumption (in more than one place >> IIRC) that they arrive in sequence. >> >> If there is the guarantee that they are delivered in the same order, >> then we can use that assert check to verify it and error out by >> reporting deeper issues. > > There is a lot of informations logged at the beginning of > message_handler_req_exec_votequorum_nodeinfo. Assert will cause sigabrt > so blackbox is created with all that logged informations. -> From my > point of view, assert is enough, because axiom 1 tells us that msg 1 is > always before msg 2 and axiom 2 is msg 1 contains informations about > send_node -> assert never appears and if it appears, it's bug in code. Ok works for me if you think the info are enough, otherwise just switch to if (sender_node == NULL) { log some extra stuff die hard } Fabio _______________________________________________ discuss mailing list discuss@xxxxxxxxxxxx http://lists.corosync.org/mailman/listinfo/discuss