Fabio M. Di Nitto napsal(a): > On 6/13/2013 7:23 AM, Jan Friesse wrote: >> Fabio M. Di Nitto napsal(a): >>> On 6/12/2013 5:03 PM, Jan Friesse wrote: >>>> Signed-off-by: Jan Friesse <jfriesse@xxxxxxxxxx> >>>> --- >>>> exec/votequorum.c | 3 +++ >>>> 1 files changed, 3 insertions(+), 0 deletions(-) >>>> >>>> diff --git a/exec/votequorum.c b/exec/votequorum.c >>>> index b561409..131b734 100644 >>>> --- a/exec/votequorum.c >>>> +++ b/exec/votequorum.c >>>> @@ -1634,6 +1634,9 @@ static void message_handler_req_exec_votequorum_nodeinfo ( >>>> >>>> if (nodeid == VOTEQUORUM_QDEVICE_NODEID) { >>>> struct cluster_node *sender_node = find_node_by_nodeid(sender_nodeid); >>>> + >>>> + assert(sender_node != NULL); >>>> + >>>> if ((!cluster_is_quorate) && >>>> (sender_node->flags & NODE_FLAGS_QUORATE)) { >>>> node->votes = req_exec_quorum_nodeinfo->votes; >>>> >>> >>> I would prefer a proper error checking rather than assert here. >>> >> >> Agree. Actually, coverity found this problem, but I was ensure if it is >> not problem (this is what I was assuming because of dereference of >> sender_node) and then assert is proper solution OR it is problem and >> then this patch starts discussion about proper solution. >> >>> The logic is that a node always send its own information (to populate >>> node for node == sender_nodeid) and then send quorum device info from >>> that node. >>> >>> See static int votequorum_sync_process (void) that´s the one triggering >>> the population of the node entries. >>> >>> In theory, if sync queues the messages as expected, sender_node is never >>> NULL. If it is, then i´ll prefer at least to see a message that says: >>> "received qdevice info before sender node info\n" and then quit (unless >>> we can recover from that status easily). >>> >> >> Actually, because you've wrote that code you are maybe able to see how >> to recover easily, but when I was reading that code, it really didn't >> look so. > > Ok so the question turns into: > > static int votequorum_sync_process (void) > { > votequorum_exec_send_nodeinfo(us->node_id); <- message 1 > votequorum_exec_send_nodeinfo(VOTEQUORUM_QDEVICE_NODEID); <- 2 > if (strlen(qdevice_name)) { > > votequorum_exec_send_qdevice_reg(VOTEQUORUM_QDEVICE_OPERATION_REGISTER, > qdevice_name); > } > return 0; > } > > given this sequence of events, is there a possibility that message 1 can > be delivered after message 2? > Short answer is NO. Long answer is, if there is such possibility, it's ultra high priority bug and must be fixed. > If there is that possibility, then we need to review the whole > send_nodeinfo logic as there is the assumption (in more than one place > IIRC) that they arrive in sequence. > > If there is the guarantee that they are delivered in the same order, > then we can use that assert check to verify it and error out by > reporting deeper issues. There is a lot of informations logged at the beginning of message_handler_req_exec_votequorum_nodeinfo. Assert will cause sigabrt so blackbox is created with all that logged informations. -> From my point of view, assert is enough, because axiom 1 tells us that msg 1 is always before msg 2 and axiom 2 is msg 1 contains informations about send_node -> assert never appears and if it appears, it's bug in code. > > Fabio Honza _______________________________________________ discuss mailing list discuss@xxxxxxxxxxxx http://lists.corosync.org/mailman/listinfo/discuss