cc'ing list On Mon, Oct 18, 2010 at 5:12 PM, Gregory Farnum <gregf@xxxxxxxxxxxxxxx> wrote: > On Mon, Oct 18, 2010 at 4:57 PM, Colin P. McCabe <colinm@xxxxxxxxxxxxxxx> wrote: >> Hi guys, >> >> See if you agree with this patch. My understanding is that if we're >> doing a non-replace open operation, "existing" should always be zero. >> However, it is possible that it is unintentionally set to something >> else because of a previous run through the while(1) loop. This should >> fix that. >> >> diff --git a/src/msg/SimpleMessenger.cc b/src/msg/SimpleMessenger.cc >> index f826448..277acdb 100644 >> --- a/src/msg/SimpleMessenger.cc >> +++ b/src/msg/SimpleMessenger.cc >> @@ -782,6 +782,7 @@ int SimpleMessenger::Pipe::accept() >> } else { >> // new session >> dout(10) << "accept new session" << dendl; >> + existing = NULL; >> goto open; >> } >> assert(0); >> -- >> 1.6.6.1 >> > I'm pretty sure that actually can't happen -- the existing pipe isn't > removed from queues[1] until after you've exited the loop. :) No > reason we can't make it explicit, though! > -Greg > > [1]: > replace: > replace = true; > if (connect.features & CEPH_FEATURE_RECONNECT_SEQ) { > reply_tag = CEPH_MSGR_TAG_SEQ; > } > dout(10) << "accept replacing " << existing << dendl; > existing->stop(); > existing->unregister_pipe(); > > if (!existing->policy.lossy) { /* if we're lossy, we can lose messages and > should let the daemon handle it itself. > Otherwise, take over other Connection so we don't lose older messages */ > existing->connection_state->clear_pipe(); > existing->connection_state->pipe = get(); > existing->connection_state->put(); > existing->connection_state = NULL; > } > Well, what if the while(1) loop executes two times. The first time, we take the branch > if (messenger->rank_pipe.count(peer_addr)) { > existing = messenger->rank_pipe[peer_addr]; > ...etc etc... but then for some reason we "goto reply", which brings us back up to the top of the loop. Then, the next time, we take the other branch of the if statement, and end up doing "goto open". (You would know better than I whether it is possible for us to take the other branch the second time around. However, since we've dropped the messenger lock before doing "goto reply", it seems reasonable to me that the messenger's state might have changed). Meanwhile, "existing" is still set to the value it had during the previous loop iteration. Then tcp_write fails, and we encounter: > if (existing) > existing->pipe_lock.Unlock(); Which tries to unlock a mutex that we don't have. Anyway, that's the theory. I guess like you said, we should take the patch since it clarifies things. cheers, Colin McCabe -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html