a SimpleMessenger fix

cc'ing list

On Mon, Oct 18, 2010 at 5:12 PM, Gregory Farnum <gregf@xxxxxxxxxxxxxxx> wrote:
> On Mon, Oct 18, 2010 at 4:57 PM, Colin P. McCabe <colinm@xxxxxxxxxxxxxxx> wrote:
>> Hi guys,
>>
>> See if you agree with this patch. My understanding is that if we're
>> doing a non-replace open operation, "existing" should always be zero.
>> However, it is possible that it is unintentionally set to something
>> else because of a previous run through the while(1) loop. This should
>> fix that.
>>
>> diff --git a/src/msg/SimpleMessenger.cc b/src/msg/SimpleMessenger.cc
>> index f826448..277acdb 100644
>> --- a/src/msg/SimpleMessenger.cc
>> +++ b/src/msg/SimpleMessenger.cc
>> @@ -782,6 +782,7 @@ int SimpleMessenger::Pipe::accept()
>>      } else {
>>        // new session
>>        dout(10) << "accept new session" << dendl;
>> +      existing = NULL;
>>        goto open;
>>      }
>>      assert(0);
>> --
>> 1.6.6.1
>>
> I'm pretty sure that actually can't happen -- the existing pipe isn't
> removed from queues[1] until after you've exited the loop. :) No
> reason we can't make it explicit, though!
> -Greg
>
> [1]:
>  replace:
>  replace = true;
>  if (connect.features & CEPH_FEATURE_RECONNECT_SEQ) {
>    reply_tag = CEPH_MSGR_TAG_SEQ;
>  }
>  dout(10) << "accept replacing " << existing << dendl;
>  existing->stop();
>  existing->unregister_pipe();
>
>  if (!existing->policy.lossy) { /* if we're lossy, we can lose messages and
>                                    should let the daemon handle it itself.
>    Otherwise, take over other Connection so we don't lose older messages */
>    existing->connection_state->clear_pipe();
>    existing->connection_state->pipe = get();
>    existing->connection_state->put();
>    existing->connection_state = NULL;
>  }
>

Well, what if the while(1) loop executes twice? The first time through, we
take the branch

>    if (messenger->rank_pipe.count(peer_addr)) {
>      existing = messenger->rank_pipe[peer_addr];
>   ...etc etc...

but then for some reason we do "goto reply", which brings us back up to
the top of the loop.
The next time through, we take the other branch of the if statement and
end up doing "goto open". (You would know better than I whether it is
possible for us to take the other branch the second time around.
However, since we drop the messenger lock before doing "goto reply", it
seems reasonable to me that the messenger's state could have changed in
the meantime.)
Meanwhile, "existing" is still set to the value it had during the
previous loop iteration.

Then tcp_write fails, and we encounter:
>    if (existing)
>      existing->pipe_lock.Unlock();

This tries to unlock a mutex that we don't hold.

Anyway, that's the theory. But like you said, we should take the
patch either way, since it makes the invariant explicit.

cheers,
Colin McCabe
--