Re: Assert in Pipe.cc in Hammer 0.94.7

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



>>
>> Is this a known issue? I searched for it and could not find anyone hitting this.
>
> I don't think so.
>
>>
>> Looking at the 0.94.7 code, looks like pipe_lock is released in line
>> 886 in the beginning of connect() routine.
>> It is again taken later. But there is update to state member variable
>> without checking current state in code below.
>
> Where exactly?

Sorry for late response.

In function int Pipe::connect(), pipe_lock is released first.

int Pipe::connect()
{
  bool got_bad_auth = false;

  ldout(msgr->cct,10) << "connect " << connect_seq << dendl;
  assert(pipe_lock.is_locked());

  __u32 cseq = connect_seq;
  __u32 gseq = msgr->get_global_seq();

  // stop reader thread
  join_reader();

  pipe_lock.Unlock();

.....

some  tcp_read and do_sendmsg after this for protocol processing
In between unlock and lock, pipe may be getting replaced due to network errors.
and moved to STATE_CLOSED in accept routine while this pipe is waiting for lock
below.
......

pipe_lock.Lock();

....
After this there is code like below which override Pipe state.
There is a danger of overwriting STATE_CLOSED state?

    if (reply.tag == CEPH_MSGR_TAG_WAIT) {
      ldout(msgr->cct,3) << "connect got WAIT (connection race)" << dendl;
      state = STATE_WAIT;
      goto stop_locked;
    }

Please see if there is any issue in above code.

>
>> If pipe is moved to STATE_CLOSED in the interval when lock was released,
>> there is a chance that it can get overwritten when
>> CEPH_MSGR_TAG_WAIT(STATE_WAIT) comes as reply or
>> directly to STATE_OPEN in line 1172.
>
> I'm not following your referents here. The out_seq can get
> overwritten? Something else?

We got a different acked out_seq from other side than the messages
present in this pipe's queue.
This leads me to think there is some race not handled correctly.

>
>> I feel this may cause assert seen above but only if many other things
>> also happen.
>
> We had some vaguely similar issues in the time after CEPH_MSGR_TAG_SEQ
> was introduced, but I think it's been a while. You might have spotted
> another rare one.
> -Greg

-Padmanabh
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [CEPH Users]     [Ceph Large]     [Information on CEPH]     [Linux BTRFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux