答复: 答复: 答复: 答复: Pipe "deadlock" in Hammer, 0.94.5

许雪寒 <xuxuehan@xxxxxx> · Thu, 16 Mar 2017 10:36:49 +0000

Hi, Gregory, is it possible to unlock Connection::lock in Pipe::read_message before tcp_read_nonblocking is called? I checked the code again, it seems that the code in tcp_read_nonblocking doesn't need to be locked by Connection::lock.

-----邮件原件-----
发件人: Gregory Farnum [mailto:gfarnum@xxxxxxxxxx] 
发送时间: 2017年1月17日 7:14
收件人: 许雪寒
抄送: jiajia zhong; ceph-users@xxxxxxxxxxxxxx
主题: Re: 答复:  答复: 答复: Pipe "deadlock" in Hammer, 0.94.5

On Sat, Jan 14, 2017 at 7:54 PM, 许雪寒 <xuxuehan@xxxxxx> wrote:
> Thanks for your help:-)
>
> I checked the source code again, and in read_message, it does hold the Connection::lock:

You're correct of course; I wasn't looking and forgot about this bit.
This was added to deal with client-allocated buffers and/or op cancellation in librados, IIRC, and unfortunately definitely does need to be synchronized — I'm not sure about with pipe lookups, but probably even that. :/

Unfortunately it looks like you're running a version that didn't come from upstream (I see hash 81d4ad40d0c2a4b73529ff0db3c8f22acd15c398 in another email, which I can't find), so there's not much we can do to help with the specifics of this case — it's fiddly and my guess would be the same as Sage's, which you say is not the case.
-Greg

>
>                      while (left > 0) {
>                         // wait for data
>                         if (tcp_read_wait() < 0)
>                                 goto out_dethrottle;
>
>                         // get a buffer
>                         connection_state->lock.Lock();
>                         map<ceph_tid_t, pair<bufferlist, int> >::iterator p =
>                                         connection_state->rx_buffers.find(header.tid);
>                         if (p != connection_state->rx_buffers.end()) {
>                                 if (rxbuf.length() == 0 || p->second.second != rxbuf_version) {
>                                         ldout(msgr->cct,10)
>                                                                 << "reader seleting rx buffer v "
>                                                                                 << p->second.second << " at offset "
>                                                                                 << offset << " len "
>                                                                                 << p->second.first.length() << dendl;
>                                         rxbuf = p->second.first;
>                                         rxbuf_version = p->second.second;
>                                         // make sure it's big enough
>                                         if (rxbuf.length() < data_len)
>                                                 rxbuf.push_back(
>                                                                 buffer::create(data_len - rxbuf.length()));
>                                         blp = p->second.first.begin();
>                                         blp.advance(offset);
>                                 }
>                         } else {
>                                 if (!newbuf.length()) {
>                                         ldout(msgr->cct,20)
>                                                                 << "reader allocating new rx buffer at offset "
>                                                                                 << offset << dendl;
>                                         alloc_aligned_buffer(newbuf, data_len, data_off);
>                                         blp = newbuf.begin();
>                                         blp.advance(offset);
>                                 }
>                         }
>                         bufferptr bp = blp.get_current_ptr();
>                         int read = MIN(bp.length(), left);
>                         ldout(msgr->cct,20)
>                                                 << "reader reading nonblocking into "
>                                                                 << (void*) bp.c_str() << " len " << bp.length()
>                                                                 << dendl;
>                         int got = tcp_read_nonblocking(bp.c_str(), read);
>                         ldout(msgr->cct,30)
>                                                 << "reader read " << got << " of " << read << dendl;
>                         connection_state->lock.Unlock();
>                         if (got < 0)
>                                 goto out_dethrottle;
>                         if (got > 0) {
>                                 blp.advance(got);
>                                 data.append(bp, 0, got);
>                                 offset += got;
>                                 left -= got;
>                         } // else we got a signal or something; just loop.
>                 }
>
> As shown in the above code, in the reading loop, it first lock connection_state->lock and then do tcp_read_nonblocking. connection_state is of type PipeConnectionRef, connection_state->lock is Connection::lock.
>
> On the other hand, I'll check that whether there are a lot of message 
> to send as you suggested. Thanks:-)
>
>
>
> 发件人: Gregory Farnum [gfarnum@xxxxxxxxxx]
>
> 发送时间: 2017年1月14日 9:39
>
> 收件人: 许雪寒
>
> Cc: jiajia zhong; ceph-users@xxxxxxxxxxxxxx
>
> 主题: Re:  答复: 答复: Pipe "deadlock" in Hammer, 0.94.5
>
>
>
>
>
>
>
>
>
> On Thu, Jan 12, 2017 at 7:58 PM, 许雪寒 <xuxuehan@xxxxxx> wrote:
>
>
>
>
>
>
>
>
>
>
> Thank you for your continuous helpJ.
>
> We are using hammer 0.94.5 version, and what I read is the version of the source code.
>
> However, on the other hand, if Pipe::do_recv do act as blocked, is it 
> reasonable for the Pipe::reader_thread to block threads calling SimpleMessenger::submit_message  by holding Connection::lock?
>
> I think maybe a different mutex should be used in Pipe::read_message rather than Connection::lock.
>
>
>
>
>
>
>
> I don't think it does use that lock. Pipe::read_message() is generally 
> called while the pipe_lock is held, but not Connection::lock. (They 
> are separate.)
>
> I haven't dug into the relevant OSD code in a while, but I think it's 
> a lot more likely your OSD is just overloaded and is taking a while to send a lot of different messages, and that the loop it's in doesn't update the HeartbeatMap or something. Did you check  that?
>
> -Greg
>
>
>
>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com