Thanks for your help:-) I checked the source code again, and in read_message, it does hold the Connection::lock: while (left > 0) { // wait for data if (tcp_read_wait() < 0) goto out_dethrottle; // get a buffer connection_state->lock.Lock(); map<ceph_tid_t, pair<bufferlist, int> >::iterator p = connection_state->rx_buffers.find(header.tid); if (p != connection_state->rx_buffers.end()) { if (rxbuf.length() == 0 || p->second.second != rxbuf_version) { ldout(msgr->cct,10) << "reader seleting rx buffer v " << p->second.second << " at offset " << offset << " len " << p->second.first.length() << dendl; rxbuf = p->second.first; rxbuf_version = p->second.second; // make sure it's big enough if (rxbuf.length() < data_len) rxbuf.push_back( buffer::create(data_len - rxbuf.length())); blp = p->second.first.begin(); blp.advance(offset); } } else { if (!newbuf.length()) { ldout(msgr->cct,20) << "reader allocating new rx buffer at offset " << offset << dendl; alloc_aligned_buffer(newbuf, data_len, data_off); blp = newbuf.begin(); blp.advance(offset); } } bufferptr bp = blp.get_current_ptr(); int read = MIN(bp.length(), left); ldout(msgr->cct,20) << "reader reading nonblocking into " << (void*) bp.c_str() << " len " << bp.length() << dendl; int got = tcp_read_nonblocking(bp.c_str(), read); ldout(msgr->cct,30) << "reader read " << got << " of " << read << dendl; connection_state->lock.Unlock(); if (got < 0) goto out_dethrottle; if (got > 0) { blp.advance(got); data.append(bp, 0, got); offset += got; left -= got; } // else we got a signal or something; just loop. } As shown in the above code, in the reading loop, it first lock connection_state->lock and then do tcp_read_nonblocking. connection_state is of type PipeConnectionRef, connection_state->lock is Connection::lock. On the other hand, I'll check that whether there are a lot of message to send as you suggested. Thanks:-) 发件人: Gregory Farnum [gfarnum@xxxxxxxxxx] 发送时间: 2017年1月14日 9:39 收件人: 许雪寒 Cc: jiajia zhong; ceph-users@xxxxxxxxxxxxxx 主题: Re: 答复: 答复: Pipe "deadlock" in Hammer, 0.94.5 On Thu, Jan 12, 2017 at 7:58 PM, 许雪寒 <xuxuehan@xxxxxx> wrote: Thank you for your continuous helpJ. We are using hammer 0.94.5 version, and what I read is the version of the source code. However, on the other hand, if Pipe::do_recv do act as blocked, is it reasonable for the Pipe::reader_thread to block threads calling SimpleMessenger::submit_message by holding Connection::lock? I think maybe a different mutex should be used in Pipe::read_message rather than Connection::lock. I don't think it does use that lock. Pipe::read_message() is generally called while the pipe_lock is held, but not Connection::lock. (They are separate.) I haven't dug into the relevant OSD code in a while, but I think it's a lot more likely your OSD is just overloaded and is taking a while to send a lot of different messages, and that the loop it's in doesn't update the HeartbeatMap or something. Did you check that? -Greg _______________________________________________ ceph-users mailing list ceph-users@xxxxxxxxxxxxxx http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com