Hi, Gregory, is it possible to unlock Connection::lock in Pipe::read_message before tcp_read_nonblocking is called? I checked the code again, it seems that the code in tcp_read_nonblocking doesn't need to be locked by Connection::lock. -----邮件原件----- 发件人: Gregory Farnum [mailto:gfarnum@xxxxxxxxxx] 发送时间: 2017年1月17日 7:14 收件人: 许雪寒 抄送: jiajia zhong; ceph-users@xxxxxxxxxxxxxx 主题: Re: 答复: 答复: 答复: Pipe "deadlock" in Hammer, 0.94.5 On Sat, Jan 14, 2017 at 7:54 PM, 许雪寒 <xuxuehan@xxxxxx> wrote: > Thanks for your help:-) > > I checked the source code again, and in read_message, it does hold the Connection::lock: You're correct of course; I wasn't looking and forgot about this bit. This was added to deal with client-allocated buffers and/or op cancellation in librados, IIRC, and unfortunately definitely does need to be synchronized — I'm not sure about with pipe lookups, but probably even that. :/ Unfortunately it looks like you're running a version that didn't come from upstream (I see hash 81d4ad40d0c2a4b73529ff0db3c8f22acd15c398 in another email, which I can't find), so there's not much we can do to help with the specifics of this case — it's fiddly and my guess would be the same as Sage's, which you say is not the case. -Greg > > while (left > 0) { > // wait for data > if (tcp_read_wait() < 0) > goto out_dethrottle; > > // get a buffer > connection_state->lock.Lock(); > map<ceph_tid_t, pair<bufferlist, int> >::iterator p = > connection_state->rx_buffers.find(header.tid); > if (p != connection_state->rx_buffers.end()) { > if (rxbuf.length() == 0 || p->second.second != rxbuf_version) { > ldout(msgr->cct,10) > << "reader seleting rx buffer v " > << p->second.second << " at offset " > << offset << " len " > << p->second.first.length() << dendl; > rxbuf = p->second.first; > rxbuf_version = p->second.second; > // make sure it's big enough > if (rxbuf.length() < data_len) > rxbuf.push_back( > buffer::create(data_len - rxbuf.length())); > blp = p->second.first.begin(); > blp.advance(offset); > } > } else { > if (!newbuf.length()) { > ldout(msgr->cct,20) > << "reader allocating new rx buffer at offset " > << offset << dendl; > alloc_aligned_buffer(newbuf, data_len, data_off); > blp = newbuf.begin(); > blp.advance(offset); > } > } > bufferptr bp = blp.get_current_ptr(); > int read = MIN(bp.length(), left); > ldout(msgr->cct,20) > << "reader reading nonblocking into " > << (void*) bp.c_str() << " len " << bp.length() > << dendl; > int got = tcp_read_nonblocking(bp.c_str(), read); > ldout(msgr->cct,30) > << "reader read " << got << " of " << read << dendl; > connection_state->lock.Unlock(); > if (got < 0) > goto out_dethrottle; > if (got > 0) { > blp.advance(got); > data.append(bp, 0, got); > offset += got; > left -= got; > } // else we got a signal or something; just loop. > } > > As shown in the above code, in the reading loop, it first lock connection_state->lock and then do tcp_read_nonblocking. connection_state is of type PipeConnectionRef, connection_state->lock is Connection::lock. > > On the other hand, I'll check that whether there are a lot of message > to send as you suggested. Thanks:-) > > > > 发件人: Gregory Farnum [gfarnum@xxxxxxxxxx] > > 发送时间: 2017年1月14日 9:39 > > 收件人: 许雪寒 > > Cc: jiajia zhong; ceph-users@xxxxxxxxxxxxxx > > 主题: Re: 答复: 答复: Pipe "deadlock" in Hammer, 0.94.5 > > > > > > > > > > On Thu, Jan 12, 2017 at 7:58 PM, 许雪寒 <xuxuehan@xxxxxx> wrote: > > > > > > > > > > > Thank you for your continuous helpJ. > > We are using hammer 0.94.5 version, and what I read is the version of the source code. > > However, on the other hand, if Pipe::do_recv do act as blocked, is it > reasonable for the Pipe::reader_thread to block threads calling SimpleMessenger::submit_message by holding Connection::lock? > > I think maybe a different mutex should be used in Pipe::read_message rather than Connection::lock. > > > > > > > > I don't think it does use that lock. Pipe::read_message() is generally > called while the pipe_lock is held, but not Connection::lock. (They > are separate.) > > I haven't dug into the relevant OSD code in a while, but I think it's > a lot more likely your OSD is just overloaded and is taking a while to send a lot of different messages, and that the loop it's in doesn't update the HeartbeatMap or something. Did you check that? > > -Greg > > > > _______________________________________________ ceph-users mailing list ceph-users@xxxxxxxxxxxxxx http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com