> Hrm, I'd really like to see the startup sequence. I see the crash occurring, but I
> don't understand how it's happening -- we test this pretty extensively so there
> must be something about your testing configuration that is different than ours.
> Can you provide that part of the log, and maybe a little more description of
> what you think the problem is?

If ceph.conf contains "cluster addr", the bug always occurs. Without "cluster addr" in ceph.conf, the local connection is added to fast dispatch in _send_boot, via cluster_messenger->set_addr_unknowns.

> In particular, we *always* call init_local_connection when the messenger
> starts, so every messenger who is allowed to receive EC messages should have
> the local connection set up before they get one.

Yes, you call init_local_connection. But the local_connection can only be added to dispatch once the OSD has been added to the messenger. In OSD::init:
>>cluster_messenger->add_dispatcher_head(this);
Only after this can the local_connection be added to dispatch.
If the local_connection has the correct type, it can be added to dispatch regardless of the cluster addr.
The type is set when a Messenger is allocated, but the local_connection can only be added to dispatch after add_dispatcher_head/tail.

Maybe adding ms_deliver_handle_fast_connect(local_connection.get()) in SimpleMessenger::ready would be better.

Jianpeng Ma

> I don't really see how supplying the local connection as a new one in
> _send_boot *should* be fixing that, and it's not the place to do so (although I
> guess it's doing *something*, I just can't figure out what).
> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com
>
>
> On Wed, Jul 16, 2014 at 5:17 PM, Ma, Jianpeng <jianpeng.ma@xxxxxxxxx> wrote:
> > Hi Greg,
> > The attachment is the log.
> >
> > Thanks!
> >
> > -----Original Message-----
> > From: Gregory Farnum [mailto:greg@xxxxxxxxxxx]
> > Sent: Thursday, July 17, 2014 3:41 AM
> > To: Ma, Jianpeng
> > Cc: ceph-devel@xxxxxxxxxxxxxxx
> > Subject: Re: [RFC][PATCH] osd: Add local_connection to fast_dispatch in func _send_boot.
> >
> > I'm looking at this and getting a little confused. Can you provide a
> > log of the crash occurring? (preferably with debug_ms=20,
> > debug_osd=20)
> > -Greg
> > Software Engineer #42 @ http://inktank.com | http://ceph.com
> >
> >
> > On Sun, Jul 13, 2014 at 8:17 PM, Ma, Jianpeng <jianpeng.ma@xxxxxxxxx> wrote:
> >> When doing an ec-read, I hit a bug which occurs 100% of the time. The messages are:
> >> 2014-07-14 10:03:07.318681 7f7654f6e700 -1 osd/OSD.cc: In function 'virtual void OSD::ms_fast_dispatch(Message*)' thread 7f7654f6e700 time 2014-07-14 10:03:07.316782
> >> osd/OSD.cc: 5019: FAILED assert(session)
> >>
> >> ceph version 0.82-585-g79f3f67 (79f3f6749122ce2944baa70541949d7ca75525e6)
> >> 1: (OSD::ms_fast_dispatch(Message*)+0x286) [0x6544b6]
> >> 2: (DispatchQueue::fast_dispatch(Message*)+0x56) [0xb059d6]
> >> 3: (DispatchQueue::run_local_delivery()+0x6b) [0xb08e0b]
> >> 4: (DispatchQueue::LocalDeliveryThread::entry()+0xd) [0xa4a5fd]
> >> 5: (()+0x8182) [0x7f7665670182]
> >> 6: (clone()+0x6d) [0x7f7663a1130d]
> >> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
> >>
> >> Commit 69fc6b2b66 enabled fast_dispatch on local connections, and it
> >> adds local_connection to fast_dispatch in func init_local_connection.
> >> But if there is no fast-dispatch, the local connection can't be added.
> >>
> >> If there is no cluster addr in ceph.conf, local_connection is added
> >> to fast dispatch in func _send_boot because the cluster_addr is empty.
> >> But if there is a cluster addr, local_connection can't be added to fast dispatch.
> >>
> >> An ECSubRead is sent by the OSD to itself by func send_message_osd_cluster,
> >> so it will trigger this bug.
> >>
> >> I don't know about hb_back/front_server_messenger. But they are in
> >> _send_boot like cluster_messenger, so I also modified those.
> >>
> >> Signed-off-by: Ma Jianpeng <jianpeng.ma@xxxxxxxxx>
> >> ---
> >>  src/osd/OSD.cc | 14 +++++++++++---
> >>  1 file changed, 11 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/src/osd/OSD.cc b/src/osd/OSD.cc
> >> index 52a3839..75b294b 100644
> >> --- a/src/osd/OSD.cc
> >> +++ b/src/osd/OSD.cc
> >> @@ -3852,29 +3852,37 @@ void OSD::_send_boot()
> >>  {
> >>    dout(10) << "_send_boot" << dendl;
> >>    entity_addr_t cluster_addr = cluster_messenger->get_myaddr();
> >> +  Connection *local_connection = cluster_messenger->get_loopback_connection().get();
> >>    if (cluster_addr.is_blank_ip()) {
> >>      int port = cluster_addr.get_port();
> >>      cluster_addr = client_messenger->get_myaddr();
> >>      cluster_addr.set_port(port);
> >>      cluster_messenger->set_addr_unknowns(cluster_addr);
> >>      dout(10) << " assuming cluster_addr ip matches client_addr" << dendl;
> >> -  }
> >> +  } else if (local_connection->get_priv() == NULL)
> >> +    cluster_messenger->ms_deliver_handle_fast_connect(local_connection);
> >> +
> >>    entity_addr_t hb_back_addr = hb_back_server_messenger->get_myaddr();
> >> +  local_connection = hb_back_server_messenger->get_loopback_connection().get();
> >>    if (hb_back_addr.is_blank_ip()) {
> >>      int port = hb_back_addr.get_port();
> >>      hb_back_addr = cluster_addr;
> >>      hb_back_addr.set_port(port);
> >>      hb_back_server_messenger->set_addr_unknowns(hb_back_addr);
> >>      dout(10) << " assuming hb_back_addr ip matches cluster_addr" << dendl;
> >> -  }
> >> +  } else if (local_connection->get_priv() == NULL)
> >> +    hb_back_server_messenger->ms_deliver_handle_fast_connect(local_connection);
> >> +
> >>    entity_addr_t hb_front_addr = hb_front_server_messenger->get_myaddr();
> >> +  local_connection = hb_front_server_messenger->get_loopback_connection().get();
> >>    if (hb_front_addr.is_blank_ip()) {
> >>      int port = hb_front_addr.get_port();
> >>      hb_front_addr = client_messenger->get_myaddr();
> >>      hb_front_addr.set_port(port);
> >>      hb_front_server_messenger->set_addr_unknowns(hb_front_addr);
> >>      dout(10) << " assuming hb_front_addr ip matches client_addr" << dendl;
> >> -  }
> >> +  } else if (local_connection->get_priv() == NULL)
> >> +    hb_front_server_messenger->ms_deliver_handle_fast_connect(local_connection);
> >>
> >>    MOSDBoot *mboot = new MOSDBoot(superblock, service.get_boot_epoch(),
> >>                                   hb_back_addr, hb_front_addr, cluster_addr);
> >> --
> >> 1.9.1
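
The ordering problem described at the top of this mail (init_local_connection running before add_dispatcher_head) can be illustrated with a small toy model. This is only a sketch under stated assumptions: ToyConnection, ToyDispatcher and ToyMessenger are made-up stand-ins and not the real Connection/OSD/SimpleMessenger code, and the toy assumes that ready() runs when the first dispatcher is added. The reannounce_in_ready flag is hypothetical; it stands for the idea of calling ms_deliver_handle_fast_connect on the local connection from ready().

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Toy stand-in for a connection; `priv` plays the role of the session
// that the OSD expects to find via get_priv().
struct ToyConnection {
  std::shared_ptr<std::string> priv;
};

// Toy stand-in for the OSD acting as a fast dispatcher.
struct ToyDispatcher {
  // Attaches a session to the connection if it has none yet.
  void handle_fast_connect(ToyConnection& con) {
    if (!con.priv) con.priv = std::make_shared<std::string>("session");
  }
  // Models the check behind assert(session): true only if a session exists.
  bool fast_dispatch(const ToyConnection& con) const {
    return con.priv != nullptr;
  }
};

// Toy stand-in for a messenger with a single loopback connection.
class ToyMessenger {
 public:
  explicit ToyMessenger(bool reannounce_in_ready)
      : reannounce_in_ready_(reannounce_in_ready) {}

  // Runs at messenger start; with no dispatcher registered yet,
  // nobody sees the local connection.
  void init_local_connection() { deliver_fast_connect(); }

  // Registers a dispatcher; the first registration triggers ready()
  // (an assumption of this toy model).
  void add_dispatcher_head(ToyDispatcher* d) {
    bool first = dispatchers_.empty();
    dispatchers_.insert(dispatchers_.begin(), d);
    if (first) ready();
  }

  // Models a message sent to ourselves over the local connection.
  bool send_to_self() const {
    return !dispatchers_.empty() &&
           dispatchers_.front()->fast_dispatch(local_);
  }

 private:
  // Hypothetical fix: announce the local connection again once a
  // dispatcher exists.
  void ready() {
    if (reannounce_in_ready_) deliver_fast_connect();
  }
  void deliver_fast_connect() {
    for (ToyDispatcher* d : dispatchers_) d->handle_fast_connect(local_);
  }

  bool reannounce_in_ready_;
  ToyConnection local_;
  std::vector<ToyDispatcher*> dispatchers_;
};

// Runs the start-up order from this thread: init the local connection
// first, register the dispatcher second, then send to self.
bool local_delivery_works(bool reannounce_in_ready) {
  ToyDispatcher osd;
  ToyMessenger msgr(reannounce_in_ready);
  msgr.init_local_connection();
  msgr.add_dispatcher_head(&osd);
  return msgr.send_to_self();
}

// Usage: without the re-announce the session is missing; with it, it is set.
void check_toy_model() {
  assert(!local_delivery_works(false));
  assert(local_delivery_works(true));
}
```

In this toy, the local connection gets no session when it is announced before any dispatcher exists, and announcing it again from ready() closes that gap. Whether the real SimpleMessenger behaves the same way needs to be checked against the actual code.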