On Sun, Jul 20, 2014 at 11:33 PM, Ma, Jianpeng <jianpeng.ma@xxxxxxxxx> wrote:
>> Hrm, I'd really like to see the startup sequence. I see the crash occurring, but I
>> don't understand how it's happening; we test this pretty extensively, so there
>> must be something about your testing configuration that is different from ours.
>> Can you provide that part of the log, and maybe a little more description of
>> what you think the problem is?
>
> If ceph.conf contains "cluster addr", the bug always occurs.
> Without "cluster addr" in ceph.conf, the local connection is added to fast dispatch in _send_boot / cluster_messenger->set_addr_unknowns.
>
>>
>> In particular, we *always* call init_local_connection when the messenger
>> starts, so every messenger who is allowed to receive EC messages should have
>> the local connection set up before they get one.
>
> Yes, you call init_local_connection. But the local_connection can only be added to dispatch after the OSD has been added to the messenger.
> In OSD::init:
>     cluster_messenger->add_dispatcher_head(this);
> Only after this can the local_connection be added to dispatch.
> Once local_connection has the correct type, it can be added to dispatch regardless of the cluster addr.
> When a Messenger is allocated, it sets the type, but only after add_dispatcher_head/tail can the local connection be added to dispatch.
> Maybe adding ms_deliver_handle_fast_connect(local_connection.get()) in SimpleMessenger::ready would be better.

Ooooookay, I see the problem now. I pulled the patch (with some wording
changes) into master at commit 9061988ec7eaa922e2b303d9eece86e7c8ee0fa1.
I've also created a ticket to clean up the local dispatch Connection setup
at http://tracker.ceph.com/issues/8892.

Thanks!
-Greg
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html