RE: [RFC][PATCH] osd: Add local_connection to fast_dispatch in func _send_boot.


 



> Hrm, I'd really like to see the startup sequence. I see the crash occurring, but I
> don't understand how it's happening — we test this pretty extensively so there
> must be something about your testing configuration that is different than ours.
> Can you provide that part of the log, and maybe a little more description of
> what you think the problem is?

If ceph.conf contains "cluster addr", the bug always occurs.
Without "cluster addr" in ceph.conf, the local connection is added to fast dispatch in _send_boot, via cluster_messenger->set_addr_unknowns.
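
To illustrate the two paths, here is a minimal sketch. It is a toy model, not the actual Ceph code: SketchMessenger, sketch_send_boot and the local_in_fast_dispatch flag are hypothetical names, and the registration side effect of set_addr_unknowns is modelled as described above.

```cpp
#include <string>

// Toy stand-in for a messenger; NOT Ceph's real class.
struct SketchMessenger {
  std::string addr;                     // empty models a blank IP (no "cluster addr" set)
  bool local_in_fast_dispatch = false;  // hypothetical flag: local connection registered?

  bool is_blank_ip() const { return addr.empty(); }

  // Models set_addr_unknowns(): fills in the address and, as a side effect,
  // registers the local connection for fast dispatch.
  void set_addr_unknowns(const std::string& a) {
    addr = a;
    local_in_fast_dispatch = true;
  }
};

// Models the address fix-up in _send_boot before the patch.
void sketch_send_boot(SketchMessenger& cluster, const std::string& client_addr) {
  if (cluster.is_blank_ip())
    cluster.set_addr_unknowns(client_addr);
  // No else branch: a configured address skips the registration entirely.
}
```

In this model, a messenger whose addr starts empty ends up with local_in_fast_dispatch set after sketch_send_boot, while one with a preconfigured addr does not.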

> 
> In particular, we *always* call init_local_connection when the messenger
> starts, so every messenger who is allowed to receive EC messages should have
> the local connection set up before they get one.
Yes, you call init_local_connection. But the local_connection can only be added to dispatch after the OSD has been added to the messenger as a dispatcher.
In OSD::init:
	>>cluster_messenger->add_dispatcher_head(this);
Only after this call can the local_connection be added to dispatch.
If the local_connection has the correct type, it can be added to dispatch regardless of the cluster addr.
The type is set when the Messenger is allocated, but the local connection can only be added to dispatch after add_dispatcher_head/tail.
Maybe calling ms_deliver_handle_fast_connect(local_connection.get()) in SimpleMessenger::ready would be better.
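
The ordering problem can be sketched with a toy model. This is an illustration under assumptions, not Ceph's implementation: ToyConnection, ToyDispatcher, ToyMessenger and the has_session flag are hypothetical, and ready() only mirrors the suggestion above.

```cpp
#include <vector>

// Toy stand-ins; none of these are Ceph's real classes.
struct ToyConnection {
  bool has_session = false;  // models the session that ms_fast_dispatch asserts on
};

struct ToyDispatcher {
  // Models the dispatcher's fast-connect hook attaching a session.
  void handle_fast_connect(ToyConnection& c) { c.has_session = true; }
};

struct ToyMessenger {
  ToyConnection local_connection;
  std::vector<ToyDispatcher*> dispatchers;

  // Notifies every registered dispatcher; does nothing while the list is empty.
  void deliver_fast_connect(ToyConnection& c) {
    for (ToyDispatcher* d : dispatchers) d->handle_fast_connect(c);
  }

  // Runs at messenger start-up, possibly before any dispatcher is registered.
  void init_local_connection() { deliver_fast_connect(local_connection); }

  void add_dispatcher_head(ToyDispatcher* d) {
    dispatchers.insert(dispatchers.begin(), d);
  }

  // Suggested change: repeat the notification once dispatchers are in place.
  void ready() { deliver_fast_connect(local_connection); }
};
```

Calling init_local_connection(), then add_dispatcher_head(), leaves has_session false in this model. Only the extra notification in ready() sets it.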


Jianpeng Ma

> I don't really see how supplying the local connection as a new one in
> _send_boot *should* be fixing that, and it's not the place to do so (although I
> guess it's doing *something*, I just can't figure out what).


> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com
> 
> 
> On Wed, Jul 16, 2014 at 5:17 PM, Ma, Jianpeng <jianpeng.ma@xxxxxxxxx>
> wrote:
> > Hi Greg,
> >    The attachment is the log.
> >
> > Thanks!
> >
> > -----Original Message-----
> > From: Gregory Farnum [mailto:greg@xxxxxxxxxxx]
> > Sent: Thursday, July 17, 2014 3:41 AM
> > To: Ma, Jianpeng
> > Cc: ceph-devel@xxxxxxxxxxxxxxx
> > Subject: Re: [RFC][PATCH] osd: Add local_connection to fast_dispatch in func _send_boot.
> >
> > I'm looking at this and getting a little confused. Can you provide a
> > log of the crash occurring? (preferably with debug_ms=20,
> > debug_osd=20)
> > -Greg
> > Software Engineer #42 @ http://inktank.com | http://ceph.com
> >
> >
> > On Sun, Jul 13, 2014 at 8:17 PM, Ma, Jianpeng <jianpeng.ma@xxxxxxxxx> wrote:
> >> When doing an EC read, I hit a bug that reproduces 100% of the time. The messages are:
> >> 2014-07-14 10:03:07.318681 7f7654f6e700 -1 osd/OSD.cc: In function
> >> 'virtual void OSD::ms_fast_dispatch(Message*)' thread 7f7654f6e700
> >> time
> >> 2014-07-14 10:03:07.316782 osd/OSD.cc: 5019: FAILED assert(session)
> >>
> >>  ceph version 0.82-585-g79f3f67
> >> (79f3f6749122ce2944baa70541949d7ca75525e6)
> >>  1: (OSD::ms_fast_dispatch(Message*)+0x286) [0x6544b6]
> >>  2: (DispatchQueue::fast_dispatch(Message*)+0x56) [0xb059d6]
> >>  3: (DispatchQueue::run_local_delivery()+0x6b) [0xb08e0b]
> >>  4: (DispatchQueue::LocalDeliveryThread::entry()+0xd) [0xa4a5fd]
> >>  5: (()+0x8182) [0x7f7665670182]
> >>  6: (clone()+0x6d) [0x7f7663a1130d]
> >>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
> >>
> >> Commit 69fc6b2b66 enabled fast_dispatch on local connections; it adds
> >> local_connection to fast_dispatch in init_local_connection.
> >> But if there is no fast dispatcher yet, the local connection can't be added.
> >>
> >> If there is no cluster addr in ceph.conf, local_connection is added
> >> to fast dispatch in _send_boot because the cluster_addr is empty.
> >> But if there is a cluster addr, local_connection is never added to fast dispatch.
> >>
> >> An ECSubRead is sent by the OSD to itself via send_message_osd_cluster, so
> >> it triggers this bug.
> >>
> >> I don't know much about hb_back/front_server_messenger, but they are handled
> >> in _send_boot like cluster_messenger, so I modified those as well.
> >>
> >> Signed-off-by: Ma Jianpeng <jianpeng.ma@xxxxxxxxx>
> >> ---
> >>  src/osd/OSD.cc | 14 +++++++++++---
> >>  1 file changed, 11 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/src/osd/OSD.cc b/src/osd/OSD.cc
> >> index 52a3839..75b294b 100644
> >> --- a/src/osd/OSD.cc
> >> +++ b/src/osd/OSD.cc
> >> @@ -3852,29 +3852,37 @@ void OSD::_send_boot()
> >>  {
> >>    dout(10) << "_send_boot" << dendl;
> >>    entity_addr_t cluster_addr = cluster_messenger->get_myaddr();
> >> +  Connection *local_connection = cluster_messenger->get_loopback_connection().get();
> >>    if (cluster_addr.is_blank_ip()) {
> >>      int port = cluster_addr.get_port();
> >>      cluster_addr = client_messenger->get_myaddr();
> >>      cluster_addr.set_port(port);
> >>      cluster_messenger->set_addr_unknowns(cluster_addr);
> >>      dout(10) << " assuming cluster_addr ip matches client_addr" << dendl;
> >> -  }
> >> +  } else if (local_connection->get_priv() == NULL)
> >> +    cluster_messenger->ms_deliver_handle_fast_connect(local_connection);
> >> +
> >>    entity_addr_t hb_back_addr = hb_back_server_messenger->get_myaddr();
> >> +  local_connection = hb_back_server_messenger->get_loopback_connection().get();
> >>    if (hb_back_addr.is_blank_ip()) {
> >>      int port = hb_back_addr.get_port();
> >>      hb_back_addr = cluster_addr;
> >>      hb_back_addr.set_port(port);
> >>      hb_back_server_messenger->set_addr_unknowns(hb_back_addr);
> >>      dout(10) << " assuming hb_back_addr ip matches cluster_addr" << dendl;
> >> -  }
> >> +  } else if (local_connection->get_priv() == NULL)
> >> +    hb_back_server_messenger->ms_deliver_handle_fast_connect(local_connection);
> >> +
> >>    entity_addr_t hb_front_addr = hb_front_server_messenger->get_myaddr();
> >> +  local_connection = hb_front_server_messenger->get_loopback_connection().get();
> >>    if (hb_front_addr.is_blank_ip()) {
> >>      int port = hb_front_addr.get_port();
> >>      hb_front_addr = client_messenger->get_myaddr();
> >>      hb_front_addr.set_port(port);
> >>      hb_front_server_messenger->set_addr_unknowns(hb_front_addr);
> >>      dout(10) << " assuming hb_front_addr ip matches client_addr" << dendl;
> >> -  }
> >> +  } else if (local_connection->get_priv() == NULL)
> >> +    hb_front_server_messenger->ms_deliver_handle_fast_connect(local_connection);
> >>
> >>    MOSDBoot *mboot = new MOSDBoot(superblock, service.get_boot_epoch(),
> >>                                   hb_back_addr, hb_front_addr, cluster_addr);
> >> --
> >> 1.9.1
> >>