Re: [RFC][PATCH] osd: Add local_connection to fast_dispatch in func _send_boot.

Hrm, I'd really like to see the startup sequence. I see the crash
occurring, but I don't understand how it's happening — we test this
pretty extensively so there must be something about your testing
configuration that is different than ours. Can you provide that part
of the log, and maybe a little more description of what you think the
problem is?

In particular, we *always* call init_local_connection when the
messenger starts, so every messenger that is allowed to receive EC
messages should have the local connection set up before it gets one.
I don't really see how supplying the local connection as a new one in
_send_boot *should* be fixing that, and it's not the place to do so
(although I guess it's doing *something*, I just can't figure out
what).
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com


On Wed, Jul 16, 2014 at 5:17 PM, Ma, Jianpeng <jianpeng.ma@xxxxxxxxx> wrote:
> Hi Greg,
>    The attachment is the log.
>
> Thanks!
>
> -----Original Message-----
> From: Gregory Farnum [mailto:greg@xxxxxxxxxxx]
> Sent: Thursday, July 17, 2014 3:41 AM
> To: Ma, Jianpeng
> Cc: ceph-devel@xxxxxxxxxxxxxxx
> Subject: Re: [RFC][PATCH] osd: Add local_connection to fast_dispatch in func _send_boot.
>
> I'm looking at this and getting a little confused. Can you provide a log of the crash occurring? (preferably with debug_ms=20, debug_osd=20)
> -Greg
>
>
> On Sun, Jul 13, 2014 at 8:17 PM, Ma, Jianpeng <jianpeng.ma@xxxxxxxxx> wrote:
>> When doing an EC read, I hit a bug that reproduces 100% of the time. The messages are:
>> 2014-07-14 10:03:07.318681 7f7654f6e700 -1 osd/OSD.cc: In function 'virtual void OSD::ms_fast_dispatch(Message*)' thread 7f7654f6e700 time 2014-07-14 10:03:07.316782
>> osd/OSD.cc: 5019: FAILED assert(session)
>>
>>  ceph version 0.82-585-g79f3f67 (79f3f6749122ce2944baa70541949d7ca75525e6)
>>  1: (OSD::ms_fast_dispatch(Message*)+0x286) [0x6544b6]
>>  2: (DispatchQueue::fast_dispatch(Message*)+0x56) [0xb059d6]
>>  3: (DispatchQueue::run_local_delivery()+0x6b) [0xb08e0b]
>>  4: (DispatchQueue::LocalDeliveryThread::entry()+0xd) [0xa4a5fd]
>>  5: (()+0x8182) [0x7f7665670182]
>>  6: (clone()+0x6d) [0x7f7663a1130d]
>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>>
>> Commit 69fc6b2b66 enabled fast_dispatch on local connections: it adds the
>> local connection to fast dispatch in init_local_connection(). But if there
>> is no fast dispatcher yet at that point, the local connection is not added.
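The ordering dependency described in the patch can be sketched with a small, self-contained C++ model. Every type here (ToyMessenger, ToyFastDispatcher, ToyConnection, ToySession) is a hypothetical stand-in, not a Ceph class; the only assumption carried over from the description is that init_local_connection() hands the local connection only to fast dispatchers that are registered at that moment.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-ins for illustration only; not Ceph's classes.
struct ToySession {};

struct ToyConnection {
  std::shared_ptr<ToySession> priv;  // plays the role of Connection::get_priv()
};

struct ToyFastDispatcher {
  // Plays the role of the OSD's fast-connect hook: attach a session.
  void handle_fast_connect(ToyConnection &con) {
    if (!con.priv)
      con.priv = std::make_shared<ToySession>();
  }
};

struct ToyMessenger {
  ToyConnection local_connection;
  std::vector<ToyFastDispatcher *> fast_dispatchers;

  void add_fast_dispatcher(ToyFastDispatcher *d) { fast_dispatchers.push_back(d); }

  // Assumption from the description: only dispatchers registered right now
  // are told about the local connection.
  void init_local_connection() {
    for (ToyFastDispatcher *d : fast_dispatchers)
      d->handle_fast_connect(local_connection);
  }
};

// Returns true if the local connection ends up carrying a session.
bool local_session_after(bool register_before_init) {
  ToyMessenger msgr;
  ToyFastDispatcher osd;
  if (register_before_init) {
    msgr.add_fast_dispatcher(&osd);
    msgr.init_local_connection();
  } else {
    msgr.init_local_connection();    // no dispatcher to notify yet
    msgr.add_fast_dispatcher(&osd);  // registered too late: no session
  }
  return msgr.local_connection.priv != nullptr;
}
```

In this model, local_session_after(true) leaves a session on the local connection and local_session_after(false) does not; the second case is the situation the patch is trying to repair.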
>>
>> If there is no cluster addr in ceph.conf, the local connection is added
>> to fast dispatch in _send_boot(), because cluster_addr is blank there.
>> But if a cluster addr is configured, the local connection is never added
>> to fast dispatch.
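A second hypothetical model contrasts the two _send_boot() paths under the reporter's scenario (the messenger starts before the OSD registers its fast dispatcher). It assumes, as the description implies but without checking Ceph's source, that the blank-IP branch (via set_addr_unknowns()) runs the local-connection setup again once the dispatcher exists, while a configured cluster addr skips that branch entirely. ToyClusterMessenger and boot_leaves_local_session() are illustrative names only, and the IPs are documentation addresses.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for cluster_messenger; not Ceph's Messenger.
struct ToyClusterMessenger {
  std::string ip;                          // empty string models is_blank_ip()
  bool fast_dispatcher_registered = false;
  bool local_has_session = false;

  void init_local_connection() {
    if (fast_dispatcher_registered)
      local_has_session = true;
  }

  // Assumption: filling in a blank IP re-runs the local-connection setup.
  void set_addr_unknowns(const std::string &client_ip) {
    if (ip.empty()) {
      ip = client_ip;
      init_local_connection();
    }
  }
};

// Reporter's scenario: start, then register the dispatcher, then _send_boot().
bool boot_leaves_local_session(const std::string &configured_cluster_ip) {
  ToyClusterMessenger m;
  m.ip = configured_cluster_ip;
  m.init_local_connection();            // too early: nothing registered
  m.fast_dispatcher_registered = true;  // OSD adds its dispatcher later
  if (m.ip.empty())                     // _send_boot(): blank-IP branch only
    m.set_addr_unknowns("192.0.2.10");
  return m.local_has_session;
}
```

With an empty cluster addr the model ends with a session; with a configured one it does not, matching the asymmetry described above.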
>>
>> ECSubRead messages that an OSD sends to itself go through
>> send_message_osd_cluster(), which is what triggers this bug.
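The failure mode itself can be sketched as follows, following the reporter's description: an ECSubRead addressed to the sending OSD travels over the loopback connection, and the fast-dispatch handler expects that connection to already carry a session (the assert(session) in the backtrace). ToyOSD, connection_for() and ec_sub_read_ok() are hypothetical names, and the model returns false where the real handler asserts.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-ins; not Ceph's OSD, Connection or Session types.
struct ToySession { int peer_osd; };
struct ToyConnection { std::shared_ptr<ToySession> priv; };

struct ToyOSD {
  int whoami;
  ToyConnection loopback;  // the cluster messenger's local connection
  ToyConnection peer;      // an ordinary connection to another OSD

  // Routing in this model: a message addressed to ourselves uses loopback.
  ToyConnection &connection_for(int target) {
    return target == whoami ? loopback : peer;
  }

  // The real handler asserts on a missing session; the model reports it.
  bool fast_dispatch(const ToyConnection &con) const {
    return con.priv != nullptr;
  }
};

// Sends a modeled ECSubRead from osd.0 to `target`.
bool ec_sub_read_ok(int target, bool loopback_has_session) {
  ToyOSD osd{0, {}, {std::make_shared<ToySession>(ToySession{1})}};
  if (loopback_has_session)
    osd.loopback.priv = std::make_shared<ToySession>(ToySession{0});
  return osd.fast_dispatch(osd.connection_for(target));
}
```

Reads sent to a remote peer succeed either way; only the self-addressed read depends on the loopback session, which is why the crash shows up on EC reads specifically.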
>>
>> I'm not sure about hb_back/front_server_messenger, but they are handled
>> in _send_boot() the same way as cluster_messenger, so I modified them too.
>>
>> Signed-off-by: Ma Jianpeng <jianpeng.ma@xxxxxxxxx>
>> ---
>>  src/osd/OSD.cc | 14 +++++++++++---
>>  1 file changed, 11 insertions(+), 3 deletions(-)
>>
>> diff --git a/src/osd/OSD.cc b/src/osd/OSD.cc
>> index 52a3839..75b294b 100644
>> --- a/src/osd/OSD.cc
>> +++ b/src/osd/OSD.cc
>> @@ -3852,29 +3852,37 @@ void OSD::_send_boot()
>>  {
>>    dout(10) << "_send_boot" << dendl;
>>    entity_addr_t cluster_addr = cluster_messenger->get_myaddr();
>> +  Connection *local_connection = cluster_messenger->get_loopback_connection().get();
>>    if (cluster_addr.is_blank_ip()) {
>>      int port = cluster_addr.get_port();
>>      cluster_addr = client_messenger->get_myaddr();
>>      cluster_addr.set_port(port);
>>      cluster_messenger->set_addr_unknowns(cluster_addr);
>>      dout(10) << " assuming cluster_addr ip matches client_addr" << dendl;
>> -  }
>> +  } else if (local_connection->get_priv() == NULL)
>> +    cluster_messenger->ms_deliver_handle_fast_connect(local_connection);
>> +
>>    entity_addr_t hb_back_addr = hb_back_server_messenger->get_myaddr();
>> +  local_connection = hb_back_server_messenger->get_loopback_connection().get();
>>    if (hb_back_addr.is_blank_ip()) {
>>      int port = hb_back_addr.get_port();
>>      hb_back_addr = cluster_addr;
>>      hb_back_addr.set_port(port);
>>      hb_back_server_messenger->set_addr_unknowns(hb_back_addr);
>>      dout(10) << " assuming hb_back_addr ip matches cluster_addr" << dendl;
>> -  }
>> +  } else if (local_connection->get_priv() == NULL)
>> +    hb_back_server_messenger->ms_deliver_handle_fast_connect(local_connection);
>> +
>>    entity_addr_t hb_front_addr = hb_front_server_messenger->get_myaddr();
>> +  local_connection = hb_front_server_messenger->get_loopback_connection().get();
>>    if (hb_front_addr.is_blank_ip()) {
>>      int port = hb_front_addr.get_port();
>>      hb_front_addr = client_messenger->get_myaddr();
>>      hb_front_addr.set_port(port);
>>      hb_front_server_messenger->set_addr_unknowns(hb_front_addr);
>>      dout(10) << " assuming hb_front_addr ip matches client_addr" << dendl;
>> -  }
>> +  } else if (local_connection->get_priv() == NULL)
>> +    hb_front_server_messenger->ms_deliver_handle_fast_connect(local_connection);
>>
>>    MOSDBoot *mboot = new MOSDBoot(superblock, service.get_boot_epoch(),
>>                                   hb_back_addr, hb_front_addr, cluster_addr);
>> --
>> 1.9.1
>>
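The else-branch the patch adds to _send_boot() can be reduced to the following hypothetical C++ model of one messenger's step. ToyMessenger and patched_send_boot_step() are illustrative names; the member is named after ms_deliver_handle_fast_connect() from the patch but simply attaches a new session here, and the blank-IP branch is only a placeholder because set_addr_unknowns() is not modeled. What it shows is the get_priv() == NULL guard: a session is handed out only when the loopback connection has none.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-ins; not Ceph's Messenger or Connection.
struct ToySession {};
struct ToyConnection { std::shared_ptr<ToySession> priv; };

struct ToyMessenger {
  bool blank_ip = false;
  ToyConnection loopback;
  int fast_connects = 0;  // counts sessions handed out by the model

  // Named after the call in the patch; here it just attaches a new session.
  void ms_deliver_handle_fast_connect(ToyConnection &con) {
    ++fast_connects;
    con.priv = std::make_shared<ToySession>();
  }
};

// One messenger's share of the patched _send_boot().
void patched_send_boot_step(ToyMessenger &m) {
  if (m.blank_ip) {
    m.blank_ip = false;  // placeholder for set_addr_unknowns(); not modeled
  } else if (!m.loopback.priv) {  // mirrors local_connection->get_priv() == NULL
    m.ms_deliver_handle_fast_connect(m.loopback);
  }
}
```

Calling the step twice on a configured messenger hands out only one session, and a loopback connection that already has a session is left untouched.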