Re: Assertion error in librados

It looks to me like we try to send a message from handle_osd_map while
still holding the same lock that the send path then tries to grab again.
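
To illustrate the failure mode, here is a minimal sketch, not Ceph code.
It assumes the mutex is a POSIX error-checking mutex
(PTHREAD_MUTEX_ERRORCHECK); whether Ceph's Mutex is set up that way in
this build is an assumption. The helper names relock_same_thread and
lock_or_die are hypothetical. With an error-checking mutex, a second
lock attempt from the thread that already holds it returns EDEADLK
instead of 0, which is the kind of result an assert(r == 0) would trip on.

```cpp
#include <pthread.h>
#include <cassert>
#include <cerrno>

// Hypothetical wrapper mirroring the "lock, then assert success" pattern
// seen in the backtrace. Not Ceph's actual implementation.
void lock_or_die(pthread_mutex_t* m) {
  int r = pthread_mutex_lock(m);
  assert(r == 0);  // aborts if the lock call reports an error
  (void)r;         // avoid an unused-variable warning under NDEBUG
}

// Hypothetical helper: lock an error-checking mutex twice from the same
// thread and return the result of the second attempt.
int relock_same_thread() {
  pthread_mutexattr_t attr;
  pthread_mutexattr_init(&attr);
  // Assumption: error-checking type, so a relock is reported, not hung.
  pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);

  pthread_mutex_t m;
  pthread_mutex_init(&m, &attr);
  pthread_mutexattr_destroy(&attr);

  lock_or_die(&m);                      // first lock succeeds
  int second = pthread_mutex_lock(&m);  // same thread, lock already held

  pthread_mutex_unlock(&m);             // release the one lock we hold
  pthread_mutex_destroy(&m);
  return second;
}
```

Calling relock_same_thread() and comparing the result with EDEADLK shows
the non-zero return; passing that second attempt through lock_or_die
instead would abort on the assert, much like the trace below.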

Yehuda

On Tue, Feb 25, 2014 at 7:28 AM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
> Do you have logs? The assert indicates that the messenger got back
> something other than "okay" when trying to grab a local Mutex, which
> shouldn't be able to happen. It may be that some error-handling path
> didn't drop it (within the same thread that later tried to grab it
> again), but we'll need more details to track it down.
> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com
>
>
> On Tue, Feb 25, 2014 at 6:49 AM, Filippos Giannakos <philipgian@xxxxxxxx> wrote:
>> Hello all,
>>
>> We recently bumped into the following assertion error in librados on our
>> production service:
>>
>>
>> common/Mutex.cc: In function 'void Mutex::Lock(bool)' thread 7fa2c2ccf700 time 2014-02-21 07:23:26.340791
>> common/Mutex.cc: 93: FAILED assert(r == 0)
>>  ceph version 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60)
>>  1: (Mutex::Lock(bool)+0x131) [0x7fa2c7707431]
>>  2: (SimpleMessenger::submit_message(Message*, Connection*, entity_addr_t const&, int, bool)+0x52) [0x7fa2c7863172]
>>  3: (SimpleMessenger::_send_message(Message*, Connection*, bool)+0x23e) [0x7fa2c7863bfe]
>>  4: (Objecter::send_op(Objecter::Op*)+0x32c) [0x7fa2c76b317c]
>>  5: (Objecter::handle_osd_map(MOSDMap*)+0x365) [0x7fa2c76b7805]
>>  6: (librados::RadosClient::_dispatch(Message*)+0x7c) [0x7fa2c768c70c]
>>  7: (librados::RadosClient::ms_dispatch(Message*)+0x9b) [0x7fa2c768c82b]
>>  8: (DispatchQueue::entry()+0x4eb) [0x7fa2c7800d2b]
>>  9: (DispatchQueue::DispatchThread::entry()+0xd) [0x7fa2c78666ad]
>>  10: (()+0x6b50) [0x7fa2c7203b50]
>>  11: (clone()+0x6d) [0x7fa2c6b570ed]
>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>> terminate called after throwing an instance of 'ceph::FailedAssertion'
>>
>>
>> From what I can tell, there were some network problems on our RADOS cluster,
>> after which many of our librados clients failed with the above assertion error.
>>
>> Do you have any idea what might have gone wrong?
>>
>> Kind Regards,
>> --
>> Filippos
>> <philipgian@xxxxxxxx>
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html