Re: Luminous Bluestore OSDs crashing with ASSERT

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Igor,

answers inline.

Am 16.01.20 um 21:34 schrieb Igor Fedotov:
> you may want to run fsck against failing OSDs. Hopefully it will shed
> some light.

fsck just says everything fine:

# ceph-bluestore-tool --command fsck --path /var/lib/ceph/osd/ceph-27/
fsck success


> Also wondering if OSD is able to recover (startup and proceed working)
> after facing the issue?

no recover needed. It just runs forever after restarting.

> If so do you have any one which failed multiple times? Do you have logs
> for these occurrences?

may be but there are most probably weeks or month between those failures
- most probably logs are already deleted.

> Also please note that patch you mentioned doesn't fix previous issues
> (i.e. duplicate allocations), it prevents from new ones only.
> 
> But fsck should show them if any...

None showed.

Stefan

> Thanks,
> 
> Igor
> 
> 
> 
> On 1/16/2020 10:04 PM, Stefan Priebe - Profihost AG wrote:
>> Hi Igor,
>>
>> ouch sorry. Here we go:
>>
>>     -1> 2020-01-16 01:10:13.404090 7f3350a14700 -1 rocksdb:
>> submit_transaction error: Corruption: block checksum mismatch code = 2
>> Rocksdb transaction:
>> Put( Prefix = M key =
>> 0x0000000000000402'.OBJ_0000000000000002.953BFD0A.bb85c.rbd%udata%e3e8eac6b8b4567%e0000000000001f2e..'
>>
>> Value size = 97)
>> Put( Prefix = M key =
>> 0x0000000000000402'.MAP_00000000000BB85C_0000000000000002.953BFD0A.bb85c.rbd%udata%e3e8eac6b8b4567%e0000000000001f2e..'
>>
>> Value size = 93)
>> Put( Prefix = M key =
>> 0x0000000000000916'.0000823257.00000000000073922044' Value size = 196)
>> Put( Prefix = M key =
>> 0x0000000000000916'.0000823257.00000000000073922045' Value size = 184)
>> Put( Prefix = M key = 0x0000000000000916'._info' Value size = 899)
>> Put( Prefix = O key =
>> 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f00000000'x'
>>
>> Value size = 418)
>> Put( Prefix = O key =
>> 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f00030000'x'
>>
>> Value size = 474)
>> Put( Prefix = O key =
>> 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f0007c000'x'
>>
>> Value size = 392)
>> Put( Prefix = O key =
>> 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f00090000'x'
>>
>> Value size = 317)
>> Put( Prefix = O key =
>> 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f000a0000'x'
>>
>> Value size = 521)
>> Put( Prefix = O key =
>> 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f000f4000'x'
>>
>> Value size = 558)
>> Put( Prefix = O key =
>> 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f00130000'x'
>>
>> Value size = 649)
>> Put( Prefix = O key =
>> 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f00194000'x'
>>
>> Value size = 449)
>> Put( Prefix = O key =
>> 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f001cc000'x'
>>
>> Value size = 580)
>> Put( Prefix = O key =
>> 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f00200000'x'
>>
>> Value size = 435)
>> Put( Prefix = O key =
>> 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f00240000'x'
>>
>> Value size = 569)
>> Put( Prefix = O key =
>> 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f00290000'x'
>>
>> Value size = 465)
>> Put( Prefix = O key =
>> 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f002e0000'x'
>>
>> Value size = 710)
>> Put( Prefix = O key =
>> 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f00300000'x'
>>
>> Value size = 599)
>> Put( Prefix = O key =
>> 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f0036c000'x'
>>
>> Value size = 372)
>> Put( Prefix = O key =
>> 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f003a6000'x'
>>
>> Value size = 130)
>> Put( Prefix = O key =
>> 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f003b4000'x'
>>
>> Value size = 540)
>> Put( Prefix = O key =
>> 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f003fc000'x'
>>
>> Value size = 47)
>> Put( Prefix = O key =
>> 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff'o'
>>
>> Value size = 1731)
>> Put( Prefix = O key =
>> 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0xfffffffffffffffeffffffffffffffff6f00040000'x'
>>
>> Value size = 675)
>> Put( Prefix = O key =
>> 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0xfffffffffffffffeffffffffffffffff6f00080000'x'
>>
>> Value size = 395)
>> Put( Prefix = O key =
>> 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0xfffffffffffffffeffffffffffffffff'o'
>>
>> Value size = 1328)
>> Put( Prefix = X key = 0x0000000018a38deb Value size = 14)
>> Put( Prefix = X key = 0x0000000018a38dea Value size = 14)
>> Put( Prefix = X key = 0x000000000d7a035b Value size = 14)
>> Put( Prefix = X key = 0x000000000d7a035c Value size = 14)
>> Put( Prefix = X key = 0x000000000d7a0355 Value size = 14)
>> Put( Prefix = X key = 0x000000000d7a0356 Value size = 17)
>> Put( Prefix = X key = 0x000000001a54f6e4 Value size = 14)
>> Put( Prefix = X key = 0x000000001b1c061e Value size = 14)
>> Put( Prefix = X key = 0x000000000d7a038f Value size = 14)
>> Put( Prefix = X key = 0x000000000d7a0389 Value size = 14)
>> Put( Prefix = X key = 0x000000000d7a0358 Value size = 14)
>> Put( Prefix = X key = 0x000000000d7a035f Value size = 14)
>> Put( Prefix = X key = 0x000000000d7a0357 Value size = 14)
>> Put( Prefix = X key = 0x000000000d7a0387 Value size = 14)
>> Put( Prefix = X key = 0x000000000d7a038a Value size = 14)
>> Put( Prefix = X key = 0x000000000d7a0388 Value size = 14)
>> Put( Prefix = X key = 0x00000000134c3fbe Value size = 14)
>> Put( Prefix = X key = 0x00000000134c3fb5 Value size = 14)
>> Put( Prefix = X key = 0x000000000d7a036e Value size = 14)
>> Put( Prefix = X key = 0x000000000d7a036d Value size = 14)
>> Put( Prefix = X key = 0x00000000134c3fb8 Value size = 14)
>> Put( Prefix = X key = 0x000000000d7a0371 Value size = 14)
>> Put( Prefix = X key = 0x000000000d7a036a Value size = 14)
>>       0> 2020-01-16 01:10:13.413759 7f3350a14700 -1
>> /build/ceph/src/os/bluestore/BlueStore.cc: In function 'void
>> BlueStore::_kv_sync_thread()' thread 7f3350a14700 time 2020-01-16
>> 01:10:13.404113
>> /build/ceph/src/os/bluestore/BlueStore.cc: 8808: FAILED assert(r == 0)
>>
>>   ceph version 12.2.12-11-gd3eae83543
>> (d3eae83543bffc0fc6c43823feb637fa851b6213) luminous (stable)
>>   1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>> const*)+0x102) [0x55c9a712d232]
>>   2: (BlueStore::_kv_sync_thread()+0x24c5) [0x55c9a6fb54b5]
>>   3: (BlueStore::KVSyncThread::entry()+0xd) [0x55c9a6ff608d]
>>   4: (()+0x7494) [0x7f33615f9494]
>>   5: (clone()+0x3f) [0x7f3360680acf]
>>
>> I already picked those:
>> https://github.com/ceph/ceph/pull/28644
>>
>> Greets,
>> Stefan
>> Am 16.01.20 um 17:00 schrieb Igor Fedotov:
>>> Hi Stefan,
>>>
>>> would you please share log snippet prior the assertions? Looks like
>>> RocksDB is failing during transaction submission...
>>>
>>>
>>> Thanks,
>>>
>>> Igor
>>>
>>> On 1/16/2020 11:56 AM, Stefan Priebe - Profihost AG wrote:
>>>> Hello,
>>>>
>>>> does anybody know a fix for this ASSERT / crash?
>>>>
>>>> 2020-01-16 02:02:31.316394 7f8c3f5ab700 -1
>>>> /build/ceph/src/os/bluestore/BlueStore.cc: In function 'void
>>>> BlueStore::_kv_sync_thread()' thread 7f8c3f5ab700 time 2020-01-16
>>>> 02:02:31.304993
>>>> /build/ceph/src/os/bluestore/BlueStore.cc: 8808: FAILED assert(r == 0)
>>>>
>>>>    ceph version 12.2.12-11-gd3eae83543
>>>> (d3eae83543bffc0fc6c43823feb637fa851b6213) luminous (stable)
>>>>    1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>> const*)+0x102) [0x55e6df9d9232]
>>>>    2: (BlueStore::_kv_sync_thread()+0x24c5) [0x55e6df8614b5]
>>>>    3: (BlueStore::KVSyncThread::entry()+0xd) [0x55e6df8a208d]
>>>>    4: (()+0x7494) [0x7f8c50190494]
>>>>    5: (clone()+0x3f) [0x7f8c4f217acf]
>>>>
>>>> all bluestore OSDs are randomly crashing sometimes (once a week).
>>>>
>>>> Greets,
>>>> Stefan
>>>> _______________________________________________
>>>> ceph-users mailing list
>>>> ceph-users@xxxxxxxxxxxxxx
>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




[Index of Archives]     [Information on CEPH]     [Linux Filesystem Development]     [Ceph Development]     [Ceph Large]     [Ceph Dev]     [Linux USB Development]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [xfs]


  Powered by Linux