Hello Igor,

I can now confirm that this is indeed a kernel bug. The issue no longer happens on upgraded nodes. Do you know more about it? I would really like to know in which kernel version it was fixed, so that we can avoid rebooting all Ceph nodes.
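In case it is useful: a quick way to see which nodes are still on an unpatched kernel is to compare the running kernel across the cluster. A minimal sketch (the hostnames below are placeholders, not taken from this thread):

  # List the running kernel on every Ceph node; replace the hostnames.
  for host in ceph01 ceph02 ceph03; do
      echo -n "$host: "
      ssh "$host" uname -r
  done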
Greets,
Stefan

On 27.08.19 16:20, Igor Fedotov wrote:
> It sounds like the OSD is "recovering" after the checksum error.
>
> I.e. the just-failed OSD shows no errors in fsck and is able to restart and
> process new write requests for a long enough period (longer than just a
> couple of minutes). Are these statements true? If so, I suppose this is an
> accidental/volatile issue rather than data-at-rest corruption -
> something like data being read incorrectly from disk.
>
> Are you using a standalone disk drive for DB/WAL, or is it shared with the
> main one? Just in case, as a low-hanging fruit, I'd suggest checking
> dmesg and smartctl for drive errors...
>
> FYI: one more reference for a similar issue:
> https://tracker.ceph.com/issues/24968
>
> A HW issue that time...
>
>
> Also I recall an issue with some kernels that caused occasional invalid
> data reads under high memory pressure/swapping:
> https://tracker.ceph.com/issues/22464
>
> IMO memory usage is worth checking as well...
>
>
> Igor
>
>
> On 8/27/2019 4:52 PM, Stefan Priebe - Profihost AG wrote:
>> see inline
>>
>> On 27.08.19 15:43, Igor Fedotov wrote:
>>> see inline
>>>
>>> On 8/27/2019 4:41 PM, Stefan Priebe - Profihost AG wrote:
>>>> Hi Igor,
>>>>
>>>> On 27.08.19 14:11, Igor Fedotov wrote:
>>>>> Hi Stefan,
>>>>>
>>>>> this looks like a duplicate of
>>>>>
>>>>> https://tracker.ceph.com/issues/37282
>>>>>
>>>>> Actually the range of possible root causes might be quite wide -
>>>>>
>>>>> from HW issues to broken logic in RocksDB/BlueStore/BlueFS etc.
>>>>>
>>>>> As far as I understand you have different OSDs which are failing,
>>>>> right?
>>>> Yes, I've seen this on around 50 different OSDs running on different HW,
>>>> but all run Ceph 12.2.12. I have not seen this with 12.2.10, which we
>>>> were running before.
>>>>
>>>>> Is the set of these broken OSDs limited somehow?
>>>> No, at least I'm not able to find any such limit.
>>>>
>>>>
>>>>> Any specific subset which is failing or something? E.g. just N of them
>>>>> are failing from time to time.
>>>> No, it seems totally random.
>>>>> Any similarities for broken OSDs (e.g. specific hardware)?
>>>> All run Intel Xeon CPUs and all run Linux ;-)
>>>>> Did you run fsck for any of the broken OSDs? Any reports?
>>>> Yes, but no reports.
>>> Are you saying that fsck is fine for OSDs that showed this sort of
>>> error?
>> Yes, fsck does not show a single error - everything is fine.
>>
>>>>> Any other errors/crashes in the logs before this sort of issue happens?
>>>> No
>>>>
>>>>
>>>>> Just in case - what allocator are you using?
>>>> tcmalloc
>>> I meant the BlueStore allocator - is it stupid or bitmap?
>> Ah, the default one - I think that is stupid.
>>
>> Greets,
>> Stefan
>>
>>>> Greets,
>>>> Stefan
>>>>
>>>>> Thanks,
>>>>>
>>>>> Igor
>>>>>
>>>>>
>>>>>
>>>>> On 8/27/2019 1:03 PM, Stefan Priebe - Profihost AG wrote:
>>>>>> Hello,
>>>>>>
>>>>>> for some months now all our BlueStore OSDs have been crashing from
>>>>>> time to time. Currently about 5 OSDs per day.
>>>>>>
>>>>>> All of them show the following trace:
>>>>>>
>>>>>> 2019-07-24 08:36:48.995397 7fb19a711700 -1 rocksdb: submit_transaction
>>>>>> error: Corruption: block checksum mismatch code = 2 Rocksdb transaction:
>>>>>> Put( Prefix = M key = 0x00000000000009a5'.0000916366.00000000000074680351' Value size = 184)
>>>>>> Put( Prefix = M key = 0x00000000000009a5'._fastinfo' Value size = 186)
>>>>>> Put( Prefix = O key = 0x7f8000000000000003bb605f'd!rbd_data.afe49a6b8b4567.0000000000003c11!='0xfffffffffffffffeffffffffffffffff6f00240000'x' Value size = 530)
>>>>>> Put( Prefix = O key = 0x7f8000000000000003bb605f'd!rbd_data.afe49a6b8b4567.0000000000003c11!='0xfffffffffffffffeffffffffffffffff'o' Value size = 510)
>>>>>> Put( Prefix = L key = 0x0000000010ba60f1 Value size = 4135)
>>>>>> 2019-07-24 08:36:49.012110 7fb19a711700 -1
>>>>>> /build/ceph/src/os/bluestore/BlueStore.cc: In function 'void
>>>>>> BlueStore::_kv_sync_thread()' thread 7fb19a711700 time 2019-07-24 08:36:48.995415
>>>>>> /build/ceph/src/os/bluestore/BlueStore.cc: 8808: FAILED assert(r == 0)
>>>>>>
>>>>>> ceph version 12.2.12-7-g1321c5e91f
>>>>>> (1321c5e91f3d5d35dd5aa5a0029a54b9a8ab9498) luminous (stable)
>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x5653a010e222]
>>>>>> 2: (BlueStore::_kv_sync_thread()+0x24c5) [0x56539ff964b5]
>>>>>> 3: (BlueStore::KVSyncThread::entry()+0xd) [0x56539ffd708d]
>>>>>> 4: (()+0x7494) [0x7fb1ab2f6494]
>>>>>> 5: (clone()+0x3f) [0x7fb1aa37dacf]
>>>>>>
>>>>>> I already opened a tracker issue:
>>>>>> https://tracker.ceph.com/issues/41367
>>>>>>
>>>>>> Can anybody help? Is this known?
>>>>>>
>>>>>> Greets,
>>>>>> Stefan
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
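For reference, the checks suggested earlier in this thread (fsck of a BlueStore OSD, drive health via dmesg/smartctl, and the BlueStore allocator in use) can be run roughly as follows. This is only a sketch: osd.0 and /dev/sda are placeholders, and the fsck needs the OSD to be stopped first.

  # Offline consistency check of one BlueStore OSD (stop the OSD first).
  ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-0

  # Look for drive-level errors, as suggested above.
  dmesg | grep -iE 'i/o error|ata|nvme'
  smartctl -a /dev/sda

  # Show which BlueStore allocator the OSD is using (stupid vs. bitmap);
  # run this on the node hosting the OSD.
  ceph daemon osd.0 config get bluestore_allocator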