Re: Luminous Bluestore OSDs crashing with ASSERT

Yes, except that this happens on 8 different clusters with different hardware but the same Ceph version and the same kernel version.

Greets,
Stefan

> Am 19.01.2020 um 11:53 schrieb Igor Fedotov <ifedotov@xxxxxxx>:
> 
> So the intermediate summary is:
> 
> Any OSD in the cluster can experience an interim RocksDB checksum failure, which is no longer present after an OSD restart.
> 
> No HW issues observed, no persistent artifacts (except the OSD log) afterwards.
> 
> And it looks like the issue is rather specific to this environment, as no similar reports from other users seem to be present.
> 
> 
> Sorry, I'm out of ideas other than to collect all the failure logs and try to find something common in them. Maybe this will shed some light...
> 
> BTW, from my experience it might make sense to inspect the OSD log prior to the failure (any error messages and/or prior restarts, etc.); sometimes this provides some hints.
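> 
> E.g. something along these lines to pull the context preceding the
> assert (just a sketch - adjust the OSD id and log path to your setup):
> 
> # grep -B 200 'FAILED assert' /var/log/ceph/ceph-osd.27.log | less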
> 
> 
> Thanks,
> 
> Igor
> 
> 
>> On 1/17/2020 2:30 PM, Stefan Priebe - Profihost AG wrote:
>> Hi Igor,
>> 
>>> Am 17.01.20 um 12:10 schrieb Igor Fedotov:
>>> hmmm..
>>> 
>>> Just in case - suggest to check H/W errors with dmesg.
>> this happens on around 80 nodes - I don't expect all of those to have
>> unidentified hardware errors. Also, all of them are monitored - no dmesg
>> output contains any errors.
>> 
>>> Also there is some (though not very high) chance that this is another
>>> incarnation of the following bug:
>>> https://tracker.ceph.com/issues/22464
>>> https://github.com/ceph/ceph/pull/24649
>>> 
>>> The corresponding PR works around it for main device reads (user data
>>> only!), but theoretically it might still happen for the DB device or
>>> for DB data on the main device.
>>> 
>>> Can you observe any BlueFS spillovers? Is there any correlation between
>>> failing OSDs and spillover presence, e.g. failing OSDs always have a
>>> spillover while OSDs without spillovers never face the issue?
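>>> 
>>> On Luminous a rough way to check for spillover is the bluefs perf
>>> counters (a sketch, assuming the default admin socket path - non-zero
>>> "slow_used_bytes" means DB data has spilled over to the slow device):
>>> 
>>> # ceph daemon osd.27 perf dump | python -c 'import json,sys; print(json.load(sys.stdin)["bluefs"]["slow_used_bytes"])'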
>>> 
>>> To validate this hypothesis one can monitor/check (e.g. once a day for
>>> a week or so) the "bluestore_reads_with_retries" counter over all OSDs
>>> to learn whether the issue is happening in the system. Non-zero values
>>> mean it is there for user data on the main device, and hence is likely
>>> to happen for DB data as well (which doesn't have any workaround yet).
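>>> 
>>> E.g. a quick loop over the admin sockets on each node (untested sketch,
>>> assuming default socket paths):
>>> 
>>> # for s in /var/run/ceph/ceph-osd.*.asok; do \
>>>     echo -n "$s: "; \
>>>     ceph daemon $s perf dump | grep -o '"bluestore_reads_with_retries": [0-9]*'; \
>>>   done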
>> OK, I checked bluestore_reads_with_retries on 360 OSDs, but all of them say 0.
>> 
>> 
>>> Additionally you might want to monitor memory usage as the above
>>> mentioned PR denotes high memory pressure as potential trigger for these
>>> read errors. So if such pressure happens the hypothesis becomes more valid.
>> we already monitor this heavily and have around 10GB of memory per OSD.
>> Also, none of those machines show any I/O pressure at all.
>> 
>> All hosts constantly show around 38GB to 45GB of available memory in
>> /proc/meminfo.
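>> 
>> (i.e. we track MemAvailable on each host, roughly:
>> # grep MemAvailable /proc/meminfo )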
>> 
>> Stefan
>> 
>>> Thanks,
>>> 
>>> Igor
>>> 
>>> PS. Everything above is rather speculation for now. The available
>>> information is definitely not enough for extensive troubleshooting of
>>> cases which happen that rarely.
>>> 
>>> You might want to start collecting failure-related information
>>> (including but not limited to failure logs, perf counter dumps, system
>>> resource reports etc) for future analysis.
>>> 
>>> 
>>> 
>>> On 1/16/2020 11:58 PM, Stefan Priebe - Profihost AG wrote:
>>>> Hi Igor,
>>>> 
>>>> answers inline.
>>>> 
>>>> Am 16.01.20 um 21:34 schrieb Igor Fedotov:
>>>>> you may want to run fsck against failing OSDs. Hopefully it will shed
>>>>> some light.
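>>>>> 
>>>>> E.g. (a sketch, with the OSD stopped; a deep fsck additionally
>>>>> re-reads all object data and verifies checksums - flag syntax from
>>>>> memory, please double-check):
>>>>> 
>>>>> # ceph-bluestore-tool --command fsck --path /var/lib/ceph/osd/ceph-NN
>>>>> # ceph-bluestore-tool --command fsck --deep 1 --path /var/lib/ceph/osd/ceph-NN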
>>>> fsck just says everything is fine:
>>>> 
>>>> # ceph-bluestore-tool --command fsck --path /var/lib/ceph/osd/ceph-27/
>>>> fsck success
>>>> 
>>>> 
>>>>> Also wondering if the OSD is able to recover (start up and proceed
>>>>> working) after facing the issue?
>>>> no recovery needed. It just runs fine after restarting.
>>>> 
>>>>> If so, do you have any that failed multiple times? Do you have logs
>>>>> for those occurrences?
>>>> Maybe, but there are most probably weeks or months between those
>>>> failures - most probably the logs have already been deleted.
>>>> 
>>>>> Also please note that the patch you mentioned doesn't fix pre-existing
>>>>> issues (i.e. duplicate allocations); it only prevents new ones.
>>>>> 
>>>>> But fsck should show them if any...
>>>> None showed.
>>>> 
>>>> Stefan
>>>> 
>>>>> Thanks,
>>>>> 
>>>>> Igor
>>>>> 
>>>>> 
>>>>> 
>>>>> On 1/16/2020 10:04 PM, Stefan Priebe - Profihost AG wrote:
>>>>>> Hi Igor,
>>>>>> 
>>>>>> ouch sorry. Here we go:
>>>>>> 
>>>>>>      -1> 2020-01-16 01:10:13.404090 7f3350a14700 -1 rocksdb:
>>>>>> submit_transaction error: Corruption: block checksum mismatch code = 2
>>>>>> Rocksdb transaction:
>>>>>> Put( Prefix = M key = 0x0000000000000402'.OBJ_0000000000000002.953BFD0A.bb85c.rbd%udata%e3e8eac6b8b4567%e0000000000001f2e..' Value size = 97)
>>>>>> Put( Prefix = M key = 0x0000000000000402'.MAP_00000000000BB85C_0000000000000002.953BFD0A.bb85c.rbd%udata%e3e8eac6b8b4567%e0000000000001f2e..' Value size = 93)
>>>>>> Put( Prefix = M key = 0x0000000000000916'.0000823257.00000000000073922044' Value size = 196)
>>>>>> Put( Prefix = M key = 0x0000000000000916'.0000823257.00000000000073922045' Value size = 184)
>>>>>> Put( Prefix = M key = 0x0000000000000916'._info' Value size = 899)
>>>>>> Put( Prefix = O key = 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f00000000'x' Value size = 418)
>>>>>> Put( Prefix = O key = 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f00030000'x' Value size = 474)
>>>>>> Put( Prefix = O key = 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f0007c000'x' Value size = 392)
>>>>>> Put( Prefix = O key = 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f00090000'x' Value size = 317)
>>>>>> Put( Prefix = O key = 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f000a0000'x' Value size = 521)
>>>>>> Put( Prefix = O key = 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f000f4000'x' Value size = 558)
>>>>>> Put( Prefix = O key = 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f00130000'x' Value size = 649)
>>>>>> Put( Prefix = O key = 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f00194000'x' Value size = 449)
>>>>>> Put( Prefix = O key = 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f001cc000'x' Value size = 580)
>>>>>> Put( Prefix = O key = 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f00200000'x' Value size = 435)
>>>>>> Put( Prefix = O key = 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f00240000'x' Value size = 569)
>>>>>> Put( Prefix = O key = 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f00290000'x' Value size = 465)
>>>>>> Put( Prefix = O key = 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f002e0000'x' Value size = 710)
>>>>>> Put( Prefix = O key = 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f00300000'x' Value size = 599)
>>>>>> Put( Prefix = O key = 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f0036c000'x' Value size = 372)
>>>>>> Put( Prefix = O key = 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f003a6000'x' Value size = 130)
>>>>>> Put( Prefix = O key = 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f003b4000'x' Value size = 540)
>>>>>> Put( Prefix = O key = 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f003fc000'x' Value size = 47)
>>>>>> Put( Prefix = O key = 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff'o' Value size = 1731)
>>>>>> Put( Prefix = O key = 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0xfffffffffffffffeffffffffffffffff6f00040000'x' Value size = 675)
>>>>>> Put( Prefix = O key = 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0xfffffffffffffffeffffffffffffffff6f00080000'x' Value size = 395)
>>>>>> Put( Prefix = O key = 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0xfffffffffffffffeffffffffffffffff'o' Value size = 1328)
>>>>>> Put( Prefix = X key = 0x0000000018a38deb Value size = 14)
>>>>>> Put( Prefix = X key = 0x0000000018a38dea Value size = 14)
>>>>>> Put( Prefix = X key = 0x000000000d7a035b Value size = 14)
>>>>>> Put( Prefix = X key = 0x000000000d7a035c Value size = 14)
>>>>>> Put( Prefix = X key = 0x000000000d7a0355 Value size = 14)
>>>>>> Put( Prefix = X key = 0x000000000d7a0356 Value size = 17)
>>>>>> Put( Prefix = X key = 0x000000001a54f6e4 Value size = 14)
>>>>>> Put( Prefix = X key = 0x000000001b1c061e Value size = 14)
>>>>>> Put( Prefix = X key = 0x000000000d7a038f Value size = 14)
>>>>>> Put( Prefix = X key = 0x000000000d7a0389 Value size = 14)
>>>>>> Put( Prefix = X key = 0x000000000d7a0358 Value size = 14)
>>>>>> Put( Prefix = X key = 0x000000000d7a035f Value size = 14)
>>>>>> Put( Prefix = X key = 0x000000000d7a0357 Value size = 14)
>>>>>> Put( Prefix = X key = 0x000000000d7a0387 Value size = 14)
>>>>>> Put( Prefix = X key = 0x000000000d7a038a Value size = 14)
>>>>>> Put( Prefix = X key = 0x000000000d7a0388 Value size = 14)
>>>>>> Put( Prefix = X key = 0x00000000134c3fbe Value size = 14)
>>>>>> Put( Prefix = X key = 0x00000000134c3fb5 Value size = 14)
>>>>>> Put( Prefix = X key = 0x000000000d7a036e Value size = 14)
>>>>>> Put( Prefix = X key = 0x000000000d7a036d Value size = 14)
>>>>>> Put( Prefix = X key = 0x00000000134c3fb8 Value size = 14)
>>>>>> Put( Prefix = X key = 0x000000000d7a0371 Value size = 14)
>>>>>> Put( Prefix = X key = 0x000000000d7a036a Value size = 14)
>>>>>>        0> 2020-01-16 01:10:13.413759 7f3350a14700 -1
>>>>>> /build/ceph/src/os/bluestore/BlueStore.cc: In function 'void
>>>>>> BlueStore::_kv_sync_thread()' thread 7f3350a14700 time 2020-01-16
>>>>>> 01:10:13.404113
>>>>>> /build/ceph/src/os/bluestore/BlueStore.cc: 8808: FAILED assert(r == 0)
>>>>>> 
>>>>>>    ceph version 12.2.12-11-gd3eae83543
>>>>>> (d3eae83543bffc0fc6c43823feb637fa851b6213) luminous (stable)
>>>>>>    1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>>>> const*)+0x102) [0x55c9a712d232]
>>>>>>    2: (BlueStore::_kv_sync_thread()+0x24c5) [0x55c9a6fb54b5]
>>>>>>    3: (BlueStore::KVSyncThread::entry()+0xd) [0x55c9a6ff608d]
>>>>>>    4: (()+0x7494) [0x7f33615f9494]
>>>>>>    5: (clone()+0x3f) [0x7f3360680acf]
>>>>>> 
>>>>>> I already cherry-picked these:
>>>>>> https://github.com/ceph/ceph/pull/28644
>>>>>> 
>>>>>> Greets,
>>>>>> Stefan
>>>>>> Am 16.01.20 um 17:00 schrieb Igor Fedotov:
>>>>>>> Hi Stefan,
>>>>>>> 
>>>>>>> would you please share a log snippet prior to the assertion? Looks like
>>>>>>> RocksDB is failing during transaction submission...
>>>>>>> 
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> 
>>>>>>> Igor
>>>>>>> 
>>>>>>> On 1/16/2020 11:56 AM, Stefan Priebe - Profihost AG wrote:
>>>>>>>> Hello,
>>>>>>>> 
>>>>>>>> does anybody know a fix for this ASSERT / crash?
>>>>>>>> 
>>>>>>>> 2020-01-16 02:02:31.316394 7f8c3f5ab700 -1
>>>>>>>> /build/ceph/src/os/bluestore/BlueStore.cc: In function 'void
>>>>>>>> BlueStore::_kv_sync_thread()' thread 7f8c3f5ab700 time 2020-01-16
>>>>>>>> 02:02:31.304993
>>>>>>>> /build/ceph/src/os/bluestore/BlueStore.cc: 8808: FAILED assert(r
>>>>>>>> == 0)
>>>>>>>> 
>>>>>>>>     ceph version 12.2.12-11-gd3eae83543
>>>>>>>> (d3eae83543bffc0fc6c43823feb637fa851b6213) luminous (stable)
>>>>>>>>     1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>>>>>> const*)+0x102) [0x55e6df9d9232]
>>>>>>>>     2: (BlueStore::_kv_sync_thread()+0x24c5) [0x55e6df8614b5]
>>>>>>>>     3: (BlueStore::KVSyncThread::entry()+0xd) [0x55e6df8a208d]
>>>>>>>>     4: (()+0x7494) [0x7f8c50190494]
>>>>>>>>     5: (clone()+0x3f) [0x7f8c4f217acf]
>>>>>>>> 
>>>>>>>> All BlueStore OSDs are crashing randomly every now and then (roughly once a week).
>>>>>>>> 
>>>>>>>> Greets,
>>>>>>>> Stefan

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



