Re: Luminous Bluestore OSDs crashing with ASSERT

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hello Igor,

there's absolutely nothing in the logs before.

What do those lines mean:
Put( Prefix = O key =
0x7f8000000000000001cc45c881217262'd_data.4303206b8b4567.0000000000009632!='0xfffffffffffffffeffffffffffffffff6f00120000'x'
Value size = 480)
Put( Prefix = O key =
0x7f8000000000000001cc45c881217262'd_data.4303206b8b4567.0000000000009632!='0xfffffffffffffffeffffffffffffffff'o'
Value size = 510)

on the right size i always see 0xfffffffffffffffeffffffffffffffff on all
failed OSDs.

greets,
Stefan
Am 19.01.20 um 14:07 schrieb Stefan Priebe - Profihost AG:
> Yes, except that this happens on 8 different clusters with different hw but same ceph version and same kernel version.
> 
> Greets,
> Stefan
> 
>> Am 19.01.2020 um 11:53 schrieb Igor Fedotov <ifedotov@xxxxxxx>:
>>
>> So the intermediate summary is:
>>
>> Any OSD in the cluster can experience interim RocksDB checksum failure. Which isn't present after OSD restart.
>>
>> No HW issues observed, no persistent artifacts (except OSD log) afterwards.
>>
>> And looks like the issue is rather specific to the cluster as no similar reports from other users seem to be present.
>>
>>
>> Sorry, I'm out of ideas other then collect all the failure logs and try to find something common in them. May be this will shed some light..
>>
>> BTW from my experience it might make sense to inspect OSD log prior to failure (any error messages and/or prior restarts, etc) sometimes this might provide some hints.
>>
>>
>> Thanks,
>>
>> Igor
>>
>>
>>> On 1/17/2020 2:30 PM, Stefan Priebe - Profihost AG wrote:
>>> HI Igor,
>>>
>>>> Am 17.01.20 um 12:10 schrieb Igor Fedotov:
>>>> hmmm..
>>>>
>>>> Just in case - suggest to check H/W errors with dmesg.
>>> this happens on around 80 nodes - i don't expect all of those have not
>>> identified hw errors. Also all of them are monitored - no dmesg outpout
>>> contains any errors.
>>>
>>>> Also there are some (not very much though) chances this is another
>>>> incarnation of the following bug:
>>>> https://tracker.ceph.com/issues/22464
>>>> https://github.com/ceph/ceph/pull/24649
>>>>
>>>> The corresponding PR works around it for main device reads (user data
>>>> only!) but theoretically it might still happen
>>>>
>>>> either for DB device or DB data at main device.
>>>>
>>>> Can you observe any bluefs spillovers? Are there any correlation between
>>>> failing OSDs and spillover presence if any, e.g. failing OSDs always
>>>> have a spillover. While OSDs without spillovers never face the issue...
>>>>
>>>> To validate this hypothesis one can try to monitor/check (e.g. once a
>>>> day for a week or something) "bluestore_reads_with_retries" counter over
>>>> OSDs to learn if the issue is happening
>>>>
>>>> in the system.  Non-zero values mean it's there for user data/main
>>>> device and hence is likely to happen for DB ones as well (which doesn't
>>>> have any workaround yet).
>>> OK i checked bluestore_reads_with_retries on 360 osds but all of them say 0.
>>>
>>>
>>>> Additionally you might want to monitor memory usage as the above
>>>> mentioned PR denotes high memory pressure as potential trigger for these
>>>> read errors. So if such pressure happens the hypothesis becomes more valid.
>>> we already do this heavily and have around 10GB of memory per OSD. Also
>>> no of those machines show any io pressure at all.
>>>
>>> All hosts show a constant rate of around 38GB to 45GB mem available in
>>> /proc/meminfo.
>>>
>>> Stefan
>>>
>>>> Thanks,
>>>>
>>>> Igor
>>>>
>>>> PS. Everything above is rather a speculation for now. Available
>>>> information is definitely not enough for extensive troubleshooting the
>>>> cases which happens that rarely.
>>>>
>>>> You might want to start collecting failure-related information
>>>> (including but not limited to failure logs, perf counter dumps, system
>>>> resource reports etc) for future analysis.
>>>>
>>>>
>>>>
>>>> On 1/16/2020 11:58 PM, Stefan Priebe - Profihost AG wrote:
>>>>> Hi Igor,
>>>>>
>>>>> answers inline.
>>>>>
>>>>> Am 16.01.20 um 21:34 schrieb Igor Fedotov:
>>>>>> you may want to run fsck against failing OSDs. Hopefully it will shed
>>>>>> some light.
>>>>> fsck just says everything fine:
>>>>>
>>>>> # ceph-bluestore-tool --command fsck --path /var/lib/ceph/osd/ceph-27/
>>>>> fsck success
>>>>>
>>>>>
>>>>>> Also wondering if OSD is able to recover (startup and proceed working)
>>>>>> after facing the issue?
>>>>> no recover needed. It just runs forever after restarting.
>>>>>
>>>>>> If so do you have any one which failed multiple times? Do you have logs
>>>>>> for these occurrences?
>>>>> may be but there are most probably weeks or month between those failures
>>>>> - most probably logs are already deleted.
>>>>>
>>>>>> Also please note that patch you mentioned doesn't fix previous issues
>>>>>> (i.e. duplicate allocations), it prevents from new ones only.
>>>>>>
>>>>>> But fsck should show them if any...
>>>>> None showed.
>>>>>
>>>>> Stefan
>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Igor
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 1/16/2020 10:04 PM, Stefan Priebe - Profihost AG wrote:
>>>>>>> Hi Igor,
>>>>>>>
>>>>>>> ouch sorry. Here we go:
>>>>>>>
>>>>>>>      -1> 2020-01-16 01:10:13.404090 7f3350a14700 -1 rocksdb:
>>>>>>> submit_transaction error: Corruption: block checksum mismatch code = 2
>>>>>>> Rocksdb transaction:
>>>>>>> Put( Prefix = M key =
>>>>>>> 0x0000000000000402'.OBJ_0000000000000002.953BFD0A.bb85c.rbd%udata%e3e8eac6b8b4567%e0000000000001f2e..'
>>>>>>>
>>>>>>>
>>>>>>> Value size = 97)
>>>>>>> Put( Prefix = M key =
>>>>>>> 0x0000000000000402'.MAP_00000000000BB85C_0000000000000002.953BFD0A.bb85c.rbd%udata%e3e8eac6b8b4567%e0000000000001f2e..'
>>>>>>>
>>>>>>>
>>>>>>> Value size = 93)
>>>>>>> Put( Prefix = M key =
>>>>>>> 0x0000000000000916'.0000823257.00000000000073922044' Value size = 196)
>>>>>>> Put( Prefix = M key =
>>>>>>> 0x0000000000000916'.0000823257.00000000000073922045' Value size = 184)
>>>>>>> Put( Prefix = M key = 0x0000000000000916'._info' Value size = 899)
>>>>>>> Put( Prefix = O key =
>>>>>>> 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f00000000'x'
>>>>>>>
>>>>>>>
>>>>>>> Value size = 418)
>>>>>>> Put( Prefix = O key =
>>>>>>> 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f00030000'x'
>>>>>>>
>>>>>>>
>>>>>>> Value size = 474)
>>>>>>> Put( Prefix = O key =
>>>>>>> 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f0007c000'x'
>>>>>>>
>>>>>>>
>>>>>>> Value size = 392)
>>>>>>> Put( Prefix = O key =
>>>>>>> 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f00090000'x'
>>>>>>>
>>>>>>>
>>>>>>> Value size = 317)
>>>>>>> Put( Prefix = O key =
>>>>>>> 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f000a0000'x'
>>>>>>>
>>>>>>>
>>>>>>> Value size = 521)
>>>>>>> Put( Prefix = O key =
>>>>>>> 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f000f4000'x'
>>>>>>>
>>>>>>>
>>>>>>> Value size = 558)
>>>>>>> Put( Prefix = O key =
>>>>>>> 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f00130000'x'
>>>>>>>
>>>>>>>
>>>>>>> Value size = 649)
>>>>>>> Put( Prefix = O key =
>>>>>>> 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f00194000'x'
>>>>>>>
>>>>>>>
>>>>>>> Value size = 449)
>>>>>>> Put( Prefix = O key =
>>>>>>> 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f001cc000'x'
>>>>>>>
>>>>>>>
>>>>>>> Value size = 580)
>>>>>>> Put( Prefix = O key =
>>>>>>> 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f00200000'x'
>>>>>>>
>>>>>>>
>>>>>>> Value size = 435)
>>>>>>> Put( Prefix = O key =
>>>>>>> 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f00240000'x'
>>>>>>>
>>>>>>>
>>>>>>> Value size = 569)
>>>>>>> Put( Prefix = O key =
>>>>>>> 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f00290000'x'
>>>>>>>
>>>>>>>
>>>>>>> Value size = 465)
>>>>>>> Put( Prefix = O key =
>>>>>>> 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f002e0000'x'
>>>>>>>
>>>>>>>
>>>>>>> Value size = 710)
>>>>>>> Put( Prefix = O key =
>>>>>>> 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f00300000'x'
>>>>>>>
>>>>>>>
>>>>>>> Value size = 599)
>>>>>>> Put( Prefix = O key =
>>>>>>> 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f0036c000'x'
>>>>>>>
>>>>>>>
>>>>>>> Value size = 372)
>>>>>>> Put( Prefix = O key =
>>>>>>> 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f003a6000'x'
>>>>>>>
>>>>>>>
>>>>>>> Value size = 130)
>>>>>>> Put( Prefix = O key =
>>>>>>> 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f003b4000'x'
>>>>>>>
>>>>>>>
>>>>>>> Value size = 540)
>>>>>>> Put( Prefix = O key =
>>>>>>> 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f003fc000'x'
>>>>>>>
>>>>>>>
>>>>>>> Value size = 47)
>>>>>>> Put( Prefix = O key =
>>>>>>> 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff'o'
>>>>>>>
>>>>>>>
>>>>>>> Value size = 1731)
>>>>>>> Put( Prefix = O key =
>>>>>>> 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0xfffffffffffffffeffffffffffffffff6f00040000'x'
>>>>>>>
>>>>>>>
>>>>>>> Value size = 675)
>>>>>>> Put( Prefix = O key =
>>>>>>> 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0xfffffffffffffffeffffffffffffffff6f00080000'x'
>>>>>>>
>>>>>>>
>>>>>>> Value size = 395)
>>>>>>> Put( Prefix = O key =
>>>>>>> 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0xfffffffffffffffeffffffffffffffff'o'
>>>>>>>
>>>>>>>
>>>>>>> Value size = 1328)
>>>>>>> Put( Prefix = X key = 0x0000000018a38deb Value size = 14)
>>>>>>> Put( Prefix = X key = 0x0000000018a38dea Value size = 14)
>>>>>>> Put( Prefix = X key = 0x000000000d7a035b Value size = 14)
>>>>>>> Put( Prefix = X key = 0x000000000d7a035c Value size = 14)
>>>>>>> Put( Prefix = X key = 0x000000000d7a0355 Value size = 14)
>>>>>>> Put( Prefix = X key = 0x000000000d7a0356 Value size = 17)
>>>>>>> Put( Prefix = X key = 0x000000001a54f6e4 Value size = 14)
>>>>>>> Put( Prefix = X key = 0x000000001b1c061e Value size = 14)
>>>>>>> Put( Prefix = X key = 0x000000000d7a038f Value size = 14)
>>>>>>> Put( Prefix = X key = 0x000000000d7a0389 Value size = 14)
>>>>>>> Put( Prefix = X key = 0x000000000d7a0358 Value size = 14)
>>>>>>> Put( Prefix = X key = 0x000000000d7a035f Value size = 14)
>>>>>>> Put( Prefix = X key = 0x000000000d7a0357 Value size = 14)
>>>>>>> Put( Prefix = X key = 0x000000000d7a0387 Value size = 14)
>>>>>>> Put( Prefix = X key = 0x000000000d7a038a Value size = 14)
>>>>>>> Put( Prefix = X key = 0x000000000d7a0388 Value size = 14)
>>>>>>> Put( Prefix = X key = 0x00000000134c3fbe Value size = 14)
>>>>>>> Put( Prefix = X key = 0x00000000134c3fb5 Value size = 14)
>>>>>>> Put( Prefix = X key = 0x000000000d7a036e Value size = 14)
>>>>>>> Put( Prefix = X key = 0x000000000d7a036d Value size = 14)
>>>>>>> Put( Prefix = X key = 0x00000000134c3fb8 Value size = 14)
>>>>>>> Put( Prefix = X key = 0x000000000d7a0371 Value size = 14)
>>>>>>> Put( Prefix = X key = 0x000000000d7a036a Value size = 14)
>>>>>>>        0> 2020-01-16 01:10:13.413759 7f3350a14700 -1
>>>>>>> /build/ceph/src/os/bluestore/BlueStore.cc: In function 'void
>>>>>>> BlueStore::_kv_sync_thread()' thread 7f3350a14700 time 2020-01-16
>>>>>>> 01:10:13.404113
>>>>>>> /build/ceph/src/os/bluestore/BlueStore.cc: 8808: FAILED assert(r == 0)
>>>>>>>
>>>>>>>    ceph version 12.2.12-11-gd3eae83543
>>>>>>> (d3eae83543bffc0fc6c43823feb637fa851b6213) luminous (stable)
>>>>>>>    1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>>>>> const*)+0x102) [0x55c9a712d232]
>>>>>>>    2: (BlueStore::_kv_sync_thread()+0x24c5) [0x55c9a6fb54b5]
>>>>>>>    3: (BlueStore::KVSyncThread::entry()+0xd) [0x55c9a6ff608d]
>>>>>>>    4: (()+0x7494) [0x7f33615f9494]
>>>>>>>    5: (clone()+0x3f) [0x7f3360680acf]
>>>>>>>
>>>>>>> I already picked those:
>>>>>>> https://github.com/ceph/ceph/pull/28644
>>>>>>>
>>>>>>> Greets,
>>>>>>> Stefan
>>>>>>> Am 16.01.20 um 17:00 schrieb Igor Fedotov:
>>>>>>>> Hi Stefan,
>>>>>>>>
>>>>>>>> would you please share log snippet prior the assertions? Looks like
>>>>>>>> RocksDB is failing during transaction submission...
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Igor
>>>>>>>>
>>>>>>>> On 1/16/2020 11:56 AM, Stefan Priebe - Profihost AG wrote:
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> does anybody know a fix for this ASSERT / crash?
>>>>>>>>>
>>>>>>>>> 2020-01-16 02:02:31.316394 7f8c3f5ab700 -1
>>>>>>>>> /build/ceph/src/os/bluestore/BlueStore.cc: In function 'void
>>>>>>>>> BlueStore::_kv_sync_thread()' thread 7f8c3f5ab700 time 2020-01-16
>>>>>>>>> 02:02:31.304993
>>>>>>>>> /build/ceph/src/os/bluestore/BlueStore.cc: 8808: FAILED assert(r
>>>>>>>>> == 0)
>>>>>>>>>
>>>>>>>>>     ceph version 12.2.12-11-gd3eae83543
>>>>>>>>> (d3eae83543bffc0fc6c43823feb637fa851b6213) luminous (stable)
>>>>>>>>>     1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>>>>>>> const*)+0x102) [0x55e6df9d9232]
>>>>>>>>>     2: (BlueStore::_kv_sync_thread()+0x24c5) [0x55e6df8614b5]
>>>>>>>>>     3: (BlueStore::KVSyncThread::entry()+0xd) [0x55e6df8a208d]
>>>>>>>>>     4: (()+0x7494) [0x7f8c50190494]
>>>>>>>>>     5: (clone()+0x3f) [0x7f8c4f217acf]
>>>>>>>>>
>>>>>>>>> all bluestore OSDs are randomly crashing sometimes (once a week).
>>>>>>>>>
>>>>>>>>> Greets,
>>>>>>>>> Stefan
>>>>>>>>> _______________________________________________
>>>>>>>>> ceph-users mailing list
>>>>>>>>> ceph-users@xxxxxxxxxxxxxx
>>>>>>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




[Index of Archives]     [Information on CEPH]     [Linux Filesystem Development]     [Ceph Development]     [Ceph Large]     [Ceph Dev]     [Linux USB Development]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [xfs]


  Powered by Linux