Hello Igor,

there's absolutely nothing in the logs before that. What do those lines mean:

Put( Prefix = O key = 0x7f8000000000000001cc45c881217262'd_data.4303206b8b4567.0000000000009632!='0xfffffffffffffffeffffffffffffffff6f00120000'x' Value size = 480)
Put( Prefix = O key = 0x7f8000000000000001cc45c881217262'd_data.4303206b8b4567.0000000000009632!='0xfffffffffffffffeffffffffffffffff'o' Value size = 510)

On the right side I always see 0xfffffffffffffffeffffffffffffffff on all failed OSDs.

Greets,
Stefan

On 19.01.20 at 14:07, Stefan Priebe - Profihost AG wrote:
> Yes, except that this happens on 8 different clusters with different hw but the same Ceph version and the same kernel version.
>
> Greets,
> Stefan
>
>> On 19.01.2020 at 11:53, Igor Fedotov <ifedotov@xxxxxxx> wrote:
>>
>> So the intermediate summary is:
>>
>> Any OSD in the cluster can experience an interim RocksDB checksum failure, which is no longer present after an OSD restart.
>>
>> No HW issues observed, no persistent artifacts (except the OSD log) afterwards.
>>
>> And the issue looks rather specific to this cluster, as no similar reports from other users seem to be present.
>>
>> Sorry, I'm out of ideas other than to collect all the failure logs and try to find something common in them. Maybe this will shed some light...
>>
>> BTW, from my experience it might make sense to inspect the OSD log prior to the failure (any error messages and/or prior restarts, etc.) - sometimes this provides some hints.
>>
>> Thanks,
>>
>> Igor
>>
>>> On 1/17/2020 2:30 PM, Stefan Priebe - Profihost AG wrote:
>>> Hi Igor,
>>>
>>>> On 17.01.20 at 12:10, Igor Fedotov wrote:
>>>> hmmm..
>>>>
>>>> Just in case - I suggest checking for H/W errors with dmesg.
>>> this happens on around 80 nodes - I don't expect all of those to have
>>> unidentified hw errors. Also, all of them are monitored - no dmesg output
>>> contains any errors.
>>>
>>>> Also there is some (not very high though) chance this is another
>>>> incarnation of the following bug:
>>>> https://tracker.ceph.com/issues/22464
>>>> https://github.com/ceph/ceph/pull/24649
>>>>
>>>> The corresponding PR works around it for main device reads (user data
>>>> only!), but theoretically it might still happen, either for the DB device
>>>> or for DB data on the main device.
>>>>
>>>> Can you observe any BlueFS spillovers? Is there any correlation between
>>>> failing OSDs and spillover presence, e.g. failing OSDs always have a
>>>> spillover while OSDs without spillovers never face the issue?
>>>>
>>>> To validate this hypothesis one can monitor/check (e.g. once a day for a
>>>> week or so) the "bluestore_reads_with_retries" counter over the OSDs to
>>>> learn whether the issue is happening in the system. Non-zero values mean
>>>> it's there for user data on the main device and hence is likely to happen
>>>> for DB reads as well (which don't have any workaround yet).
>>> OK, I checked bluestore_reads_with_retries on 360 OSDs, but all of them say 0.
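>>> (For reference, a minimal per-node poll of that counter - this assumes
>>> the default admin socket location under /var/run/ceph; adjust if yours
>>> differs:)
>>>
>>> # print the retry counter for every OSD admin socket on this host
>>> for sock in /var/run/ceph/ceph-osd.*.asok; do
>>>     printf '%s ' "$sock"
>>>     ceph daemon "$sock" perf dump \
>>>         | grep -Eo '"bluestore_reads_with_retries"[: ]+[0-9]+'
>>> done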
>>>
>>>> Additionally you might want to monitor memory usage, as the above-
>>>> mentioned PR names high memory pressure as a potential trigger for these
>>>> read errors. So if such pressure occurs, the hypothesis becomes more valid.
>>> we already do this heavily and have around 10GB of memory per OSD. Also,
>>> none of those machines show any I/O pressure at all.
>>>
>>> All hosts show a constant rate of around 38GB to 45GB memory available in
>>> /proc/meminfo.
>>>
>>> Stefan
>>>
>>>> Thanks,
>>>>
>>>> Igor
>>>>
>>>> PS. Everything above is rather speculation for now. The available
>>>> information is definitely not enough for extensive troubleshooting of
>>>> cases that happen this rarely.
>>>>
>>>> You might want to start collecting failure-related information
>>>> (including but not limited to failure logs, perf counter dumps, system
>>>> resource reports, etc.) for future analysis.
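>>>> A rough, cron-able sketch of such a collection job (the socket location,
>>>> output paths and dmesg tail length below are only placeholders - adapt
>>>> as needed):
>>>>
>>>> #!/bin/bash
>>>> # snapshot per-OSD perf counters plus basic system state
>>>> ts=$(date +%Y%m%d-%H%M%S)
>>>> out=/var/log/ceph-collect/$ts
>>>> mkdir -p "$out"
>>>> for sock in /var/run/ceph/ceph-osd.*.asok; do
>>>>     ceph daemon "$sock" perf dump > "$out/$(basename "$sock" .asok).perf.json"
>>>> done
>>>> cp /proc/meminfo "$out/meminfo"
>>>> dmesg -T | tail -n 200 > "$out/dmesg.txt"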
>>>>
>>>> On 1/16/2020 11:58 PM, Stefan Priebe - Profihost AG wrote:
>>>>> Hi Igor,
>>>>>
>>>>> answers inline.
>>>>>
>>>>> On 16.01.20 at 21:34, Igor Fedotov wrote:
>>>>>> you may want to run fsck against the failing OSDs. Hopefully it will
>>>>>> shed some light.
>>>>> fsck just says everything is fine:
>>>>>
>>>>> # ceph-bluestore-tool --command fsck --path /var/lib/ceph/osd/ceph-27/
>>>>> fsck success
>>>>>
>>>>>> Also wondering whether the OSD is able to recover (start up and proceed
>>>>>> working) after facing the issue?
>>>>> no recovery needed. It just runs fine after restarting.
>>>>>
>>>>>> If so, do you have any that failed multiple times? Do you have logs
>>>>>> for these occurrences?
>>>>> maybe, but there are most probably weeks or months between those failures
>>>>> - most probably the logs are already deleted.
>>>>>
>>>>>> Also please note that the patch you mentioned doesn't fix pre-existing
>>>>>> issues (i.e. duplicate allocations); it only prevents new ones.
>>>>>>
>>>>>> But fsck should show them if any...
>>>>> None showed.
>>>>>
>>>>> Stefan
>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Igor
>>>>>>
>>>>>> On 1/16/2020 10:04 PM, Stefan Priebe - Profihost AG wrote:
>>>>>>> Hi Igor,
>>>>>>>
>>>>>>> ouch, sorry. Here we go:
>>>>>>>
>>>>>>> -1> 2020-01-16 01:10:13.404090 7f3350a14700 -1 rocksdb:
>>>>>>> submit_transaction error: Corruption: block checksum mismatch code = 2
>>>>>>> Rocksdb transaction:
>>>>>>> Put( Prefix = M key = 0x0000000000000402'.OBJ_0000000000000002.953BFD0A.bb85c.rbd%udata%e3e8eac6b8b4567%e0000000000001f2e..' Value size = 97)
>>>>>>> Put( Prefix = M key = 0x0000000000000402'.MAP_00000000000BB85C_0000000000000002.953BFD0A.bb85c.rbd%udata%e3e8eac6b8b4567%e0000000000001f2e..' Value size = 93)
>>>>>>> Put( Prefix = M key = 0x0000000000000916'.0000823257.00000000000073922044' Value size = 196)
>>>>>>> Put( Prefix = M key = 0x0000000000000916'.0000823257.00000000000073922045' Value size = 184)
>>>>>>> Put( Prefix = M key = 0x0000000000000916'._info' Value size = 899)
>>>>>>> Put( Prefix = O key = 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f00000000'x' Value size = 418)
>>>>>>> Put( Prefix = O key = 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f00030000'x' Value size = 474)
>>>>>>> Put( Prefix = O key = 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f0007c000'x' Value size = 392)
>>>>>>> Put( Prefix = O key = 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f00090000'x' Value size = 317)
>>>>>>> Put( Prefix = O key = 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f000a0000'x' Value size = 521)
>>>>>>> Put( Prefix = O key = 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f000f4000'x' Value size = 558)
>>>>>>> Put( Prefix = O key = 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f00130000'x' Value size = 649)
>>>>>>> Put( Prefix = O key = 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f00194000'x' Value size = 449)
>>>>>>> Put( Prefix = O key = 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f001cc000'x' Value size = 580)
>>>>>>> Put( Prefix = O key = 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f00200000'x' Value size = 435)
>>>>>>> Put( Prefix = O key = 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f00240000'x' Value size = 569)
>>>>>>> Put( Prefix = O key = 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f00290000'x' Value size = 465)
>>>>>>> Put( Prefix = O key = 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f002e0000'x' Value size = 710)
>>>>>>> Put( Prefix = O key = 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f00300000'x' Value size = 599)
>>>>>>> Put( Prefix = O key = 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f0036c000'x' Value size = 372)
>>>>>>> Put( Prefix = O key = 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f003a6000'x' Value size = 130)
>>>>>>> Put( Prefix = O key = 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f003b4000'x' Value size = 540)
>>>>>>> Put( Prefix = O key = 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f003fc000'x' Value size = 47)
>>>>>>> Put( Prefix = O key = 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff'o' Value size = 1731)
>>>>>>> Put( Prefix = O key = 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0xfffffffffffffffeffffffffffffffff6f00040000'x' Value size = 675)
>>>>>>> Put( Prefix = O key = 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0xfffffffffffffffeffffffffffffffff6f00080000'x' Value size = 395)
>>>>>>> Put( Prefix = O key = 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0xfffffffffffffffeffffffffffffffff'o' Value size = 1328)
>>>>>>> Put( Prefix = X key = 0x0000000018a38deb Value size = 14)
>>>>>>> Put( Prefix = X key = 0x0000000018a38dea Value size = 14)
>>>>>>> Put( Prefix = X key = 0x000000000d7a035b Value size = 14)
>>>>>>> Put( Prefix = X key = 0x000000000d7a035c Value size = 14)
>>>>>>> Put( Prefix = X key = 0x000000000d7a0355 Value size = 14)
>>>>>>> Put( Prefix = X key = 0x000000000d7a0356 Value size = 17)
>>>>>>> Put( Prefix = X key = 0x000000001a54f6e4 Value size = 14)
>>>>>>> Put( Prefix = X key = 0x000000001b1c061e Value size = 14)
>>>>>>> Put( Prefix = X key = 0x000000000d7a038f Value size = 14)
>>>>>>> Put( Prefix = X key = 0x000000000d7a0389 Value size = 14)
>>>>>>> Put( Prefix = X key = 0x000000000d7a0358 Value size = 14)
>>>>>>> Put( Prefix = X key = 0x000000000d7a035f Value size = 14)
>>>>>>> Put( Prefix = X key = 0x000000000d7a0357 Value size = 14)
>>>>>>> Put( Prefix = X key = 0x000000000d7a0387 Value size = 14)
>>>>>>> Put( Prefix = X key = 0x000000000d7a038a Value size = 14)
>>>>>>> Put( Prefix = X key = 0x000000000d7a0388 Value size = 14)
>>>>>>> Put( Prefix = X key = 0x00000000134c3fbe Value size = 14)
>>>>>>> Put( Prefix = X key = 0x00000000134c3fb5 Value size = 14)
>>>>>>> Put( Prefix = X key = 0x000000000d7a036e Value size = 14)
>>>>>>> Put( Prefix = X key = 0x000000000d7a036d Value size = 14)
>>>>>>> Put( Prefix = X key = 0x00000000134c3fb8 Value size = 14)
>>>>>>> Put( Prefix = X key = 0x000000000d7a0371 Value size = 14)
>>>>>>> Put( Prefix = X key = 0x000000000d7a036a Value size = 14)
>>>>>>> 0> 2020-01-16 01:10:13.413759 7f3350a14700 -1
>>>>>>> /build/ceph/src/os/bluestore/BlueStore.cc: In function 'void
>>>>>>> BlueStore::_kv_sync_thread()' thread 7f3350a14700 time 2020-01-16
>>>>>>> 01:10:13.404113
>>>>>>> /build/ceph/src/os/bluestore/BlueStore.cc: 8808: FAILED assert(r == 0)
>>>>>>>
>>>>>>> ceph version 12.2.12-11-gd3eae83543
>>>>>>> (d3eae83543bffc0fc6c43823feb637fa851b6213) luminous (stable)
>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>>>>> const*)+0x102) [0x55c9a712d232]
>>>>>>> 2: (BlueStore::_kv_sync_thread()+0x24c5) [0x55c9a6fb54b5]
>>>>>>> 3: (BlueStore::KVSyncThread::entry()+0xd) [0x55c9a6ff608d]
>>>>>>> 4: (()+0x7494) [0x7f33615f9494]
>>>>>>> 5: (clone()+0x3f) [0x7f3360680acf]
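>>>>>>> (Side note: to capture more context around the next occurrence, the
>>>>>>> debug levels can be raised at runtime - a sketch using the stock
>>>>>>> injectargs interface; the levels are only a suggestion and increase
>>>>>>> log volume considerably:)
>>>>>>>
>>>>>>> ceph tell osd.27 injectargs '--debug_bluestore=10/10 --debug_rocksdb=5/5'
>>>>>>> # and back toward your usual defaults once a failure log has been
>>>>>>> # captured, e.g.:
>>>>>>> ceph tell osd.27 injectargs '--debug_bluestore=1/5 --debug_rocksdb=4/5'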
>>>>>>>
>>>>>>> I already cherry-picked those:
>>>>>>> https://github.com/ceph/ceph/pull/28644
>>>>>>>
>>>>>>> Greets,
>>>>>>> Stefan
>>>>>>>
>>>>>>> On 16.01.20 at 17:00, Igor Fedotov wrote:
>>>>>>>> Hi Stefan,
>>>>>>>>
>>>>>>>> would you please share a log snippet from just before the assertion?
>>>>>>>> Looks like RocksDB is failing during transaction submission...
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Igor
>>>>>>>>
>>>>>>>> On 1/16/2020 11:56 AM, Stefan Priebe - Profihost AG wrote:
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> does anybody know a fix for this ASSERT / crash?
>>>>>>>>>
>>>>>>>>> 2020-01-16 02:02:31.316394 7f8c3f5ab700 -1
>>>>>>>>> /build/ceph/src/os/bluestore/BlueStore.cc: In function 'void
>>>>>>>>> BlueStore::_kv_sync_thread()' thread 7f8c3f5ab700 time 2020-01-16
>>>>>>>>> 02:02:31.304993
>>>>>>>>> /build/ceph/src/os/bluestore/BlueStore.cc: 8808: FAILED assert(r == 0)
>>>>>>>>>
>>>>>>>>> ceph version 12.2.12-11-gd3eae83543
>>>>>>>>> (d3eae83543bffc0fc6c43823feb637fa851b6213) luminous (stable)
>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>>>>>>> const*)+0x102) [0x55e6df9d9232]
>>>>>>>>> 2: (BlueStore::_kv_sync_thread()+0x24c5) [0x55e6df8614b5]
>>>>>>>>> 3: (BlueStore::KVSyncThread::entry()+0xd) [0x55e6df8a208d]
>>>>>>>>> 4: (()+0x7494) [0x7f8c50190494]
>>>>>>>>> 5: (clone()+0x3f) [0x7f8c4f217acf]
>>>>>>>>>
>>>>>>>>> all BlueStore OSDs are randomly crashing sometimes (about once a week).
>>>>>>>>>
>>>>>>>>> Greets,
>>>>>>>>> Stefan
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com