Yes, except that this happens on 8 different clusters with different hardware but the same ceph version and the same kernel version.

Greets,
Stefan

> On 19.01.2020 at 11:53, Igor Fedotov <ifedotov@xxxxxxx> wrote:
>
> So the intermediate summary is:
>
> Any OSD in the cluster can experience an interim RocksDB checksum failure, which is no longer present after an OSD restart.
>
> No HW issues observed, no persistent artifacts (except the OSD log) afterwards.
>
> And the issue looks rather specific to this cluster, as no similar reports from other users seem to be present.
>
> Sorry, I'm out of ideas other than to collect all the failure logs and try to find something common in them. Maybe that will shed some light.
>
> BTW, from my experience it might make sense to inspect the OSD log prior to the failure (any error messages and/or prior restarts, etc.); sometimes this provides some hints.
>
> Thanks,
>
> Igor
>
>> On 1/17/2020 2:30 PM, Stefan Priebe - Profihost AG wrote:
>> Hi Igor,
>>
>>> On 17.01.20 at 12:10, Igor Fedotov wrote:
>>> hmmm..
>>>
>>> Just in case - I suggest checking for H/W errors with dmesg.
>> this happens on around 80 nodes - I don't expect all of those to have unidentified hw errors. Also, all of them are monitored - no dmesg output contains any errors.
>>
>>> Also there is some (though not very high) chance this is another incarnation of the following bug:
>>> https://tracker.ceph.com/issues/22464
>>> https://github.com/ceph/ceph/pull/24649
>>>
>>> The corresponding PR works around it for main-device reads (user data only!), but theoretically it might still happen either for the DB device or for DB data on the main device.
>>>
>>> Can you observe any bluefs spillovers? Is there any correlation between failing OSDs and spillover presence, e.g. failing OSDs always have a spillover while OSDs without spillovers never face the issue?
>>>
>>> To validate this hypothesis one can monitor/check (e.g. once a day for a week or so) the "bluestore_reads_with_retries" counter over the OSDs to learn whether the issue is happening in the system. Non-zero values mean it is present for user data on the main device and hence is likely to happen for DB reads as well (which don't have any workaround yet).
>> OK, I checked bluestore_reads_with_retries on 360 OSDs, but all of them say 0.
>>
>>> Additionally, you might want to monitor memory usage, as the above-mentioned PR names high memory pressure as a potential trigger for these read errors. So if such pressure occurs, the hypothesis becomes more plausible.
>> we already do this heavily and have around 10GB of memory per OSD. Also, none of those machines show any I/O pressure at all.
>>
>> All hosts constantly show around 38GB to 45GB of available memory in /proc/meminfo.
>>
>> Stefan
>>
>>> Thanks,
>>>
>>> Igor
>>>
>>> PS. Everything above is rather speculation for now. The available information is definitely not enough for extensive troubleshooting of cases that happen this rarely.
>>>
>>> You might want to start collecting failure-related information (including but not limited to failure logs, perf counter dumps, system resource reports, etc.) for future analysis.
>>>
>>> On 1/16/2020 11:58 PM, Stefan Priebe - Profihost AG wrote:
>>>> Hi Igor,
>>>>
>>>> answers inline.
>>>>
>>>> On 16.01.20 at 21:34, Igor Fedotov wrote:
>>>>> you may want to run fsck against the failing OSDs. Hopefully it will shed some light.
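
As a side note, the "bluestore_reads_with_retries" check suggested above can be scripted per host; a minimal sketch, assuming the default admin socket paths under /var/run/ceph:

    # Sketch: print bluestore_reads_with_retries for every OSD on this host.
    # Assumes the standard ceph-osd.*.asok admin sockets; adjust the glob if needed.
    for sock in /var/run/ceph/ceph-osd.*.asok; do
        printf '%s ' "$sock"
        ceph daemon "$sock" perf dump | grep -o '"bluestore_reads_with_retries": [0-9]*'
    done

Non-zero values would support the read-retry hypothesis discussed above; running it daily (e.g. from cron) matches Igor's once-a-day suggestion.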
>>>> fsck just says everything is fine:
>>>>
>>>> # ceph-bluestore-tool --command fsck --path /var/lib/ceph/osd/ceph-27/
>>>> fsck success
>>>>
>>>>> Also wondering if the OSD is able to recover (start up and proceed working) after facing the issue?
>>>> no recovery needed. It just runs forever after restarting.
>>>>
>>>>> If so, do you have any one which failed multiple times? Do you have logs for these occurrences?
>>>> maybe, but there are most probably weeks or months between those failures - most probably the logs are already deleted.
>>>>
>>>>> Also please note that the patch you mentioned doesn't fix previous issues (i.e. duplicate allocations), it only prevents new ones.
>>>>>
>>>>> But fsck should show them if any...
>>>> None showed.
>>>>
>>>> Stefan
>>>>
>>>>> Thanks,
>>>>>
>>>>> Igor
>>>>>
>>>>> On 1/16/2020 10:04 PM, Stefan Priebe - Profihost AG wrote:
>>>>>> Hi Igor,
>>>>>>
>>>>>> ouch, sorry. Here we go:
>>>>>>
>>>>>> -1> 2020-01-16 01:10:13.404090 7f3350a14700 -1 rocksdb: submit_transaction error: Corruption: block checksum mismatch code = 2
>>>>>> Rocksdb transaction:
>>>>>> Put( Prefix = M key = 0x0000000000000402'.OBJ_0000000000000002.953BFD0A.bb85c.rbd%udata%e3e8eac6b8b4567%e0000000000001f2e..' Value size = 97)
>>>>>> Put( Prefix = M key = 0x0000000000000402'.MAP_00000000000BB85C_0000000000000002.953BFD0A.bb85c.rbd%udata%e3e8eac6b8b4567%e0000000000001f2e..' Value size = 93)
>>>>>> Put( Prefix = M key = 0x0000000000000916'.0000823257.00000000000073922044' Value size = 196)
>>>>>> Put( Prefix = M key = 0x0000000000000916'.0000823257.00000000000073922045' Value size = 184)
>>>>>> Put( Prefix = M key = 0x0000000000000916'._info' Value size = 899)
>>>>>> Put( Prefix = O key = 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f00000000'x' Value size = 418)
>>>>>> Put( Prefix = O key = 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f00030000'x' Value size = 474)
>>>>>> Put( Prefix = O key = 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f0007c000'x' Value size = 392)
>>>>>> Put( Prefix = O key = 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f00090000'x' Value size = 317)
>>>>>> Put( Prefix = O key = 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f000a0000'x' Value size = 521)
>>>>>> Put( Prefix = O key = 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f000f4000'x' Value size = 558)
>>>>>> Put( Prefix = O key = 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f00130000'x' Value size = 649)
>>>>>> Put( Prefix = O key = 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f00194000'x' Value size = 449)
>>>>>> Put( Prefix = O key = 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f001cc000'x' Value size = 580)
>>>>>> Put( Prefix = O key = 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f00200000'x' Value size = 435)
>>>>>> Put( Prefix = O key = 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f00240000'x' Value size = 569)
>>>>>> Put( Prefix = O key = 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f00290000'x' Value size = 465)
>>>>>> Put( Prefix = O key = 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f002e0000'x' Value size = 710)
>>>>>> Put( Prefix = O key = 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f00300000'x' Value size = 599)
>>>>>> Put( Prefix = O key = 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f0036c000'x' Value size = 372)
>>>>>> Put( Prefix = O key = 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f003a6000'x' Value size = 130)
>>>>>> Put( Prefix = O key = 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f003b4000'x' Value size = 540)
>>>>>> Put( Prefix = O key = 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f003fc000'x' Value size = 47)
>>>>>> Put( Prefix = O key = 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff'o' Value size = 1731)
>>>>>> Put( Prefix = O key = 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0xfffffffffffffffeffffffffffffffff6f00040000'x' Value size = 675)
>>>>>> Put( Prefix = O key = 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0xfffffffffffffffeffffffffffffffff6f00080000'x' Value size = 395)
>>>>>> Put( Prefix = O key = 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0xfffffffffffffffeffffffffffffffff'o' Value size = 1328)
>>>>>> Put( Prefix = X key = 0x0000000018a38deb Value size = 14)
>>>>>> Put( Prefix = X key = 0x0000000018a38dea Value size = 14)
>>>>>> Put( Prefix = X key = 0x000000000d7a035b Value size = 14)
>>>>>> Put( Prefix = X key = 0x000000000d7a035c Value size = 14)
>>>>>> Put( Prefix = X key = 0x000000000d7a0355 Value size = 14)
>>>>>> Put( Prefix = X key = 0x000000000d7a0356 Value size = 17)
>>>>>> Put( Prefix = X key = 0x000000001a54f6e4 Value size = 14)
>>>>>> Put( Prefix = X key = 0x000000001b1c061e Value size = 14)
>>>>>> Put( Prefix = X key = 0x000000000d7a038f Value size = 14)
>>>>>> Put( Prefix = X key = 0x000000000d7a0389 Value size = 14)
>>>>>> Put( Prefix = X key = 0x000000000d7a0358 Value size = 14)
>>>>>> Put( Prefix = X key = 0x000000000d7a035f Value size = 14)
>>>>>> Put( Prefix = X key = 0x000000000d7a0357 Value size = 14)
>>>>>> Put( Prefix = X key = 0x000000000d7a0387 Value size = 14)
>>>>>> Put( Prefix = X key = 0x000000000d7a038a Value size = 14)
>>>>>> Put( Prefix = X key = 0x000000000d7a0388 Value size = 14)
>>>>>> Put( Prefix = X key = 0x00000000134c3fbe Value size = 14)
>>>>>> Put( Prefix = X key = 0x00000000134c3fb5 Value size = 14)
>>>>>> Put( Prefix = X key = 0x000000000d7a036e Value size = 14)
>>>>>> Put( Prefix = X key = 0x000000000d7a036d Value size = 14)
>>>>>> Put( Prefix = X key = 0x00000000134c3fb8 Value size = 14)
>>>>>> Put( Prefix = X key = 0x000000000d7a0371 Value size = 14)
>>>>>> Put( Prefix = X key = 0x000000000d7a036a Value size = 14)
>>>>>> 0> 2020-01-16 01:10:13.413759 7f3350a14700 -1 /build/ceph/src/os/bluestore/BlueStore.cc: In function 'void BlueStore::_kv_sync_thread()' thread 7f3350a14700 time 2020-01-16 01:10:13.404113
>>>>>> /build/ceph/src/os/bluestore/BlueStore.cc: 8808: FAILED assert(r == 0)
>>>>>>
>>>>>> ceph version 12.2.12-11-gd3eae83543 (d3eae83543bffc0fc6c43823feb637fa851b6213) luminous (stable)
>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x55c9a712d232]
>>>>>> 2: (BlueStore::_kv_sync_thread()+0x24c5) [0x55c9a6fb54b5]
>>>>>> 3: (BlueStore::KVSyncThread::entry()+0xd) [0x55c9a6ff608d]
>>>>>> 4: (()+0x7494) [0x7f33615f9494]
>>>>>> 5: (clone()+0x3f) [0x7f3360680acf]
>>>>>>
>>>>>> I already picked those:
>>>>>> https://github.com/ceph/ceph/pull/28644
>>>>>>
>>>>>> Greets,
>>>>>> Stefan
>>>>>>
>>>>>> On 16.01.20 at 17:00, Igor Fedotov wrote:
>>>>>>> Hi Stefan,
>>>>>>>
>>>>>>> would you please share the log snippet prior to the assertion? Looks like RocksDB is failing during transaction submission...
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Igor
>>>>>>>
>>>>>>> On 1/16/2020 11:56 AM, Stefan Priebe - Profihost AG wrote:
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> does anybody know a fix for this ASSERT / crash?
>>>>>>>>
>>>>>>>> 2020-01-16 02:02:31.316394 7f8c3f5ab700 -1 /build/ceph/src/os/bluestore/BlueStore.cc: In function 'void BlueStore::_kv_sync_thread()' thread 7f8c3f5ab700 time 2020-01-16 02:02:31.304993
>>>>>>>> /build/ceph/src/os/bluestore/BlueStore.cc: 8808: FAILED assert(r == 0)
>>>>>>>>
>>>>>>>> ceph version 12.2.12-11-gd3eae83543 (d3eae83543bffc0fc6c43823feb637fa851b6213) luminous (stable)
>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x55e6df9d9232]
>>>>>>>> 2: (BlueStore::_kv_sync_thread()+0x24c5) [0x55e6df8614b5]
>>>>>>>> 3: (BlueStore::KVSyncThread::entry()+0xd) [0x55e6df8a208d]
>>>>>>>> 4: (()+0x7494) [0x7f8c50190494]
>>>>>>>> 5: (clone()+0x3f) [0x7f8c4f217acf]
>>>>>>>>
>>>>>>>> all bluestore OSDs are randomly crashing sometimes (once a week).
>>>>>>>>
>>>>>>>> Greets,
>>>>>>>> Stefan
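
Since the failures are rare and the relevant logs are often gone by the time anyone looks (see Igor's note above about inspecting the OSD log prior to the failure), it may be worth raising the BlueStore/RocksDB debug levels ahead of the next occurrence. A minimal sketch, assuming the extra log volume is acceptable:

    # Sketch: bump bluestore/rocksdb logging on all OSDs at runtime so the next
    # checksum failure leaves more context in the log (higher levels = bigger logs).
    ceph tell osd.* injectargs '--debug_bluestore 10 --debug_rocksdb 5'

The same settings could instead go into the [osd] section of ceph.conf if they should survive OSD restarts.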