Hi Igor,

answers inline.

On 16.01.20 at 21:34, Igor Fedotov wrote:
> you may want to run fsck against failing OSDs. Hopefully it will shed
> some light.

fsck just says everything is fine:

# ceph-bluestore-tool --command fsck --path /var/lib/ceph/osd/ceph-27/
fsck success

> Also wondering if OSD is able to recover (startup and proceed working)
> after facing the issue?

No recovery needed. After a restart it just keeps running.

> If so do you have any one which failed multiple times? Do you have logs
> for these occurrences?

Maybe, but there are most probably weeks or months between those
failures, so the logs have most probably been deleted already.

> Also please note that patch you mentioned doesn't fix previous issues
> (i.e. duplicate allocations), it prevents from new ones only.
>
> But fsck should show them if any...

None showed.

Stefan

> Thanks,
>
> Igor
>
>
>
> On 1/16/2020 10:04 PM, Stefan Priebe - Profihost AG wrote:
>> Hi Igor,
>>
>> ouch sorry. Here we go:
>>
>> -1> 2020-01-16 01:10:13.404090 7f3350a14700 -1 rocksdb: submit_transaction error: Corruption: block checksum mismatch code = 2 Rocksdb transaction:
>> Put( Prefix = M key = 0x0000000000000402'.OBJ_0000000000000002.953BFD0A.bb85c.rbd%udata%e3e8eac6b8b4567%e0000000000001f2e..' Value size = 97)
>> Put( Prefix = M key = 0x0000000000000402'.MAP_00000000000BB85C_0000000000000002.953BFD0A.bb85c.rbd%udata%e3e8eac6b8b4567%e0000000000001f2e..'
>> Value size = 93)
>> Put( Prefix = M key = 0x0000000000000916'.0000823257.00000000000073922044' Value size = 196)
>> Put( Prefix = M key = 0x0000000000000916'.0000823257.00000000000073922045' Value size = 184)
>> Put( Prefix = M key = 0x0000000000000916'._info' Value size = 899)
>> Put( Prefix = O key = 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f00000000'x' Value size = 418)
>> Put( Prefix = O key = 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f00030000'x' Value size = 474)
>> Put( Prefix = O key = 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f0007c000'x' Value size = 392)
>> Put( Prefix = O key = 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f00090000'x' Value size = 317)
>> Put( Prefix = O key = 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f000a0000'x' Value size = 521)
>> Put( Prefix = O key = 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f000f4000'x' Value size = 558)
>> Put( Prefix = O key = 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f00130000'x' Value size = 649)
>> Put( Prefix = O key = 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f00194000'x' Value size = 449)
>> Put( Prefix = O key = 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f001cc000'x' Value size = 580)
>> Put( Prefix = O key =
>> 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f00200000'x' Value size = 435)
>> Put( Prefix = O key = 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f00240000'x' Value size = 569)
>> Put( Prefix = O key = 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f00290000'x' Value size = 465)
>> Put( Prefix = O key = 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f002e0000'x' Value size = 710)
>> Put( Prefix = O key = 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f00300000'x' Value size = 599)
>> Put( Prefix = O key = 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f0036c000'x' Value size = 372)
>> Put( Prefix = O key = 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f003a6000'x' Value size = 130)
>> Put( Prefix = O key = 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f003b4000'x' Value size = 540)
>> Put( Prefix = O key = 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f003fc000'x' Value size = 47)
>> Put( Prefix = O key = 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff'o' Value size = 1731)
>> Put( Prefix = O key = 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0xfffffffffffffffeffffffffffffffff6f00040000'x' Value size = 675)
>> Put( Prefix = O key =
>> 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0xfffffffffffffffeffffffffffffffff6f00080000'x' Value size = 395)
>> Put( Prefix = O key = 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0xfffffffffffffffeffffffffffffffff'o' Value size = 1328)
>> Put( Prefix = X key = 0x0000000018a38deb Value size = 14)
>> Put( Prefix = X key = 0x0000000018a38dea Value size = 14)
>> Put( Prefix = X key = 0x000000000d7a035b Value size = 14)
>> Put( Prefix = X key = 0x000000000d7a035c Value size = 14)
>> Put( Prefix = X key = 0x000000000d7a0355 Value size = 14)
>> Put( Prefix = X key = 0x000000000d7a0356 Value size = 17)
>> Put( Prefix = X key = 0x000000001a54f6e4 Value size = 14)
>> Put( Prefix = X key = 0x000000001b1c061e Value size = 14)
>> Put( Prefix = X key = 0x000000000d7a038f Value size = 14)
>> Put( Prefix = X key = 0x000000000d7a0389 Value size = 14)
>> Put( Prefix = X key = 0x000000000d7a0358 Value size = 14)
>> Put( Prefix = X key = 0x000000000d7a035f Value size = 14)
>> Put( Prefix = X key = 0x000000000d7a0357 Value size = 14)
>> Put( Prefix = X key = 0x000000000d7a0387 Value size = 14)
>> Put( Prefix = X key = 0x000000000d7a038a Value size = 14)
>> Put( Prefix = X key = 0x000000000d7a0388 Value size = 14)
>> Put( Prefix = X key = 0x00000000134c3fbe Value size = 14)
>> Put( Prefix = X key = 0x00000000134c3fb5 Value size = 14)
>> Put( Prefix = X key = 0x000000000d7a036e Value size = 14)
>> Put( Prefix = X key = 0x000000000d7a036d Value size = 14)
>> Put( Prefix = X key = 0x00000000134c3fb8 Value size = 14)
>> Put( Prefix = X key = 0x000000000d7a0371 Value size = 14)
>> Put( Prefix = X key = 0x000000000d7a036a Value size = 14)
>> 0> 2020-01-16 01:10:13.413759 7f3350a14700 -1 /build/ceph/src/os/bluestore/BlueStore.cc: In function 'void BlueStore::_kv_sync_thread()' thread 7f3350a14700 time 2020-01-16 01:10:13.404113
>> /build/ceph/src/os/bluestore/BlueStore.cc: 8808: FAILED assert(r == 0)
>>
>> ceph version 12.2.12-11-gd3eae83543 (d3eae83543bffc0fc6c43823feb637fa851b6213) luminous (stable)
>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x55c9a712d232]
>> 2: (BlueStore::_kv_sync_thread()+0x24c5) [0x55c9a6fb54b5]
>> 3: (BlueStore::KVSyncThread::entry()+0xd) [0x55c9a6ff608d]
>> 4: (()+0x7494) [0x7f33615f9494]
>> 5: (clone()+0x3f) [0x7f3360680acf]
>>
>> I already picked those:
>> https://github.com/ceph/ceph/pull/28644
>>
>> Greets,
>> Stefan
>> On 16.01.20 at 17:00, Igor Fedotov wrote:
>>> Hi Stefan,
>>>
>>> would you please share log snippet prior the assertions? Looks like
>>> RocksDB is failing during transaction submission...
>>>
>>>
>>> Thanks,
>>>
>>> Igor
>>>
>>> On 1/16/2020 11:56 AM, Stefan Priebe - Profihost AG wrote:
>>>> Hello,
>>>>
>>>> does anybody know a fix for this ASSERT / crash?
>>>>
>>>> 2020-01-16 02:02:31.316394 7f8c3f5ab700 -1 /build/ceph/src/os/bluestore/BlueStore.cc: In function 'void BlueStore::_kv_sync_thread()' thread 7f8c3f5ab700 time 2020-01-16 02:02:31.304993
>>>> /build/ceph/src/os/bluestore/BlueStore.cc: 8808: FAILED assert(r == 0)
>>>>
>>>> ceph version 12.2.12-11-gd3eae83543 (d3eae83543bffc0fc6c43823feb637fa851b6213) luminous (stable)
>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x55e6df9d9232]
>>>> 2: (BlueStore::_kv_sync_thread()+0x24c5) [0x55e6df8614b5]
>>>> 3: (BlueStore::KVSyncThread::entry()+0xd) [0x55e6df8a208d]
>>>> 4: (()+0x7494) [0x7f8c50190494]
>>>> 5: (clone()+0x3f) [0x7f8c4f217acf]
>>>>
>>>> all bluestore OSDs are randomly crashing sometimes (once a week).
>>>>
>>>> Greets,
>>>> Stefan
>>>> _______________________________________________
>>>> ceph-users mailing list
>>>> ceph-users@xxxxxxxxxxxxxx
>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com