see inline

On 27.08.19 at 15:43, Igor Fedotov wrote:
> see inline
>
> On 8/27/2019 4:41 PM, Stefan Priebe - Profihost AG wrote:
>> Hi Igor,
>>
>> On 27.08.19 at 14:11, Igor Fedotov wrote:
>>> Hi Stefan,
>>>
>>> this looks like a duplicate of
>>>
>>> https://tracker.ceph.com/issues/37282
>>>
>>> Actually the root cause selection might be quite wide.
>>>
>>> From HW issues to broken logic in RocksDB/BlueStore/BlueFS etc.
>>>
>>> As far as I understand you have different OSDs which are failing, right?
>> Yes, I've seen this on around 50 different OSDs running different HW, but
>> all run ceph 12.2.12. I've not seen this with 12.2.10, which we were
>> running before.
>>
>>> Is the set of these broken OSDs limited somehow?
>> No, at least not in any way I'm able to find.
>>
>>
>>> Any specific subset which is failing or something? E.g. just N of them
>>> are failing from time to time.
>> No, it seems totally random.
>>
>>> Any similarities for broken OSDs (e.g. specific hardware)?
>> All run Intel Xeon CPUs and all run Linux ;-)
>>
>>> Did you run fsck for any of broken OSDs? Any reports?
>> Yes, but no reports.
> Are you saying that fsck is fine for OSDs that showed this sort of errors?

Yes, fsck does not show a single error - everything is fine.

>>> Any other errors/crashes in logs before these sort of issues happens?
>> No
>>
>>
>>> Just in case - what allocator are you using?
>> tcmalloc
> I meant BlueStore allocator - is it stupid or bitmap?

Ah, the default one. I think that is stupid.

Greets,
Stefan

>>
>> Greets,
>> Stefan
>>
>>> Thanks,
>>>
>>> Igor
>>>
>>>
>>>
>>> On 8/27/2019 1:03 PM, Stefan Priebe - Profihost AG wrote:
>>>> Hello,
>>>>
>>>> for some months now all our bluestore OSDs keep crashing from time to
>>>> time.
>>>> Currently about 5 OSDs per day.
>>>>
>>>> All of them show the following trace:
>>>> Trace:
>>>> 2019-07-24 08:36:48.995397 7fb19a711700 -1 rocksdb: submit_transaction
>>>> error: Corruption: block checksum mismatch code = 2 Rocksdb
>>>> transaction:
>>>> Put( Prefix = M key =
>>>> 0x00000000000009a5'.0000916366.00000000000074680351' Value size = 184)
>>>> Put( Prefix = M key = 0x00000000000009a5'._fastinfo' Value size = 186)
>>>> Put( Prefix = O key =
>>>> 0x7f8000000000000003bb605f'd!rbd_data.afe49a6b8b4567.0000000000003c11!='0xfffffffffffffffeffffffffffffffff6f00240000'x'
>>>>
>>>>
>>>> Value size = 530)
>>>> Put( Prefix = O key =
>>>> 0x7f8000000000000003bb605f'd!rbd_data.afe49a6b8b4567.0000000000003c11!='0xfffffffffffffffeffffffffffffffff'o'
>>>>
>>>>
>>>> Value size = 510)
>>>> Put( Prefix = L key = 0x0000000010ba60f1 Value size = 4135)
>>>> 2019-07-24 08:36:49.012110 7fb19a711700 -1
>>>> /build/ceph/src/os/bluestore/BlueStore.cc: In function 'void
>>>> BlueStore::_kv_sync_thread()' thread 7fb19a711700 time 2019-07-24
>>>> 08:36:48.995415
>>>> /build/ceph/src/os/bluestore/BlueStore.cc: 8808: FAILED assert(r == 0)
>>>>
>>>> ceph version 12.2.12-7-g1321c5e91f
>>>> (1321c5e91f3d5d35dd5aa5a0029a54b9a8ab9498) luminous (stable)
>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>> const*)+0x102) [0x5653a010e222]
>>>> 2: (BlueStore::_kv_sync_thread()+0x24c5) [0x56539ff964b5]
>>>> 3: (BlueStore::KVSyncThread::entry()+0xd) [0x56539ffd708d]
>>>> 4: (()+0x7494) [0x7fb1ab2f6494]
>>>> 5: (clone()+0x3f) [0x7fb1aa37dacf]
>>>>
>>>> I already opened a tracker issue:
>>>> https://tracker.ceph.com/issues/41367
>>>>
>>>> Can anybody help? Is this known?
>>>>
>>>> Greets,
>>>> Stefan
>>>> _______________________________________________
>>>> ceph-users mailing list
>>>> ceph-users@xxxxxxxxxxxxxx
>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com