Hi Igor,

On 27.08.19 at 14:11, Igor Fedotov wrote:
> Hi Stefan,
>
> this looks like a duplicate of
>
> https://tracker.ceph.com/issues/37282
>
> Actually, the range of possible root causes might be quite wide,
> from HW issues to broken logic in RocksDB/BlueStore/BlueFS etc.
>
> As far as I understand, you have different OSDs which are failing, right?

Yes, I've seen this on around 50 different OSDs running on different HW, but all of them run Ceph 12.2.12. I did not see this with 12.2.10, which we were running before.

> Is the set of these broken OSDs limited somehow?

No, at least I'm not able to find any limit.

> Any specific subset which is failing or something? E.g. just N of them
> are failing from time to time.

No, it seems totally random.

> Any similarities for broken OSDs (e.g. specific hardware)?

All run Intel Xeon CPUs and all run Linux ;-)

> Did you run fsck on any of the broken OSDs? Any reports?

Yes, but no reports.

> Any other errors/crashes in the logs before this sort of issue happens?

No.

> Just in case: what allocator are you using?

tcmalloc

Greets,
Stefan

>
> Thanks,
>
> Igor
>
>
>
> On 8/27/2019 1:03 PM, Stefan Priebe - Profihost AG wrote:
>> Hello,
>>
>> for some months now, all our BlueStore OSDs have been crashing from time to time.
>> Currently about 5 OSDs per day.
>>
>> All of them show the following trace:
>> Trace:
>> 2019-07-24 08:36:48.995397 7fb19a711700 -1 rocksdb: submit_transaction
>> error: Corruption: block checksum mismatch code = 2 Rocksdb transaction:
>> Put( Prefix = M key =
>> 0x00000000000009a5'.0000916366.00000000000074680351' Value size = 184)
>> Put( Prefix = M key = 0x00000000000009a5'._fastinfo' Value size = 186)
>> Put( Prefix = O key =
>> 0x7f8000000000000003bb605f'd!rbd_data.afe49a6b8b4567.0000000000003c11!='0xfffffffffffffffeffffffffffffffff6f00240000'x'
>> Value size = 530)
>> Put( Prefix = O key =
>> 0x7f8000000000000003bb605f'd!rbd_data.afe49a6b8b4567.0000000000003c11!='0xfffffffffffffffeffffffffffffffff'o'
>> Value size = 510)
>> Put( Prefix = L key = 0x0000000010ba60f1 Value size = 4135)
>> 2019-07-24 08:36:49.012110 7fb19a711700 -1
>> /build/ceph/src/os/bluestore/BlueStore.cc: In function 'void
>> BlueStore::_kv_sync_thread()' thread 7fb19a711700 time 2019-07-24
>> 08:36:48.995415
>> /build/ceph/src/os/bluestore/BlueStore.cc: 8808: FAILED assert(r == 0)
>>
>> ceph version 12.2.12-7-g1321c5e91f
>> (1321c5e91f3d5d35dd5aa5a0029a54b9a8ab9498) luminous (stable)
>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>> const*)+0x102) [0x5653a010e222]
>> 2: (BlueStore::_kv_sync_thread()+0x24c5) [0x56539ff964b5]
>> 3: (BlueStore::KVSyncThread::entry()+0xd) [0x56539ffd708d]
>> 4: (()+0x7494) [0x7fb1ab2f6494]
>> 5: (clone()+0x3f) [0x7fb1aa37dacf]
>>
>> I have already opened a tracker issue:
>> https://tracker.ceph.com/issues/41367
>>
>> Can anybody help? Is this a known issue?
>>
>> Greets,
>> Stefan
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users@xxxxxxxxxxxxxx
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com