Hello Igor,

I can now confirm that this is indeed a kernel bug. The issue no longer happens on upgraded nodes. Do you know more about it? I would really like to know in which kernel version it was fixed, so that we can avoid rebooting all Ceph nodes.
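In case it is useful: a quick way to see which nodes are still on an unpatched kernel is to compare the running kernel across the cluster. A minimal sketch (the hostnames below are placeholders, not taken from this thread):

  # List the running kernel on every Ceph node; replace the hostnames.
  for host in ceph01 ceph02 ceph03; do
      echo -n "$host: "
      ssh "$host" uname -r
  done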
Greets,
Stefan

On 27.08.19 16:20, Igor Fedotov wrote:
> It sounds like the OSD is "recovering" after the checksum error.
>
> I.e. the just-failed OSD shows no errors in fsck and is able to restart and
> process new write requests for a long enough period (longer than just a
> couple of minutes). Are these statements true? If so, I suppose this is an
> accidental/volatile issue rather than data-at-rest corruption -
> something like data being read incorrectly from disk.
>
> Are you using a standalone disk drive for DB/WAL, or is it shared with the
> main one? Just in case, as a low-hanging fruit, I'd suggest checking
> dmesg and smartctl for drive errors...
>
> FYI: one more reference for a similar issue:
> https://tracker.ceph.com/issues/24968
>
> A HW issue that time...
>
>
> Also I recall an issue with some kernels that caused occasional invalid
> data reads under high memory pressure/swapping:
> https://tracker.ceph.com/issues/22464
>
> IMO memory usage is worth checking as well...
>
>
> Igor
>
>
> On 8/27/2019 4:52 PM, Stefan Priebe - Profihost AG wrote:
>> see inline
>>
>> On 27.08.19 15:43, Igor Fedotov wrote:
>>> see inline
>>>
>>> On 8/27/2019 4:41 PM, Stefan Priebe - Profihost AG wrote:
>>>> Hi Igor,
>>>>
>>>> On 27.08.19 14:11, Igor Fedotov wrote:
>>>>> Hi Stefan,
>>>>>
>>>>> this looks like a duplicate of
>>>>>
>>>>> https://tracker.ceph.com/issues/37282
>>>>>
>>>>> Actually the range of possible root causes might be quite wide -
>>>>>
>>>>> from HW issues to broken logic in RocksDB/BlueStore/BlueFS etc.
>>>>>
>>>>> As far as I understand you have different OSDs which are failing,
>>>>> right?
>>>> Yes, I've seen this on around 50 different OSDs running on different HW,
>>>> but all run Ceph 12.2.12. I have not seen this with 12.2.10, which we
>>>> were running before.
>>>>
>>>>> Is the set of these broken OSDs limited somehow?
>>>> No, at least I'm not able to find any such limit.
>>>>
>>>>
>>>>> Any specific subset which is failing or something? E.g. just N of them
>>>>> are failing from time to time.
>>>> No, it seems totally random.
>>>>> Any similarities for broken OSDs (e.g. specific hardware)?
>>>> All run Intel Xeon CPUs and all run Linux ;-)
>>>>> Did you run fsck for any of the broken OSDs? Any reports?
>>>> Yes, but no reports.
>>> Are you saying that fsck is fine for OSDs that showed this sort of
>>> error?
>> Yes, fsck does not show a single error - everything is fine.
>>
>>>>> Any other errors/crashes in the logs before this sort of issue happens?
>>>> No
>>>>
>>>>
>>>>> Just in case - what allocator are you using?
>>>> tcmalloc
>>> I meant the BlueStore allocator - is it stupid or bitmap?
>> Ah, the default one - I think that is stupid.
>>
>> Greets,
>> Stefan
>>
>>>> Greets,
>>>> Stefan
>>>>
>>>>> Thanks,
>>>>>
>>>>> Igor
>>>>>
>>>>>
>>>>>
>>>>> On 8/27/2019 1:03 PM, Stefan Priebe - Profihost AG wrote:
>>>>>> Hello,
>>>>>>
>>>>>> for some months now all our BlueStore OSDs have been crashing from
>>>>>> time to time. Currently about 5 OSDs per day.
>>>>>>
>>>>>> All of them show the following trace:
>>>>>>
>>>>>> 2019-07-24 08:36:48.995397 7fb19a711700 -1 rocksdb: submit_transaction
>>>>>> error: Corruption: block checksum mismatch code = 2 Rocksdb transaction:
>>>>>> Put( Prefix = M key = 0x00000000000009a5'.0000916366.00000000000074680351' Value size = 184)
>>>>>> Put( Prefix = M key = 0x00000000000009a5'._fastinfo' Value size = 186)
>>>>>> Put( Prefix = O key = 0x7f8000000000000003bb605f'd!rbd_data.afe49a6b8b4567.0000000000003c11!='0xfffffffffffffffeffffffffffffffff6f00240000'x' Value size = 530)
>>>>>> Put( Prefix = O key = 0x7f8000000000000003bb605f'd!rbd_data.afe49a6b8b4567.0000000000003c11!='0xfffffffffffffffeffffffffffffffff'o' Value size = 510)
>>>>>> Put( Prefix = L key = 0x0000000010ba60f1 Value size = 4135)
>>>>>> 2019-07-24 08:36:49.012110 7fb19a711700 -1
>>>>>> /build/ceph/src/os/bluestore/BlueStore.cc: In function 'void
>>>>>> BlueStore::_kv_sync_thread()' thread 7fb19a711700 time 2019-07-24 08:36:48.995415
>>>>>> /build/ceph/src/os/bluestore/BlueStore.cc: 8808: FAILED assert(r == 0)
>>>>>>
>>>>>> ceph version 12.2.12-7-g1321c5e91f
>>>>>> (1321c5e91f3d5d35dd5aa5a0029a54b9a8ab9498) luminous (stable)
>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x5653a010e222]
>>>>>> 2: (BlueStore::_kv_sync_thread()+0x24c5) [0x56539ff964b5]
>>>>>> 3: (BlueStore::KVSyncThread::entry()+0xd) [0x56539ffd708d]
>>>>>> 4: (()+0x7494) [0x7fb1ab2f6494]
>>>>>> 5: (clone()+0x3f) [0x7fb1aa37dacf]
>>>>>>
>>>>>> I already opened a tracker issue:
>>>>>> https://tracker.ceph.com/issues/41367
>>>>>>
>>>>>> Can anybody help? Is this known?
>>>>>>
>>>>>> Greets,
>>>>>> Stefan
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
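For reference, the checks suggested earlier in this thread (fsck of a BlueStore OSD, drive health via dmesg/smartctl, and the BlueStore allocator in use) can be run roughly as follows. This is only a sketch: osd.0 and /dev/sda are placeholders, and the fsck needs the OSD to be stopped first.

  # Offline consistency check of one BlueStore OSD (stop the OSD first).
  ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-0

  # Look for drive-level errors, as suggested above.
  dmesg | grep -iE 'i/o error|ata|nvme'
  smartctl -a /dev/sda

  # Show which BlueStore allocator the OSD is using (stupid vs. bitmap);
  # run this on the node hosting the OSD.
  ceph daemon osd.0 config get bluestore_allocator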