On 27.08.19 at 16:20, Igor Fedotov wrote:
> It sounds like OSD is "recovering" after checksum error. May be

No idea how this works. systemd starts the OSD again after the crash, and
then it runs for weeks or days again.

> I.e. just failed OSD shows no errors in fsck and is able to restart and
> process new write requests for long enough period (longer than just a
> couple of minutes). Are these statements true?

Yes, normally it runs for weeks - I'm not sure whether any single OSD has
crashed twice or just once.

> If so I can suppose this
> is accidental/volatile issue rather than data-at-rest corruption.
> Something like data incorrectly read from disk.
>
> Are you using standalone disk drive for DB/WAL or it's shared with main
> one?

Standalone disks.

> Just in case as a low hanging fruit - I'd suggest checking with
> dmesg and smartctl for drive errors...

No, sorry, it's not that easy ;-) That would also mean that nearly 50 to 60
SSDs and around 30 servers suddenly developed HW errors.

> FYI: one more reference for the similar issue:
> https://tracker.ceph.com/issues/24968
>
> Also I recall an issue with some kernels that caused occasional invalid
> data reads under high memory pressure/swapping:
> https://tracker.ceph.com/issues/22464

We run a current 4.19.X kernel and have no memory limit. Available memory
is pretty constant at 32 GB.

Greets,
Stefan

>
> IMO memory usage worth checking as well...
>
>
> Igor
>
>
> On 8/27/2019 4:52 PM, Stefan Priebe - Profihost AG wrote:
>> see inline
>>
>> On 27.08.19 at 15:43, Igor Fedotov wrote:
>>> see inline
>>>
>>> On 8/27/2019 4:41 PM, Stefan Priebe - Profihost AG wrote:
>>>> Hi Igor,
>>>>
>>>> On 27.08.19 at 14:11, Igor Fedotov wrote:
>>>>> Hi Stefan,
>>>>>
>>>>> this looks like a duplicate of
>>>>>
>>>>> https://tracker.ceph.com/issues/37282
>>>>>
>>>>> Actually the range of possible root causes might be quite wide,
>>>>> from HW issues to broken logic in RocksDB/BlueStore/BlueFS etc.
>>>>>
>>>>> As far as I understand you have different OSDs which are failing,
>>>>> right?
>>>> Yes, I've seen this on around 50 different OSDs running different HW,
>>>> but all run Ceph 12.2.12. I've not seen this with 12.2.10, which we
>>>> were running before.
>>>>
>>>>> Is the set of these broken OSDs limited somehow?
>>>> No, at least I'm not able to find a pattern.
>>>>
>>>>
>>>>> Any specific subset which is failing or something? E.g. just N of them
>>>>> are failing from time to time.
>>>> No, it seems totally random.
>>>>
>>>>> Any similarities for broken OSDs (e.g. specific hardware)?
>>>> All run Intel Xeon CPUs and all run Linux ;-)
>>>>
>>>>> Did you run fsck for any of the broken OSDs? Any reports?
>>>> Yes, but no reports.
>>> Are you saying that fsck is fine for OSDs that showed this sort of
>>> error?
>> Yes, fsck does not show a single error - everything is fine.
>>
>>>>> Any other errors/crashes in logs before this sort of issue happens?
>>>> No.
>>>>
>>>>
>>>>> Just in case - what allocator are you using?
>>>> tcmalloc
>>> I meant the BlueStore allocator - is it stupid or bitmap?
>> Ah, the default one - I think that is stupid.
>>
>> Greets,
>> Stefan
>>
>>>> Greets,
>>>> Stefan
>>>>
>>>>> Thanks,
>>>>>
>>>>> Igor
>>>>>
>>>>>
>>>>>
>>>>> On 8/27/2019 1:03 PM, Stefan Priebe - Profihost AG wrote:
>>>>>> Hello,
>>>>>>
>>>>>> for some months now all our BlueStore OSDs have kept crashing from
>>>>>> time to time. Currently about 5 OSDs per day.
>>>>>>
>>>>>> All of them show the following trace:
>>>>>> Trace:
>>>>>> 2019-07-24 08:36:48.995397 7fb19a711700 -1 rocksdb:
>>>>>> submit_transaction
>>>>>> error: Corruption: block checksum mismatch code = 2 Rocksdb
>>>>>> transaction:
>>>>>> Put( Prefix = M key =
>>>>>> 0x00000000000009a5'.0000916366.00000000000074680351' Value size =
>>>>>> 184)
>>>>>> Put( Prefix = M key = 0x00000000000009a5'._fastinfo' Value size =
>>>>>> 186)
>>>>>> Put( Prefix = O key =
>>>>>> 0x7f8000000000000003bb605f'd!rbd_data.afe49a6b8b4567.0000000000003c11!='0xfffffffffffffffeffffffffffffffff6f00240000'x'
>>>>>> Value size = 530)
>>>>>> Put( Prefix = O key =
>>>>>> 0x7f8000000000000003bb605f'd!rbd_data.afe49a6b8b4567.0000000000003c11!='0xfffffffffffffffeffffffffffffffff'o'
>>>>>> Value size = 510)
>>>>>> Put( Prefix = L key = 0x0000000010ba60f1 Value size = 4135)
>>>>>> 2019-07-24 08:36:49.012110 7fb19a711700 -1
>>>>>> /build/ceph/src/os/bluestore/BlueStore.cc: In function 'void
>>>>>> BlueStore::_kv_sync_thread()' thread 7fb19a711700 time 2019-07-24
>>>>>> 08:36:48.995415
>>>>>> /build/ceph/src/os/bluestore/BlueStore.cc: 8808: FAILED assert(r == 0)
>>>>>>
>>>>>> ceph version 12.2.12-7-g1321c5e91f
>>>>>> (1321c5e91f3d5d35dd5aa5a0029a54b9a8ab9498) luminous (stable)
>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>>>> const*)+0x102) [0x5653a010e222]
>>>>>> 2: (BlueStore::_kv_sync_thread()+0x24c5) [0x56539ff964b5]
>>>>>> 3: (BlueStore::KVSyncThread::entry()+0xd) [0x56539ffd708d]
>>>>>> 4: (()+0x7494) [0x7fb1ab2f6494]
>>>>>> 5: (clone()+0x3f) [0x7fb1aa37dacf]
>>>>>>
>>>>>> I already opened a tracker issue:
>>>>>> https://tracker.ceph.com/issues/41367
>>>>>>
>>>>>> Can anybody help? Is this known?
>>>>>>
>>>>>> Greets,
>>>>>> Stefan
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
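For reference, a minimal sketch of the checks discussed in this thread
(offline fsck, the BlueStore allocator setting, drive health), assuming a
Luminous 12.2.x BlueStore OSD on a systemd host; the OSD id NN and the
device /dev/sdX are placeholders, and the OSD has to be stopped before the
offline fsck:

  # Stop the OSD, then run the BlueStore consistency check.
  # "--deep 1" also reads all object data and verifies checksums.
  systemctl stop ceph-osd@NN
  ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-NN --deep 1

  # Restart the OSD and check which allocator it uses (stupid vs. bitmap);
  # this reads the running daemon's config via its admin socket.
  systemctl start ceph-osd@NN
  ceph daemon osd.NN config get bluestore_allocator

  # Drive-level checks mentioned in the thread.
  dmesg -T | grep -i error
  smartctl -a /dev/sdX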