Re: Bluestore OSDs keep crashing in BlueStore.cc: 8808: FAILED assert(r == 0)

see inline

On 8/27/2019 4:41 PM, Stefan Priebe - Profihost AG wrote:
Hi Igor,

On 27.08.19 at 14:11, Igor Fedotov wrote:
Hi Stefan,

this looks like a duplicate of

https://tracker.ceph.com/issues/37282

Actually, the range of possible root causes might be quite wide:
anything from HW issues to broken logic in RocksDB/BlueStore/BlueFS, etc.

As far as I understand, it is different OSDs that are failing, right?
Yes, I've seen this on around 50 different OSDs running on different HW, but
all of them run Ceph 12.2.12. I haven't seen this with 12.2.10, which we were
running before.

Is the set of these broken OSDs limited somehow?
No, at least not that I'm able to find.


Is there any specific subset that is failing? E.g. just N of them
failing from time to time?
No, it seems totally random.
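One way to check whether the failures really are random rather than concentrated on a subset of OSDs is to count the assert signature per OSD across collected logs. The following is a minimal Python sketch, assuming the OSD log lines have already been gathered into a dict keyed by OSD id. The names `tally_crashes` and `top_share` are hypothetical helpers, not part of any Ceph tool. The regex allows any line number after `BlueStore.cc:` because the line number (8808 here) varies between builds.

```python
import re
from collections import Counter

# Matches the assert line from the trace in this thread; \d+ tolerates
# a different BlueStore.cc line number on other builds.
ASSERT_SIG = re.compile(r"BlueStore\.cc: \d+: FAILED assert\(r == 0\)")


def tally_crashes(logs):
    """Count lines matching ASSERT_SIG per OSD.

    logs: hypothetical input mapping OSD id -> iterable of log lines.
    Returns a Counter of OSD id -> number of matching assert lines.
    """
    counts = Counter()
    for osd_id, lines in logs.items():
        for line in lines:
            if ASSERT_SIG.search(line):
                counts[osd_id] += 1
    return counts


def top_share(counts, n):
    """Fraction of all crashes that fall on the n most-affected OSDs."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(c for _, c in counts.most_common(n)) / total


if __name__ == "__main__":
    sample = {
        12: ["/build/ceph/src/os/bluestore/BlueStore.cc: 8808: FAILED assert(r == 0)"],
        47: ["unrelated log line"],
    }
    counts = tally_crashes(sample)
    print(dict(counts))          # {12: 1}
    print(top_share(counts, 1))  # 1.0
```

If `top_share(counts, n)` for a small `n` stays close to `n` divided by the number of affected OSDs, that is consistent with crashes spread evenly, as described above. A value near 1.0 would instead point at a few specific OSDs or hosts.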

Any similarities for broken OSDs (e.g. specific hardware)?
All run Intel Xeon CPUs and all run Linux ;-)

Did you run fsck for any of broken OSDs? Any reports?
Yes, but no reports.
Are you saying that fsck is fine for OSDs that showed this sort of error?


Any other errors/crashes in the logs before this sort of issue happens?
No


Just in case - what allocator are you using?
tcmalloc
I meant BlueStore allocator - is it stupid or bitmap?

Greets,
Stefan

Thanks,

Igor



On 8/27/2019 1:03 PM, Stefan Priebe - Profihost AG wrote:
Hello,

for some months now, all our BlueStore OSDs have been crashing from time to time.
Currently about 5 OSDs crash per day.

All of them show the following trace:
Trace:
2019-07-24 08:36:48.995397 7fb19a711700 -1 rocksdb: submit_transaction error: Corruption: block checksum mismatch code = 2 Rocksdb transaction:
Put( Prefix = M key = 0x00000000000009a5'.0000916366.00000000000074680351' Value size = 184)
Put( Prefix = M key = 0x00000000000009a5'._fastinfo' Value size = 186)
Put( Prefix = O key = 0x7f8000000000000003bb605f'd!rbd_data.afe49a6b8b4567.0000000000003c11!='0xfffffffffffffffeffffffffffffffff6f00240000'x' Value size = 530)
Put( Prefix = O key = 0x7f8000000000000003bb605f'd!rbd_data.afe49a6b8b4567.0000000000003c11!='0xfffffffffffffffeffffffffffffffff'o' Value size = 510)
Put( Prefix = L key = 0x0000000010ba60f1 Value size = 4135)
2019-07-24 08:36:49.012110 7fb19a711700 -1 /build/ceph/src/os/bluestore/BlueStore.cc: In function 'void BlueStore::_kv_sync_thread()' thread 7fb19a711700 time 2019-07-24 08:36:48.995415
/build/ceph/src/os/bluestore/BlueStore.cc: 8808: FAILED assert(r == 0)

ceph version 12.2.12-7-g1321c5e91f (1321c5e91f3d5d35dd5aa5a0029a54b9a8ab9498) luminous (stable)
   1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x5653a010e222]
   2: (BlueStore::_kv_sync_thread()+0x24c5) [0x56539ff964b5]
   3: (BlueStore::KVSyncThread::entry()+0xd) [0x56539ffd708d]
   4: (()+0x7494) [0x7fb1ab2f6494]
   5: (clone()+0x3f) [0x7fb1aa37dacf]

I already opened a tracker issue:
https://tracker.ceph.com/issues/41367

Can anybody help? Is this known?

Greets,
Stefan
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com