Mimic/13.2.5 bluestore OSDs crashing during startup in OSDMap::decode

Hi list,

While bringing a new storage server online we observed that a whole bunch of the SSD OSDs we use for metadata went offline, and they now crash every time they try to restart, aborting in OSDMap::decode - brief log below:



2019-04-26 17:56:08.123 7f4f2956ae00  4 rocksdb: [/build/ceph-13.2.5/src/rocksdb/db/version_set.cc:3362] Recovered from manifest file:db/MANIFEST-001493 succeeded,manifest_file_number is 1493, next_file_number is 1496, last_sequence is 45904669, log_number is 0,prev_log_number is 0,max_column_family is 0,deleted_log_number is 1491

2019-04-26 17:56:08.123 7f4f2956ae00  4 rocksdb: [/build/ceph-13.2.5/src/rocksdb/db/version_set.cc:3370] Column family [default] (ID 0), log number is 1492

2019-04-26 17:56:08.123 7f4f2956ae00  4 rocksdb: EVENT_LOG_v1 {"time_micros": 1556294168125624, "job": 1, "event": "recovery_started", "log_files": [1494]}
2019-04-26 17:56:08.123 7f4f2956ae00  4 rocksdb: [/build/ceph-13.2.5/src/rocksdb/db/db_impl_open.cc:551] Recovering log #1494 mode 0
2019-04-26 17:56:08.123 7f4f2956ae00  4 rocksdb: [/build/ceph-13.2.5/src/rocksdb/db/version_set.cc:2863] Creating manifest 1496

2019-04-26 17:56:08.123 7f4f2956ae00  4 rocksdb: EVENT_LOG_v1 {"time_micros": 1556294168126875, "job": 1, "event": "recovery_finished"}
2019-04-26 17:56:08.127 7f4f2956ae00  4 rocksdb: [/build/ceph-13.2.5/src/rocksdb/db/db_impl_open.cc:1218] DB pointer 0x5634c2f60000
2019-04-26 17:56:08.127 7f4f2956ae00  1 bluestore(/var/lib/ceph/osd/ceph-126) _open_db opened rocksdb path db options compression=kNoCompression,max_write_buffer_number=4,min_write_buffer_number_to_merge=1,recycle_log_file_num=4,write_buffer_size=268435456,writable_file_max_buffer_size=0,compaction_readahead_size=2097152
2019-04-26 17:56:08.135 7f4f2956ae00  1 freelist init
2019-04-26 17:56:08.143 7f4f2956ae00  1 bluestore(/var/lib/ceph/osd/ceph-126) _open_alloc opening allocation metadata
2019-04-26 17:56:08.147 7f4f2956ae00  1 bluestore(/var/lib/ceph/osd/ceph-126) _open_alloc loaded 223 GiB in 233 extents
2019-04-26 17:56:08.151 7f4f2956ae00 -1 *** Caught signal (Aborted) **
 in thread 7f4f2956ae00 thread_name:ceph-osd

 ceph version 13.2.5 (cbff874f9007f1869bfd3821b7e33b2a6ffd4988) mimic (stable)
 1: (()+0x92b730) [0x5634c0151730]
 2: (()+0x12890) [0x7f4f1f02b890]
 3: (gsignal()+0xc7) [0x7f4f1df06e97]
 4: (abort()+0x141) [0x7f4f1df08801]
 5: (()+0x8c8b7) [0x7f4f1e8fb8b7]
 6: (()+0x92a06) [0x7f4f1e901a06]
 7: (()+0x92a41) [0x7f4f1e901a41]
 8: (()+0x92c74) [0x7f4f1e901c74]
 9: (OSDMap::decode(ceph::buffer::list::iterator&)+0x1864) [0x7f4f20aff694]
 10: (OSDMap::decode(ceph::buffer::list&)+0x31) [0x7f4f20b00af1]
 11: (OSDService::try_get_map(unsigned int)+0x508) [0x5634bfbf73a8]
 12: (OSDService::get_map(unsigned int)+0x1e) [0x5634bfc56ffe]
 13: (OSD::init()+0x1d5f) [0x5634bfc048ef]
 14: (main()+0x383d) [0x5634bfaef8cd]
 15: (__libc_start_main()+0xe7) [0x7f4f1dee9b97]
 16: (_start()+0x2a) [0x5634bfbb97aa]



We have seen this at least once before, and I suspect it might be related to high load (?) on the servers when lots of PGs are peering and/or heavy backfilling is going on. That time it was only a single disk, so we "fixed" it by simply recreating that OSD - but this time we need to get these OSDs working again to avoid losing metadata.
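
(For context, "recreating" here means the usual destroy-and-redeploy cycle, roughly as below; the OSD id and device are placeholders, not the ones we actually used:)

# take the broken OSD out, stop it and purge it from the cluster (id is a placeholder)
ceph osd out <id>
systemctl stop ceph-osd@<id>
ceph osd purge <id> --yes-i-really-mean-it
# wipe the device and deploy a fresh bluestore OSD on it (device is a placeholder)
ceph-volume lvm zap /dev/sdX --destroy
ceph-volume lvm create --bluestore --data /dev/sdX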



Based on previous posts to the mailing list and the bug tracker, I guessed this might be due to a corrupt osdmap on these OSDs. We managed to copy the osdmap from a healthy OSD into one of the broken ones with ceph-objectstore-tool, but when I try to read that osdmap back from the broken OSD (with get-osdmap) I still get a CRC error:


root@storage05:/var/log/ceph# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-55 --op get-osdmap --file=/tmp/osdmap
terminate called after throwing an instance of 'ceph::buffer::malformed_input'
  what():  buffer::malformed_input: bad crc, actual 3828477398 != expected 3773790681
*** Caught signal (Aborted) **
 in thread 7f7611698f80 thread_name:ceph-objectstor
 ceph version 13.2.5 (cbff874f9007f1869bfd3821b7e33b2a6ffd4988) mimic (stable)
 1: (()+0x9895c0) [0x562e4aa3d5c0]
 2: (()+0x12890) [0x7f760715c890]
 3: (gsignal()+0xc7) [0x7f7606033e97]
 4: (abort()+0x141) [0x7f7606035801]
 5: (()+0x8c8b7) [0x7f7606a288b7]
 6: (()+0x92a06) [0x7f7606a2ea06]
 7: (()+0x92a41) [0x7f7606a2ea41]
 8: (()+0x92c74) [0x7f7606a2ec74]
 9: (OSDMap::decode(ceph::buffer::list::iterator&)+0x1864) [0x7f7607af2694]
 10: (OSDMap::decode(ceph::buffer::list&)+0x31) [0x7f7607af3af1]
 11: (get_osdmap(ObjectStore*, unsigned int, OSDMap&, ceph::buffer::list&)+0x1e5) [0x562e4a473655]
 12: (main()+0x39be) [0x562e4a3a43fe]
 13: (__libc_start_main()+0xe7) [0x7f7606016b97]
 14: (_start()+0x2a) [0x562e4a4727aa]
Aborted (core dumped)
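
For completeness, the copy itself was done roughly along these lines; the healthy OSD id and the epoch below are placeholders (if --epoch is omitted, get-osdmap uses that OSD's current epoch):

# dump the osdmap for a given epoch from a healthy, stopped OSD (id and epoch are placeholders)
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-NN --op get-osdmap --epoch <E> --file=/tmp/osdmap.good
# inject it into the broken OSD
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-55 --op set-osdmap --file=/tmp/osdmap.good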



Any clues or pointers as to what we should try? We have one unimportant OSD we can play around with, but for the others I guess it makes sense to first duplicate (?) the data so we don't risk losing it while testing various things - pointers on how best to do that would be equally welcome ;-)
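
To make that last question a bit more concrete, what I had in mind was either a raw block-level copy of each broken OSD's bluestore device onto spare storage, or exporting the PGs with ceph-objectstore-tool; everything below (device, pgid, paths) is made up:

# raw copy of the OSD's bluestore device/LV onto spare storage
dd if=/dev/<ceph-vg>/osd-block-<uuid> of=/backup/osd-126.raw bs=4M status=progress
# or export individual PGs from the stopped OSD (though this may well hit the same osdmap decode error)
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-126 --pgid <pgid> --op export --file=/backup/<pgid>.export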


Cheers,

Erik


--
Erik Lindahl <erik.lindahl@xxxxxxxxx>
Professor of Biophysics, Dept. Biochemistry & Biophysics, Stockholm University
Science for Life Laboratory, Box 1031, 17121 Solna, Sweden
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
