Mimic/13.2.5 bluestore OSDs crashing during startup in OSDMap::decode

Hi list,

While bringing a new storage server online, we observed that a large number of the SSD OSDs we use for metadata went offline. They now crash on every restart attempt with an abort signal in OSDMap::decode (brief log below).

We have seen this at least once before, and I suspect it may be related to high load on the servers when many PGs are peering and/or a large amount of backfilling is in progress. That time it affected only a single disk, so we "fixed" it by simply recreating that OSD, but this time we need to get the OSDs working again to avoid losing metadata :-)


Based on previous posts to the mailing list and the bug tracker, my guess is that the osdmap stored on these OSDs is corrupt.

Should we try to replace the osdmap and, if so, how do we do that for bluestore OSDs?

Cheers,

Erik


2019-04-26 17:56:08.123 7f4f2956ae00  4 rocksdb: [/build/ceph-13.2.5/src/rocksdb/db/version_set.cc:3362] Recovered from manifest file:db/MANIFEST-001493 succeeded,manifest_file_number is 1493, next_file_number is 1496, last_sequence is 45904669, log_number is 0,prev_log_number is 0,max_column_family is 0,deleted_log_number is 1491

2019-04-26 17:56:08.123 7f4f2956ae00  4 rocksdb: [/build/ceph-13.2.5/src/rocksdb/db/version_set.cc:3370] Column family [default] (ID 0), log number is 1492

2019-04-26 17:56:08.123 7f4f2956ae00  4 rocksdb: EVENT_LOG_v1 {"time_micros": 1556294168125624, "job": 1, "event": "recovery_started", "log_files": [1494]}
2019-04-26 17:56:08.123 7f4f2956ae00  4 rocksdb: [/build/ceph-13.2.5/src/rocksdb/db/db_impl_open.cc:551] Recovering log #1494 mode 0
2019-04-26 17:56:08.123 7f4f2956ae00  4 rocksdb: [/build/ceph-13.2.5/src/rocksdb/db/version_set.cc:2863] Creating manifest 1496

2019-04-26 17:56:08.123 7f4f2956ae00  4 rocksdb: EVENT_LOG_v1 {"time_micros": 1556294168126875, "job": 1, "event": "recovery_finished"}
2019-04-26 17:56:08.127 7f4f2956ae00  4 rocksdb: [/build/ceph-13.2.5/src/rocksdb/db/db_impl_open.cc:1218] DB pointer 0x5634c2f60000
2019-04-26 17:56:08.127 7f4f2956ae00  1 bluestore(/var/lib/ceph/osd/ceph-126) _open_db opened rocksdb path db options compression=kNoCompression,max_write_buffer_number=4,min_write_buffer_number_to_merge=1,recycle_log_file_num=4,write_buffer_size=268435456,writable_file_max_buffer_size=0,compaction_readahead_size=2097152
2019-04-26 17:56:08.135 7f4f2956ae00  1 freelist init
2019-04-26 17:56:08.143 7f4f2956ae00  1 bluestore(/var/lib/ceph/osd/ceph-126) _open_alloc opening allocation metadata
2019-04-26 17:56:08.147 7f4f2956ae00  1 bluestore(/var/lib/ceph/osd/ceph-126) _open_alloc loaded 223 GiB in 233 extents
2019-04-26 17:56:08.151 7f4f2956ae00 -1 *** Caught signal (Aborted) **
 in thread 7f4f2956ae00 thread_name:ceph-osd

 ceph version 13.2.5 (cbff874f9007f1869bfd3821b7e33b2a6ffd4988) mimic (stable)
 1: (()+0x92b730) [0x5634c0151730]
 2: (()+0x12890) [0x7f4f1f02b890]
 3: (gsignal()+0xc7) [0x7f4f1df06e97]
 4: (abort()+0x141) [0x7f4f1df08801]
 5: (()+0x8c8b7) [0x7f4f1e8fb8b7]
 6: (()+0x92a06) [0x7f4f1e901a06]
 7: (()+0x92a41) [0x7f4f1e901a41]
 8: (()+0x92c74) [0x7f4f1e901c74]
 9: (OSDMap::decode(ceph::buffer::list::iterator&)+0x1864) [0x7f4f20aff694]
 10: (OSDMap::decode(ceph::buffer::list&)+0x31) [0x7f4f20b00af1]
 11: (OSDService::try_get_map(unsigned int)+0x508) [0x5634bfbf73a8]
 12: (OSDService::get_map(unsigned int)+0x1e) [0x5634bfc56ffe]
 13: (OSD::init()+0x1d5f) [0x5634bfc048ef]
 14: (main()+0x383d) [0x5634bfaef8cd]
 15: (__libc_start_main()+0xe7) [0x7f4f1dee9b97]
 16: (_start()+0x2a) [0x5634bfbb97aa]
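
Since several OSDs are affected, it may help to confirm that each one aborts in the same place before touching any osdmap. Below is a minimal sketch that pulls the backtrace out of an OSD log and checks whether OSDMap::decode is among the frames. It assumes the log format shown above; the function names (crash_frames, crashed_in) and the SAMPLE constant are made up for illustration and are not part of Ceph.

```python
import re

# "*** Caught signal (Aborted) **" as printed in the OSD log above.
SIGNAL_RE = re.compile(r"\*\*\* Caught signal \((\w+)\) \*\*")
# Backtrace frames such as " 9: (OSDMap::decode(...)+0x1864) [0x7f4f20aff694]".
FRAME_RE = re.compile(r"^\s*\d+: \((.*?)\+0x[0-9a-f]+\) \[0x[0-9a-f]+\]\s*$")

# Excerpt of the log from this post, used only as example input.
SAMPLE = """\
2019-04-26 17:56:08.151 7f4f2956ae00 -1 *** Caught signal (Aborted) **
 in thread 7f4f2956ae00 thread_name:ceph-osd

 ceph version 13.2.5 (cbff874f9007f1869bfd3821b7e33b2a6ffd4988) mimic (stable)
 8: (()+0x92c74) [0x7f4f1e901c74]
 9: (OSDMap::decode(ceph::buffer::list::iterator&)+0x1864) [0x7f4f20aff694]
 10: (OSDMap::decode(ceph::buffer::list&)+0x31) [0x7f4f20b00af1]
 11: (OSDService::try_get_map(unsigned int)+0x508) [0x5634bfbf73a8]
 12: (OSDService::get_map(unsigned int)+0x1e) [0x5634bfc56ffe]
 13: (OSD::init()+0x1d5f) [0x5634bfc048ef]

"""


def crash_frames(log_text):
    """Return (signal, [symbol, ...]) for the first crash found, or None."""
    signal = None
    frames = []
    for line in log_text.splitlines():
        if signal is None:
            match = SIGNAL_RE.search(line)
            if match:
                signal = match.group(1)
            continue
        match = FRAME_RE.match(line)
        if match:
            frames.append(match.group(1))
        elif frames:
            break  # first non-frame line after the frames ends the backtrace
    if signal is None:
        return None
    return signal, frames


def crashed_in(log_text, symbol_prefix="OSDMap::decode"):
    """True if any frame of the first backtrace starts with symbol_prefix."""
    result = crash_frames(log_text)
    if result is None:
        return False
    _, frames = result
    return any(frame.startswith(symbol_prefix) for frame in frames)


if __name__ == "__main__":
    print(crashed_in(SAMPLE))
```

Running crashed_in() over the text of each failed OSD's log would separate the OSDs that fail in OSDMap::decode from any that fail for a different reason. It only inspects the first backtrace in the text it is given.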


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
