Wait for recovery to finish so you know whether any data from the down OSDs is required. If not, just reprovision them.

If data is required from the down OSDs, you will need to query the affected PG(s) to find out which OSDs hold the required copies of the PG or object. You can then export the PG from the down OSD using ceph-objectstore-tool, back it up, and import it back into the cluster.

On Tue, Apr 21, 2020 at 1:05 AM Robert Sander <r.sander@xxxxxxxxxxxxxxxxxxx> wrote:
>
> Hi,
>
> one of our customers had his Ceph cluster crashed due to a power or network outage (they still try to figure out what happened).
>
> The cluster is very unhealthy but recovering:
>
> # ceph -s
>   cluster:
>     id:     1c95ca5d-948b-4113-9246-14761cb9a82a
>     health: HEALTH_ERR
>             1 filesystem is degraded
>             1 mds daemon damaged
>             1 osds down
>             1 pools have many more objects per pg than average
>             1/115117480 objects unfound (0.000%)
>             Reduced data availability: 71 pgs inactive, 53 pgs down, 18 pgs peering, 27 pgs stale
>             Possible data damage: 1 pg recovery_unfound
>             Degraded data redundancy: 7303464/230234960 objects degraded (3.172%), 693 pgs degraded, 945 pgs undersized
>             14 daemons have recently crashed
>
>   services:
>     mon: 3 daemons, quorum maslxlabstore01,maslxlabstore02,maslxlabstore04 (age 64m)
>     mgr: maslxlabstore01(active, since 69m), standbys: maslxlabstore03, maslxlabstore02, maslxlabstore04
>     mds: cephfs:2/3 {0=maslxlabstore03=up:resolve,1=maslxlabstore01=up:resolve} 2 up:standby, 1 damaged
>     osd: 140 osds: 130 up (since 4m), 131 in (since 4m); 847 remapped pgs
>     rgw: 4 daemons active (maslxlabstore01.rgw0, maslxlabstore02.rgw0, maslxlabstore03.rgw0, maslxlabstore04.rgw0)
>
>   data:
>     pools:   6 pools, 8328 pgs
>     objects: 115.12M objects, 218 TiB
>     usage:   425 TiB used, 290 TiB / 715 TiB avail
>     pgs:     0.853% pgs not active
>              7303464/230234960 objects degraded (3.172%)
>              13486/230234960 objects misplaced (0.006%)
>              1/115117480 objects unfound (0.000%)
>              7311 active+clean
>              338
active+undersized+degraded+remapped+backfill_wait > 255 active+undersized+degraded+remapped+backfilling > 215 active+undersized+remapped+backfilling > 99 active+undersized+degraded > 44 down > 37 active+undersized+remapped+backfill_wait > 13 stale+peering > 9 stale+down > 5 stale+remapped+peering > 1 active+recovery_unfound+undersized+degraded+remapped > 1 active+clean+remapped > > io: > client: 168 B/s rd, 0 B/s wr, 0 op/s rd, 0 op/s wr > recovery: 1.9 GiB/s, 15 keys/s, 948 objects/s > > > The MDS cluster is unable to start because one of them is damaged. > > 10 of the OSDs do not start. They crash very early in the boot process: > > 2020-04-20 16:26:14.935 7f818ec8cc00 0 set uid:gid to 64045:64045 (ceph:ceph) > 2020-04-20 16:26:14.935 7f818ec8cc00 0 ceph version 14.2.9 (581f22da52345dba46ee232b73b990f06029a2a0) nautilus (stable), process ceph-osd, pid 69463 > 2020-04-20 16:26:14.935 7f818ec8cc00 0 pidfile_write: ignore empty --pid-file > 2020-04-20 16:26:15.503 7f818ec8cc00 0 starting osd.42 osd_data /var/lib/ceph/osd/ceph-42 /var/lib/ceph/osd/ceph-42/journal > 2020-04-20 16:26:15.523 7f818ec8cc00 0 load: jerasure load: lrc load: isa > 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option compaction_readahead_size = 2MB > 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option compaction_style = kCompactionStyleLevel > 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option compaction_threads = 32 > 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option compression = kNoCompression > 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option flusher_threads = 8 > 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option level0_file_num_compaction_trigger = 8 > 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option level0_slowdown_writes_trigger = 32 > 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option level0_stop_writes_trigger = 64 > 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option max_background_compactions = 31 > 2020-04-20 
16:26:16.339 7f818ec8cc00 0 set rocksdb option max_bytes_for_level_base = 536870912 > 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option max_bytes_for_level_multiplier = 8 > 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option max_write_buffer_number = 32 > 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option min_write_buffer_number_to_merge = 2 > 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option recycle_log_file_num = 32 > 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option target_file_size_base = 67108864 > 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option write_buffer_size = 67108864 > 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option compaction_readahead_size = 2MB > 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option compaction_style = kCompactionStyleLevel > 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option compaction_threads = 32 > 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option compression = kNoCompression > 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option flusher_threads = 8 > 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option level0_file_num_compaction_trigger = 8 > 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option level0_slowdown_writes_trigger = 32 > 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option level0_stop_writes_trigger = 64 > 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option max_background_compactions = 31 > 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option max_bytes_for_level_base = 536870912 > 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option max_bytes_for_level_multiplier = 8 > 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option max_write_buffer_number = 32 > 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option min_write_buffer_number_to_merge = 2 > 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option recycle_log_file_num = 32 > 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option 
target_file_size_base = 67108864 > 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option write_buffer_size = 67108864 > 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option compaction_readahead_size = 2MB > 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option compaction_style = kCompactionStyleLevel > 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option compaction_threads = 32 > 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option compression = kNoCompression > 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option flusher_threads = 8 > 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option level0_file_num_compaction_trigger = 8 > 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option level0_slowdown_writes_trigger = 32 > 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option level0_stop_writes_trigger = 64 > 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option max_background_compactions = 31 > 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option max_bytes_for_level_base = 536870912 > 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option max_bytes_for_level_multiplier = 8 > 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option max_write_buffer_number = 32 > 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option min_write_buffer_number_to_merge = 2 > 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option recycle_log_file_num = 32 > 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option target_file_size_base = 67108864 > 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option write_buffer_size = 67108864 > 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option compaction_readahead_size = 2MB > 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option compaction_style = kCompactionStyleLevel > 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option compaction_threads = 32 > 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option compression = kNoCompression > 2020-04-20 16:26:17.731 7f818ec8cc00 0 
set rocksdb option flusher_threads = 8 > 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option level0_file_num_compaction_trigger = 8 > 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option level0_slowdown_writes_trigger = 32 > 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option level0_stop_writes_trigger = 64 > 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option max_background_compactions = 31 > 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option max_bytes_for_level_base = 536870912 > 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option max_bytes_for_level_multiplier = 8 > 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option max_write_buffer_number = 32 > 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option min_write_buffer_number_to_merge = 2 > 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option recycle_log_file_num = 32 > 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option target_file_size_base = 67108864 > 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option write_buffer_size = 67108864 > 2020-04-20 16:26:18.131 7f818ec8cc00 0 _get_class not permitted to load lua > 2020-04-20 16:26:18.131 7f818ec8cc00 0 _get_class not permitted to load kvs > 2020-04-20 16:26:18.131 7f818ec8cc00 0 _get_class not permitted to load sdk > 2020-04-20 16:26:18.131 7f818ec8cc00 0 <cls> /build/ceph-14.2.9/src/cls/cephfs/cls_cephfs.cc:197: loading cephfs > 2020-04-20 16:26:18.131 7f818ec8cc00 0 <cls> /build/ceph-14.2.9/src/cls/hello/cls_hello.cc:296: loading cls_hello > 2020-04-20 16:26:18.131 7f818ec8cc00 0 osd.42 6008 crush map has features 288514051259236352, adjusting msgr requires for clients > 2020-04-20 16:26:18.131 7f818ec8cc00 0 osd.42 6008 crush map has features 288514051259236352 was 8705, adjusting msgr requires for mons > 2020-04-20 16:26:18.131 7f818ec8cc00 0 osd.42 6008 crush map has features 3314933000852226048, adjusting msgr requires for osds > 2020-04-20 16:26:22.023 7f818ec8cc00 0 osd.42 6008 load_pgs > 
2020-04-20 16:26:22.499 7f818ec8cc00 0 osd.42 6008 load_pgs opened 109 pgs > 2020-04-20 16:26:22.499 7f818ec8cc00 0 osd.42 6008 using weightedpriority op queue with priority op cut off at 64. > 2020-04-20 16:26:22.499 7f818ec8cc00 -1 osd.42 6008 log_to_monitors {default=true} > 2020-04-20 16:26:22.511 7f818ec8cc00 0 osd.42 6008 done with init, starting boot process > 2020-04-20 16:26:23.883 7f815331c700 -1 /build/ceph-14.2.9/src/osd/PGLog.cc: In function 'void PGLog::merge_log(pg_info_t&, pg_log_t&, pg_shard_t, pg_info_t&, PGLog::LogEntryHandler*, bool&, bool&)' thread 7f815331c700 time 2020-04-20 16:26:23.884183 > /build/ceph-14.2.9/src/osd/PGLog.cc: 368: FAILED ceph_assert(log.head >= olog.tail && olog.head >= log.tail) > > ceph version 14.2.9 (581f22da52345dba46ee232b73b990f06029a2a0) nautilus (stable) > 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x152) [0x564fbd9349d2] > 2: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x564fbd934bad] > 3: (PGLog::merge_log(pg_info_t&, pg_log_t&, pg_shard_t, pg_info_t&, PGLog::LogEntryHandler*, bool&, bool&)+0x1cc0) [0x564fbdaff930] > 4: (PG::merge_log(ObjectStore::Transaction&, pg_info_t&, pg_log_t&, pg_shard_t)+0x64) [0x564fbda4eca4] > 5: (PG::proc_master_log(ObjectStore::Transaction&, pg_info_t&, pg_log_t&, pg_missing_set<false>&, pg_shard_t)+0x97) [0x564fbda7fe47] > 6: (PG::RecoveryState::GetLog::react(PG::RecoveryState::GotLog const&)+0xa6) [0x564fbda9d4f6] > 7: (boost::statechart::simple_state<PG::RecoveryState::GetLog, PG::RecoveryState::Peering, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0x191) [0x564fbdaf0e21] > 8: 
(boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<boost::statechart::none>, boost::statechart::null_exception_translator>::process_queued_events()+0xb3) [0x564fbdabdfc3] > 9: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<boost::statechart::none>, boost::statechart::null_exception_translator>::process_event(boost::statechart::event_base const&)+0x87) [0x564fbdabe227] > 10: (PG::do_peering_event(std::shared_ptr<PGPeeringEvent>, PG::RecoveryCtx*)+0x122) [0x564fbdaada12] > 11: (OSD::dequeue_peering_evt(OSDShard*, PG*, std::shared_ptr<PGPeeringEvent>, ThreadPool::TPHandle&)+0x1b4) [0x564fbd9e1f54] > 12: (PGPeeringItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x50) [0x564fbdc710c0] > 13: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0xbf5) [0x564fbd9d5995] > 14: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x4ac) [0x564fbdfdb8cc] > 15: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x564fbdfdea90] > 16: (()+0x76db) [0x7f818c8666db] > 17: (clone()+0x3f) [0x7f818b60688f] > > 2020-04-20 16:26:23.887 7f815331c700 -1 *** Caught signal (Aborted) ** > in thread 7f815331c700 thread_name:tp_osd_tp > > ceph version 14.2.9 (581f22da52345dba46ee232b73b990f06029a2a0) nautilus (stable) > 1: (()+0x12890) [0x7f818c871890] > 2: (gsignal()+0xc7) [0x7f818b523e97] > 3: (abort()+0x141) [0x7f818b525801] > 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a3) [0x564fbd934a23] > 5: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x564fbd934bad] > 6: (PGLog::merge_log(pg_info_t&, pg_log_t&, pg_shard_t, pg_info_t&, PGLog::LogEntryHandler*, bool&, bool&)+0x1cc0) [0x564fbdaff930] > 7: (PG::merge_log(ObjectStore::Transaction&, pg_info_t&, pg_log_t&, pg_shard_t)+0x64) [0x564fbda4eca4] > 8: 
(PG::proc_master_log(ObjectStore::Transaction&, pg_info_t&, pg_log_t&, pg_missing_set<false>&, pg_shard_t)+0x97) [0x564fbda7fe47] > 9: (PG::RecoveryState::GetLog::react(PG::RecoveryState::GotLog const&)+0xa6) [0x564fbda9d4f6] > 10: (boost::statechart::simple_state<PG::RecoveryState::GetLog, PG::RecoveryState::Peering, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0x191) [0x564fbdaf0e21] > 11: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<boost::statechart::none>, boost::statechart::null_exception_translator>::process_queued_events()+0xb3) [0x564fbdabdfc3] > 12: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<boost::statechart::none>, boost::statechart::null_exception_translator>::process_event(boost::statechart::event_base const&)+0x87) [0x564fbdabe227] > 13: (PG::do_peering_event(std::shared_ptr<PGPeeringEvent>, PG::RecoveryCtx*)+0x122) [0x564fbdaada12] > 14: (OSD::dequeue_peering_evt(OSDShard*, PG*, std::shared_ptr<PGPeeringEvent>, ThreadPool::TPHandle&)+0x1b4) [0x564fbd9e1f54] > 15: (PGPeeringItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x50) [0x564fbdc710c0] > 16: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0xbf5) [0x564fbd9d5995] > 17: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x4ac) [0x564fbdfdb8cc] > 18: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x564fbdfdea90] > 19: (()+0x76db) [0x7f818c8666db] > 20: (clone()+0x3f) [0x7f818b60688f] > NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this. 
> > --- begin dump of recent events --- > -105> 2020-04-20 16:26:14.923 7f818ec8cc00 5 asok(0x564fc8472000) register_command assert hook 0x564fc83ae510 > -104> 2020-04-20 16:26:14.923 7f818ec8cc00 5 asok(0x564fc8472000) register_command abort hook 0x564fc83ae510 > -103> 2020-04-20 16:26:14.923 7f818ec8cc00 5 asok(0x564fc8472000) register_command perfcounters_dump hook 0x564fc83ae510 > -102> 2020-04-20 16:26:14.923 7f818ec8cc00 5 asok(0x564fc8472000) register_command 1 hook 0x564fc83ae510 > -101> 2020-04-20 16:26:14.923 7f818ec8cc00 5 asok(0x564fc8472000) register_command perf dump hook 0x564fc83ae510 > -100> 2020-04-20 16:26:14.923 7f818ec8cc00 5 asok(0x564fc8472000) register_command perfcounters_schema hook 0x564fc83ae510 > -99> 2020-04-20 16:26:14.923 7f818ec8cc00 5 asok(0x564fc8472000) register_command perf histogram dump hook 0x564fc83ae510 > -98> 2020-04-20 16:26:14.923 7f818ec8cc00 5 asok(0x564fc8472000) register_command 2 hook 0x564fc83ae510 > -97> 2020-04-20 16:26:14.923 7f818ec8cc00 5 asok(0x564fc8472000) register_command perf schema hook 0x564fc83ae510 > -96> 2020-04-20 16:26:14.923 7f818ec8cc00 5 asok(0x564fc8472000) register_command perf histogram schema hook 0x564fc83ae510 > -95> 2020-04-20 16:26:14.923 7f818ec8cc00 5 asok(0x564fc8472000) register_command perf reset hook 0x564fc83ae510 > -94> 2020-04-20 16:26:14.923 7f818ec8cc00 5 asok(0x564fc8472000) register_command config show hook 0x564fc83ae510 > -93> 2020-04-20 16:26:14.923 7f818ec8cc00 5 asok(0x564fc8472000) register_command config help hook 0x564fc83ae510 > -92> 2020-04-20 16:26:14.923 7f818ec8cc00 5 asok(0x564fc8472000) register_command config set hook 0x564fc83ae510 > -91> 2020-04-20 16:26:14.923 7f818ec8cc00 5 asok(0x564fc8472000) register_command config unset hook 0x564fc83ae510 > -90> 2020-04-20 16:26:14.923 7f818ec8cc00 5 asok(0x564fc8472000) register_command config get hook 0x564fc83ae510 > -89> 2020-04-20 16:26:14.923 7f818ec8cc00 5 asok(0x564fc8472000) register_command config diff hook 
0x564fc83ae510 > -88> 2020-04-20 16:26:14.923 7f818ec8cc00 5 asok(0x564fc8472000) register_command config diff get hook 0x564fc83ae510 > -87> 2020-04-20 16:26:14.923 7f818ec8cc00 5 asok(0x564fc8472000) register_command log flush hook 0x564fc83ae510 > -86> 2020-04-20 16:26:14.923 7f818ec8cc00 5 asok(0x564fc8472000) register_command log dump hook 0x564fc83ae510 > -85> 2020-04-20 16:26:14.923 7f818ec8cc00 5 asok(0x564fc8472000) register_command log reopen hook 0x564fc83ae510 > -84> 2020-04-20 16:26:14.923 7f818ec8cc00 5 asok(0x564fc8472000) register_command dump_mempools hook 0x564fc9076068 > -83> 2020-04-20 16:26:14.935 7f818ec8cc00 0 set uid:gid to 64045:64045 (ceph:ceph) > -82> 2020-04-20 16:26:14.935 7f818ec8cc00 0 ceph version 14.2.9 (581f22da52345dba46ee232b73b990f06029a2a0) nautilus (stable), process ceph-osd, pid 69463 > -81> 2020-04-20 16:26:14.935 7f818ec8cc00 0 pidfile_write: ignore empty --pid-file > -80> 2020-04-20 16:26:15.503 7f818ec8cc00 0 starting osd.42 osd_data /var/lib/ceph/osd/ceph-42 /var/lib/ceph/osd/ceph-42/journal > -79> 2020-04-20 16:26:15.523 7f818ec8cc00 0 load: jerasure load: lrc load: isa > -78> 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option compaction_readahead_size = 2MB > -77> 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option compaction_style = kCompactionStyleLevel > -76> 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option compaction_threads = 32 > -75> 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option compression = kNoCompression > -74> 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option flusher_threads = 8 > -73> 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option level0_file_num_compaction_trigger = 8 > -72> 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option level0_slowdown_writes_trigger = 32 > -71> 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option level0_stop_writes_trigger = 64 > -70> 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option max_background_compactions = 
31 > -69> 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option max_bytes_for_level_base = 536870912 > -68> 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option max_bytes_for_level_multiplier = 8 > -67> 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option max_write_buffer_number = 32 > -66> 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option min_write_buffer_number_to_merge = 2 > -65> 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option recycle_log_file_num = 32 > -64> 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option target_file_size_base = 67108864 > -63> 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option write_buffer_size = 67108864 > -62> 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option compaction_readahead_size = 2MB > -61> 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option compaction_style = kCompactionStyleLevel > -60> 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option compaction_threads = 32 > -59> 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option compression = kNoCompression > -58> 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option flusher_threads = 8 > -57> 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option level0_file_num_compaction_trigger = 8 > -56> 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option level0_slowdown_writes_trigger = 32 > -55> 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option level0_stop_writes_trigger = 64 > -54> 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option max_background_compactions = 31 > -53> 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option max_bytes_for_level_base = 536870912 > -52> 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option max_bytes_for_level_multiplier = 8 > -51> 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option max_write_buffer_number = 32 > -50> 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option min_write_buffer_number_to_merge = 2 > -49> 2020-04-20 16:26:16.339 7f818ec8cc00 0 
set rocksdb option recycle_log_file_num = 32 > -48> 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option target_file_size_base = 67108864 > -47> 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option write_buffer_size = 67108864 > -46> 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option compaction_readahead_size = 2MB > -45> 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option compaction_style = kCompactionStyleLevel > -44> 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option compaction_threads = 32 > -43> 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option compression = kNoCompression > -42> 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option flusher_threads = 8 > -41> 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option level0_file_num_compaction_trigger = 8 > -40> 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option level0_slowdown_writes_trigger = 32 > -39> 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option level0_stop_writes_trigger = 64 > -38> 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option max_background_compactions = 31 > -37> 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option max_bytes_for_level_base = 536870912 > -36> 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option max_bytes_for_level_multiplier = 8 > -35> 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option max_write_buffer_number = 32 > -34> 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option min_write_buffer_number_to_merge = 2 > -33> 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option recycle_log_file_num = 32 > -32> 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option target_file_size_base = 67108864 > -31> 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option write_buffer_size = 67108864 > -30> 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option compaction_readahead_size = 2MB > -29> 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option compaction_style = kCompactionStyleLevel > -28> 
2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option compaction_threads = 32 > -27> 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option compression = kNoCompression > -26> 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option flusher_threads = 8 > -25> 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option level0_file_num_compaction_trigger = 8 > -24> 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option level0_slowdown_writes_trigger = 32 > -23> 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option level0_stop_writes_trigger = 64 > -22> 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option max_background_compactions = 31 > -21> 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option max_bytes_for_level_base = 536870912 > -20> 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option max_bytes_for_level_multiplier = 8 > -19> 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option max_write_buffer_number = 32 > -18> 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option min_write_buffer_number_to_merge = 2 > -17> 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option recycle_log_file_num = 32 > -16> 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option target_file_size_base = 67108864 > -15> 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option write_buffer_size = 67108864 > -14> 2020-04-20 16:26:18.131 7f818ec8cc00 0 _get_class not permitted to load lua > -13> 2020-04-20 16:26:18.131 7f818ec8cc00 0 _get_class not permitted to load kvs > -12> 2020-04-20 16:26:18.131 7f818ec8cc00 0 _get_class not permitted to load sdk > -11> 2020-04-20 16:26:18.131 7f818ec8cc00 0 <cls> /build/ceph-14.2.9/src/cls/cephfs/cls_cephfs.cc:197: loading cephfs > -10> 2020-04-20 16:26:18.131 7f818ec8cc00 0 <cls> /build/ceph-14.2.9/src/cls/hello/cls_hello.cc:296: loading cls_hello > -9> 2020-04-20 16:26:18.131 7f818ec8cc00 0 osd.42 6008 crush map has features 288514051259236352, adjusting msgr requires for clients > -8> 2020-04-20 16:26:18.131 
7f818ec8cc00 0 osd.42 6008 crush map has features 288514051259236352 was 8705, adjusting msgr requires for mons > -7> 2020-04-20 16:26:18.131 7f818ec8cc00 0 osd.42 6008 crush map has features 3314933000852226048, adjusting msgr requires for osds > -6> 2020-04-20 16:26:22.023 7f818ec8cc00 0 osd.42 6008 load_pgs > -5> 2020-04-20 16:26:22.499 7f818ec8cc00 0 osd.42 6008 load_pgs opened 109 pgs > -4> 2020-04-20 16:26:22.499 7f818ec8cc00 0 osd.42 6008 using weightedpriority op queue with priority op cut off at 64. > -3> 2020-04-20 16:26:22.499 7f818ec8cc00 -1 osd.42 6008 log_to_monitors {default=true} > -2> 2020-04-20 16:26:22.511 7f818ec8cc00 0 osd.42 6008 done with init, starting boot process > -1> 2020-04-20 16:26:23.883 7f815331c700 -1 /build/ceph-14.2.9/src/osd/PGLog.cc: In function 'void PGLog::merge_log(pg_info_t&, pg_log_t&, pg_shard_t, pg_info_t&, PGLog::LogEntryHandler*, bool&, bool&)' thread 7f815331c700 time 2020-04-20 16:26:23.884183 > /build/ceph-14.2.9/src/osd/PGLog.cc: 368: FAILED ceph_assert(log.head >= olog.tail && olog.head >= log.tail) > > ceph version 14.2.9 (581f22da52345dba46ee232b73b990f06029a2a0) nautilus (stable) > 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x152) [0x564fbd9349d2] > 2: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x564fbd934bad] > 3: (PGLog::merge_log(pg_info_t&, pg_log_t&, pg_shard_t, pg_info_t&, PGLog::LogEntryHandler*, bool&, bool&)+0x1cc0) [0x564fbdaff930] > 4: (PG::merge_log(ObjectStore::Transaction&, pg_info_t&, pg_log_t&, pg_shard_t)+0x64) [0x564fbda4eca4] > 5: (PG::proc_master_log(ObjectStore::Transaction&, pg_info_t&, pg_log_t&, pg_missing_set<false>&, pg_shard_t)+0x97) [0x564fbda7fe47] > 6: (PG::RecoveryState::GetLog::react(PG::RecoveryState::GotLog const&)+0xa6) [0x564fbda9d4f6] > 7: (boost::statechart::simple_state<PG::RecoveryState::GetLog, PG::RecoveryState::Peering, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, 
mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0x191) [0x564fbdaf0e21] > 8: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<boost::statechart::none>, boost::statechart::null_exception_translator>::process_queued_events()+0xb3) [0x564fbdabdfc3] > 9: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<boost::statechart::none>, boost::statechart::null_exception_translator>::process_event(boost::statechart::event_base const&)+0x87) [0x564fbdabe227] > 10: (PG::do_peering_event(std::shared_ptr<PGPeeringEvent>, PG::RecoveryCtx*)+0x122) [0x564fbdaada12] > 11: (OSD::dequeue_peering_evt(OSDShard*, PG*, std::shared_ptr<PGPeeringEvent>, ThreadPool::TPHandle&)+0x1b4) [0x564fbd9e1f54] > 12: (PGPeeringItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x50) [0x564fbdc710c0] > 13: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0xbf5) [0x564fbd9d5995] > 14: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x4ac) [0x564fbdfdb8cc] > 15: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x564fbdfdea90] > 16: (()+0x76db) [0x7f818c8666db] > 17: (clone()+0x3f) [0x7f818b60688f] > > 0> 2020-04-20 16:26:23.887 7f815331c700 -1 *** Caught signal (Aborted) ** > in thread 7f815331c700 thread_name:tp_osd_tp > > ceph version 14.2.9 (581f22da52345dba46ee232b73b990f06029a2a0) nautilus (stable) > 1: (()+0x12890) [0x7f818c871890] > 2: (gsignal()+0xc7) [0x7f818b523e97] > 3: (abort()+0x141) [0x7f818b525801] > 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a3) [0x564fbd934a23] > 5: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x564fbd934bad] > 
6: (PGLog::merge_log(pg_info_t&, pg_log_t&, pg_shard_t, pg_info_t&, PGLog::LogEntryHandler*, bool&, bool&)+0x1cc0) [0x564fbdaff930] > 7: (PG::merge_log(ObjectStore::Transaction&, pg_info_t&, pg_log_t&, pg_shard_t)+0x64) [0x564fbda4eca4] > 8: (PG::proc_master_log(ObjectStore::Transaction&, pg_info_t&, pg_log_t&, pg_missing_set<false>&, pg_shard_t)+0x97) [0x564fbda7fe47] > 9: (PG::RecoveryState::GetLog::react(PG::RecoveryState::GotLog const&)+0xa6) [0x564fbda9d4f6] > 10: (boost::statechart::simple_state<PG::RecoveryState::GetLog, PG::RecoveryState::Peering, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0x191) [0x564fbdaf0e21] > 11: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<boost::statechart::none>, boost::statechart::null_exception_translator>::process_queued_events()+0xb3) [0x564fbdabdfc3] > 12: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<boost::statechart::none>, boost::statechart::null_exception_translator>::process_event(boost::statechart::event_base const&)+0x87) [0x564fbdabe227] > 13: (PG::do_peering_event(std::shared_ptr<PGPeeringEvent>, PG::RecoveryCtx*)+0x122) [0x564fbdaada12] > 14: (OSD::dequeue_peering_evt(OSDShard*, PG*, std::shared_ptr<PGPeeringEvent>, ThreadPool::TPHandle&)+0x1b4) [0x564fbd9e1f54] > 15: (PGPeeringItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x50) [0x564fbdc710c0] > 16: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0xbf5) [0x564fbd9d5995] > 17: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x4ac) [0x564fbdfdb8cc] > 18: 
(ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x564fbdfdea90] > 19: (()+0x76db) [0x7f818c8666db] > 20: (clone()+0x3f) [0x7f818b60688f] > NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this. > > --- logging levels --- > 0/ 0 none > 0/ 0 lockdep > 0/ 0 context > 0/ 0 crush > 0/ 0 mds > 0/ 0 mds_balancer > 0/ 0 mds_locker > 0/ 0 mds_log > 0/ 0 mds_log_expire > 0/ 0 mds_migrator > 0/ 0 buffer > 0/ 0 timer > 0/ 0 filer > 0/ 0 striper > 0/ 0 objecter > 0/ 0 rados > 0/ 0 rbd > 0/ 0 rbd_mirror > 0/ 0 rbd_replay > 0/ 0 journaler > 0/ 0 objectcacher > 0/ 0 client > 0/ 0 osd > 0/ 0 optracker > 0/ 0 objclass > 0/ 0 filestore > 0/ 0 journal > 0/ 0 ms > 0/ 0 mon > 0/ 0 monc > 0/ 0 paxos > 0/ 0 tp > 0/ 0 auth > 0/ 0 crypto > 0/ 0 finisher > 0/ 0 reserver > 0/ 0 heartbeatmap > 0/ 0 perfcounter > 0/ 0 rgw > 1/ 5 rgw_sync > 0/ 0 civetweb > 0/ 0 javaclient > 0/ 0 asok > 0/ 0 throttle > 0/ 0 refs > 0/ 0 xio > 0/ 0 compressor > 0/ 0 bluestore > 0/ 0 bluefs > 0/ 0 bdev > 0/ 0 kstore > 0/ 0 rocksdb > 0/ 0 leveldb > 0/ 0 memdb > 0/ 0 kinetic > 0/ 0 fuse > 0/ 0 mgr > 0/ 0 mgrc > 0/ 0 dpdk > 0/ 0 eventtrace > 1/ 5 prioritycache > -2/-2 (syslog threshold) > -1/-1 (stderr threshold) > max_recent 10000 > max_new 1000 > log_file /var/log/ceph/ceph-osd.42.log > --- end dump of recent events --- > > It would be nice if anybody could give me a hint on where to look further. > > Regards > -- > Robert Sander > Heinlein Support GmbH > Schwedter Str. 8/9b, 10119 Berlin > > http://www.heinlein-support.de > > Tel: 030 / 405051-43 > Fax: 030 / 405051-19 > > Zwangsangaben lt. 
§35a GmbHG: > HRB 93818 B / Amtsgericht Berlin-Charlottenburg, > Geschäftsführer: Peer Heinlein -- Sitz: Berlin > > _______________________________________________ > ceph-users mailing list -- ceph-users@xxxxxxx > To unsubscribe send an email to ceph-users-leave@xxxxxxx

--
Cheers,
Brad
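
For reference, the query/export/import steps described at the top of this reply could look roughly like the sketch below. It is only an illustration under assumptions: the helper names (run, query_pg, export_pg, import_pg) and the DRY_RUN switch are made up for this example, the data path assumes the default /var/lib/ceph/osd/ceph-<id> layout, and the PG id, target OSD and backup file have to come from your own cluster. ceph-objectstore-tool needs the OSD it operates on to be stopped, which is why each helper stops the daemon first. By default the helpers only print the commands they would run.

```shell
#!/usr/bin/env bash
# Sketch only. Helper names and DRY_RUN are hypothetical; PG ids, OSD ids
# and file paths are placeholders to be replaced with real values.

DRY_RUN="${DRY_RUN:-1}"   # 1 = only print commands, 0 = actually run them

run() {
    # Print the command; execute it only when DRY_RUN=0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# Ask the cluster which OSDs might hold the copies a PG is waiting for.
query_pg() {
    local pgid="$1"
    run ceph pg "$pgid" query
    run ceph pg "$pgid" list_unfound
}

# Export one PG from a (stopped) down OSD into a backup file.
# Assumes the default OSD data path.
export_pg() {
    local osd_id="$1" pgid="$2" file="$3"
    run systemctl stop "ceph-osd@${osd_id}"
    run ceph-objectstore-tool --data-path "/var/lib/ceph/osd/ceph-${osd_id}" \
        --pgid "$pgid" --op export --file "$file"
}

# Import the exported PG into a healthy OSD (stopped for the import),
# then start that OSD again so the PG can peer.
import_pg() {
    local osd_id="$1" file="$2"
    run systemctl stop "ceph-osd@${osd_id}"
    run ceph-objectstore-tool --data-path "/var/lib/ceph/osd/ceph-${osd_id}" \
        --op import --file "$file"
    run systemctl start "ceph-osd@${osd_id}"
}
```

With the defaults, calling `export_pg 42 <pgid> /backup/<pgid>.export` and then `import_pg <target-osd> /backup/<pgid>.export` just prints the command sequence so it can be reviewed first; setting DRY_RUN=0 runs it. Keep the export file until the PG is active+clean again.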