On Tue, Apr 21, 2020 at 6:35 PM Paul Emmerich <paul.emmerich@xxxxxxxx> wrote:
>
> On Tue, Apr 21, 2020 at 3:20 AM Brad Hubbard <bhubbard@xxxxxxxxxx> wrote:
> >
> > Wait for recovery to finish so you know whether any data from the down
> > OSDs is required. If not, just reprovision them.
>
> Recovery will not finish from this state as several PGs are down and/or stale.

What I meant was: let recovery get as far as it can.

>
>
> Paul
>
> >
> > If data is required from the down OSDs, you will need to run a query on
> > the pg(s) to find out which OSDs have the copies of the pg/object that
> > are needed. You can then export the pg from the down OSD using
> > ceph-objectstore-tool, back it up, then import it back into the
> > cluster.
> >
> > On Tue, Apr 21, 2020 at 1:05 AM Robert Sander
> > <r.sander@xxxxxxxxxxxxxxxxxxx> wrote:
> > >
> > > Hi,
> > >
> > > one of our customers had his Ceph cluster crash due to a power or network outage (they are still trying to figure out what happened).
> > >
> > > The cluster is very unhealthy but recovering:
> > >
> > > # ceph -s
> > >   cluster:
> > >     id:     1c95ca5d-948b-4113-9246-14761cb9a82a
> > >     health: HEALTH_ERR
> > >             1 filesystem is degraded
> > >             1 mds daemon damaged
> > >             1 osds down
> > >             1 pools have many more objects per pg than average
> > >             1/115117480 objects unfound (0.000%)
> > >             Reduced data availability: 71 pgs inactive, 53 pgs down, 18 pgs peering, 27 pgs stale
> > >             Possible data damage: 1 pg recovery_unfound
> > >             Degraded data redundancy: 7303464/230234960 objects degraded (3.172%), 693 pgs degraded, 945 pgs undersized
> > >             14 daemons have recently crashed
> > >
> > >   services:
> > >     mon: 3 daemons, quorum maslxlabstore01,maslxlabstore02,maslxlabstore04 (age 64m)
> > >     mgr: maslxlabstore01(active, since 69m), standbys: maslxlabstore03, maslxlabstore02, maslxlabstore04
> > >     mds: cephfs:2/3 {0=maslxlabstore03=up:resolve,1=maslxlabstore01=up:resolve} 2 up:standby, 1 damaged
> > >     osd: 140 osds: 130 up (since 4m), 131 in (since 4m); 847 remapped pgs
> > >     rgw: 4 daemons active (maslxlabstore01.rgw0, maslxlabstore02.rgw0, maslxlabstore03.rgw0, maslxlabstore04.rgw0)
> > >
> > >   data:
> > >     pools:   6 pools, 8328 pgs
> > >     objects: 115.12M objects, 218 TiB
> > >     usage:   425 TiB used, 290 TiB / 715 TiB avail
> > >     pgs:     0.853% pgs not active
> > >              7303464/230234960 objects degraded (3.172%)
> > >              13486/230234960 objects misplaced (0.006%)
> > >              1/115117480 objects unfound (0.000%)
> > >              7311 active+clean
> > >              338  active+undersized+degraded+remapped+backfill_wait
> > >              255  active+undersized+degraded+remapped+backfilling
> > >              215  active+undersized+remapped+backfilling
> > >              99   active+undersized+degraded
> > >              44   down
> > >              37   active+undersized+remapped+backfill_wait
> > >              13   stale+peering
> > >              9    stale+down
> > >              5    stale+remapped+peering
> > >              1    active+recovery_unfound+undersized+degraded+remapped
> > >              1    active+clean+remapped
> > >
> > >   io:
> > >     client:   168 B/s rd, 0 B/s wr, 0 op/s rd, 0 op/s wr
> > >     recovery: 1.9 GiB/s, 15 keys/s, 948 objects/s
> > >
> > >
> > > The MDS cluster is unable to start because one MDS is damaged.
> > >
> > > 10 of the OSDs do not start.
They crash very early in the boot process: > > > > > > 2020-04-20 16:26:14.935 7f818ec8cc00 0 set uid:gid to 64045:64045 (ceph:ceph) > > > 2020-04-20 16:26:14.935 7f818ec8cc00 0 ceph version 14.2.9 (581f22da52345dba46ee232b73b990f06029a2a0) nautilus (stable), process ceph-osd, pid 69463 > > > 2020-04-20 16:26:14.935 7f818ec8cc00 0 pidfile_write: ignore empty --pid-file > > > 2020-04-20 16:26:15.503 7f818ec8cc00 0 starting osd.42 osd_data /var/lib/ceph/osd/ceph-42 /var/lib/ceph/osd/ceph-42/journal > > > 2020-04-20 16:26:15.523 7f818ec8cc00 0 load: jerasure load: lrc load: isa > > > 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option compaction_readahead_size = 2MB > > > 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option compaction_style = kCompactionStyleLevel > > > 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option compaction_threads = 32 > > > 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option compression = kNoCompression > > > 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option flusher_threads = 8 > > > 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option level0_file_num_compaction_trigger = 8 > > > 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option level0_slowdown_writes_trigger = 32 > > > 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option level0_stop_writes_trigger = 64 > > > 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option max_background_compactions = 31 > > > 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option max_bytes_for_level_base = 536870912 > > > 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option max_bytes_for_level_multiplier = 8 > > > 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option max_write_buffer_number = 32 > > > 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option min_write_buffer_number_to_merge = 2 > > > 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option recycle_log_file_num = 32 > > > 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option 
target_file_size_base = 67108864 > > > 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option write_buffer_size = 67108864 > > > 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option compaction_readahead_size = 2MB > > > 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option compaction_style = kCompactionStyleLevel > > > 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option compaction_threads = 32 > > > 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option compression = kNoCompression > > > 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option flusher_threads = 8 > > > 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option level0_file_num_compaction_trigger = 8 > > > 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option level0_slowdown_writes_trigger = 32 > > > 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option level0_stop_writes_trigger = 64 > > > 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option max_background_compactions = 31 > > > 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option max_bytes_for_level_base = 536870912 > > > 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option max_bytes_for_level_multiplier = 8 > > > 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option max_write_buffer_number = 32 > > > 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option min_write_buffer_number_to_merge = 2 > > > 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option recycle_log_file_num = 32 > > > 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option target_file_size_base = 67108864 > > > 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option write_buffer_size = 67108864 > > > 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option compaction_readahead_size = 2MB > > > 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option compaction_style = kCompactionStyleLevel > > > 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option compaction_threads = 32 > > > 2020-04-20 16:26:17.731 7f818ec8cc00 0 set 
rocksdb option compression = kNoCompression > > > 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option flusher_threads = 8 > > > 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option level0_file_num_compaction_trigger = 8 > > > 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option level0_slowdown_writes_trigger = 32 > > > 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option level0_stop_writes_trigger = 64 > > > 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option max_background_compactions = 31 > > > 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option max_bytes_for_level_base = 536870912 > > > 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option max_bytes_for_level_multiplier = 8 > > > 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option max_write_buffer_number = 32 > > > 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option min_write_buffer_number_to_merge = 2 > > > 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option recycle_log_file_num = 32 > > > 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option target_file_size_base = 67108864 > > > 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option write_buffer_size = 67108864 > > > 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option compaction_readahead_size = 2MB > > > 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option compaction_style = kCompactionStyleLevel > > > 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option compaction_threads = 32 > > > 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option compression = kNoCompression > > > 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option flusher_threads = 8 > > > 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option level0_file_num_compaction_trigger = 8 > > > 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option level0_slowdown_writes_trigger = 32 > > > 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option level0_stop_writes_trigger = 64 > > > 2020-04-20 16:26:17.731 
7f818ec8cc00 0 set rocksdb option max_background_compactions = 31 > > > 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option max_bytes_for_level_base = 536870912 > > > 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option max_bytes_for_level_multiplier = 8 > > > 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option max_write_buffer_number = 32 > > > 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option min_write_buffer_number_to_merge = 2 > > > 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option recycle_log_file_num = 32 > > > 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option target_file_size_base = 67108864 > > > 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option write_buffer_size = 67108864 > > > 2020-04-20 16:26:18.131 7f818ec8cc00 0 _get_class not permitted to load lua > > > 2020-04-20 16:26:18.131 7f818ec8cc00 0 _get_class not permitted to load kvs > > > 2020-04-20 16:26:18.131 7f818ec8cc00 0 _get_class not permitted to load sdk > > > 2020-04-20 16:26:18.131 7f818ec8cc00 0 <cls> /build/ceph-14.2.9/src/cls/cephfs/cls_cephfs.cc:197: loading cephfs > > > 2020-04-20 16:26:18.131 7f818ec8cc00 0 <cls> /build/ceph-14.2.9/src/cls/hello/cls_hello.cc:296: loading cls_hello > > > 2020-04-20 16:26:18.131 7f818ec8cc00 0 osd.42 6008 crush map has features 288514051259236352, adjusting msgr requires for clients > > > 2020-04-20 16:26:18.131 7f818ec8cc00 0 osd.42 6008 crush map has features 288514051259236352 was 8705, adjusting msgr requires for mons > > > 2020-04-20 16:26:18.131 7f818ec8cc00 0 osd.42 6008 crush map has features 3314933000852226048, adjusting msgr requires for osds > > > 2020-04-20 16:26:22.023 7f818ec8cc00 0 osd.42 6008 load_pgs > > > 2020-04-20 16:26:22.499 7f818ec8cc00 0 osd.42 6008 load_pgs opened 109 pgs > > > 2020-04-20 16:26:22.499 7f818ec8cc00 0 osd.42 6008 using weightedpriority op queue with priority op cut off at 64. 
> > > 2020-04-20 16:26:22.499 7f818ec8cc00 -1 osd.42 6008 log_to_monitors {default=true} > > > 2020-04-20 16:26:22.511 7f818ec8cc00 0 osd.42 6008 done with init, starting boot process > > > 2020-04-20 16:26:23.883 7f815331c700 -1 /build/ceph-14.2.9/src/osd/PGLog.cc: In function 'void PGLog::merge_log(pg_info_t&, pg_log_t&, pg_shard_t, pg_info_t&, PGLog::LogEntryHandler*, bool&, bool&)' thread 7f815331c700 time 2020-04-20 16:26:23.884183 > > > /build/ceph-14.2.9/src/osd/PGLog.cc: 368: FAILED ceph_assert(log.head >= olog.tail && olog.head >= log.tail) > > > > > > ceph version 14.2.9 (581f22da52345dba46ee232b73b990f06029a2a0) nautilus (stable) > > > 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x152) [0x564fbd9349d2] > > > 2: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x564fbd934bad] > > > 3: (PGLog::merge_log(pg_info_t&, pg_log_t&, pg_shard_t, pg_info_t&, PGLog::LogEntryHandler*, bool&, bool&)+0x1cc0) [0x564fbdaff930] > > > 4: (PG::merge_log(ObjectStore::Transaction&, pg_info_t&, pg_log_t&, pg_shard_t)+0x64) [0x564fbda4eca4] > > > 5: (PG::proc_master_log(ObjectStore::Transaction&, pg_info_t&, pg_log_t&, pg_missing_set<false>&, pg_shard_t)+0x97) [0x564fbda7fe47] > > > 6: (PG::RecoveryState::GetLog::react(PG::RecoveryState::GotLog const&)+0xa6) [0x564fbda9d4f6] > > > 7: (boost::statechart::simple_state<PG::RecoveryState::GetLog, PG::RecoveryState::Peering, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0x191) [0x564fbdaf0e21] > > > 8: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<boost::statechart::none>, 
boost::statechart::null_exception_translator>::process_queued_events()+0xb3) [0x564fbdabdfc3] > > > 9: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<boost::statechart::none>, boost::statechart::null_exception_translator>::process_event(boost::statechart::event_base const&)+0x87) [0x564fbdabe227] > > > 10: (PG::do_peering_event(std::shared_ptr<PGPeeringEvent>, PG::RecoveryCtx*)+0x122) [0x564fbdaada12] > > > 11: (OSD::dequeue_peering_evt(OSDShard*, PG*, std::shared_ptr<PGPeeringEvent>, ThreadPool::TPHandle&)+0x1b4) [0x564fbd9e1f54] > > > 12: (PGPeeringItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x50) [0x564fbdc710c0] > > > 13: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0xbf5) [0x564fbd9d5995] > > > 14: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x4ac) [0x564fbdfdb8cc] > > > 15: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x564fbdfdea90] > > > 16: (()+0x76db) [0x7f818c8666db] > > > 17: (clone()+0x3f) [0x7f818b60688f] > > > > > > 2020-04-20 16:26:23.887 7f815331c700 -1 *** Caught signal (Aborted) ** > > > in thread 7f815331c700 thread_name:tp_osd_tp > > > > > > ceph version 14.2.9 (581f22da52345dba46ee232b73b990f06029a2a0) nautilus (stable) > > > 1: (()+0x12890) [0x7f818c871890] > > > 2: (gsignal()+0xc7) [0x7f818b523e97] > > > 3: (abort()+0x141) [0x7f818b525801] > > > 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a3) [0x564fbd934a23] > > > 5: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x564fbd934bad] > > > 6: (PGLog::merge_log(pg_info_t&, pg_log_t&, pg_shard_t, pg_info_t&, PGLog::LogEntryHandler*, bool&, bool&)+0x1cc0) [0x564fbdaff930] > > > 7: (PG::merge_log(ObjectStore::Transaction&, pg_info_t&, pg_log_t&, pg_shard_t)+0x64) [0x564fbda4eca4] > > > 8: (PG::proc_master_log(ObjectStore::Transaction&, pg_info_t&, pg_log_t&, 
pg_missing_set<false>&, pg_shard_t)+0x97) [0x564fbda7fe47] > > > 9: (PG::RecoveryState::GetLog::react(PG::RecoveryState::GotLog const&)+0xa6) [0x564fbda9d4f6] > > > 10: (boost::statechart::simple_state<PG::RecoveryState::GetLog, PG::RecoveryState::Peering, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0x191) [0x564fbdaf0e21] > > > 11: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<boost::statechart::none>, boost::statechart::null_exception_translator>::process_queued_events()+0xb3) [0x564fbdabdfc3] > > > 12: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<boost::statechart::none>, boost::statechart::null_exception_translator>::process_event(boost::statechart::event_base const&)+0x87) [0x564fbdabe227] > > > 13: (PG::do_peering_event(std::shared_ptr<PGPeeringEvent>, PG::RecoveryCtx*)+0x122) [0x564fbdaada12] > > > 14: (OSD::dequeue_peering_evt(OSDShard*, PG*, std::shared_ptr<PGPeeringEvent>, ThreadPool::TPHandle&)+0x1b4) [0x564fbd9e1f54] > > > 15: (PGPeeringItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x50) [0x564fbdc710c0] > > > 16: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0xbf5) [0x564fbd9d5995] > > > 17: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x4ac) [0x564fbdfdb8cc] > > > 18: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x564fbdfdea90] > > > 19: (()+0x76db) [0x7f818c8666db] > > > 20: (clone()+0x3f) [0x7f818b60688f] > > > NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this. 
> > > > > > --- begin dump of recent events --- > > > -105> 2020-04-20 16:26:14.923 7f818ec8cc00 5 asok(0x564fc8472000) register_command assert hook 0x564fc83ae510 > > > -104> 2020-04-20 16:26:14.923 7f818ec8cc00 5 asok(0x564fc8472000) register_command abort hook 0x564fc83ae510 > > > -103> 2020-04-20 16:26:14.923 7f818ec8cc00 5 asok(0x564fc8472000) register_command perfcounters_dump hook 0x564fc83ae510 > > > -102> 2020-04-20 16:26:14.923 7f818ec8cc00 5 asok(0x564fc8472000) register_command 1 hook 0x564fc83ae510 > > > -101> 2020-04-20 16:26:14.923 7f818ec8cc00 5 asok(0x564fc8472000) register_command perf dump hook 0x564fc83ae510 > > > -100> 2020-04-20 16:26:14.923 7f818ec8cc00 5 asok(0x564fc8472000) register_command perfcounters_schema hook 0x564fc83ae510 > > > -99> 2020-04-20 16:26:14.923 7f818ec8cc00 5 asok(0x564fc8472000) register_command perf histogram dump hook 0x564fc83ae510 > > > -98> 2020-04-20 16:26:14.923 7f818ec8cc00 5 asok(0x564fc8472000) register_command 2 hook 0x564fc83ae510 > > > -97> 2020-04-20 16:26:14.923 7f818ec8cc00 5 asok(0x564fc8472000) register_command perf schema hook 0x564fc83ae510 > > > -96> 2020-04-20 16:26:14.923 7f818ec8cc00 5 asok(0x564fc8472000) register_command perf histogram schema hook 0x564fc83ae510 > > > -95> 2020-04-20 16:26:14.923 7f818ec8cc00 5 asok(0x564fc8472000) register_command perf reset hook 0x564fc83ae510 > > > -94> 2020-04-20 16:26:14.923 7f818ec8cc00 5 asok(0x564fc8472000) register_command config show hook 0x564fc83ae510 > > > -93> 2020-04-20 16:26:14.923 7f818ec8cc00 5 asok(0x564fc8472000) register_command config help hook 0x564fc83ae510 > > > -92> 2020-04-20 16:26:14.923 7f818ec8cc00 5 asok(0x564fc8472000) register_command config set hook 0x564fc83ae510 > > > -91> 2020-04-20 16:26:14.923 7f818ec8cc00 5 asok(0x564fc8472000) register_command config unset hook 0x564fc83ae510 > > > -90> 2020-04-20 16:26:14.923 7f818ec8cc00 5 asok(0x564fc8472000) register_command config get hook 0x564fc83ae510 > > > -89> 2020-04-20 
16:26:14.923 7f818ec8cc00 5 asok(0x564fc8472000) register_command config diff hook 0x564fc83ae510 > > > -88> 2020-04-20 16:26:14.923 7f818ec8cc00 5 asok(0x564fc8472000) register_command config diff get hook 0x564fc83ae510 > > > -87> 2020-04-20 16:26:14.923 7f818ec8cc00 5 asok(0x564fc8472000) register_command log flush hook 0x564fc83ae510 > > > -86> 2020-04-20 16:26:14.923 7f818ec8cc00 5 asok(0x564fc8472000) register_command log dump hook 0x564fc83ae510 > > > -85> 2020-04-20 16:26:14.923 7f818ec8cc00 5 asok(0x564fc8472000) register_command log reopen hook 0x564fc83ae510 > > > -84> 2020-04-20 16:26:14.923 7f818ec8cc00 5 asok(0x564fc8472000) register_command dump_mempools hook 0x564fc9076068 > > > -83> 2020-04-20 16:26:14.935 7f818ec8cc00 0 set uid:gid to 64045:64045 (ceph:ceph) > > > -82> 2020-04-20 16:26:14.935 7f818ec8cc00 0 ceph version 14.2.9 (581f22da52345dba46ee232b73b990f06029a2a0) nautilus (stable), process ceph-osd, pid 69463 > > > -81> 2020-04-20 16:26:14.935 7f818ec8cc00 0 pidfile_write: ignore empty --pid-file > > > -80> 2020-04-20 16:26:15.503 7f818ec8cc00 0 starting osd.42 osd_data /var/lib/ceph/osd/ceph-42 /var/lib/ceph/osd/ceph-42/journal > > > -79> 2020-04-20 16:26:15.523 7f818ec8cc00 0 load: jerasure load: lrc load: isa > > > -78> 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option compaction_readahead_size = 2MB > > > -77> 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option compaction_style = kCompactionStyleLevel > > > -76> 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option compaction_threads = 32 > > > -75> 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option compression = kNoCompression > > > -74> 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option flusher_threads = 8 > > > -73> 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option level0_file_num_compaction_trigger = 8 > > > -72> 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option level0_slowdown_writes_trigger = 32 > > > -71> 2020-04-20 16:26:16.339 
7f818ec8cc00 0 set rocksdb option level0_stop_writes_trigger = 64 > > > -70> 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option max_background_compactions = 31 > > > -69> 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option max_bytes_for_level_base = 536870912 > > > -68> 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option max_bytes_for_level_multiplier = 8 > > > -67> 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option max_write_buffer_number = 32 > > > -66> 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option min_write_buffer_number_to_merge = 2 > > > -65> 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option recycle_log_file_num = 32 > > > -64> 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option target_file_size_base = 67108864 > > > -63> 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option write_buffer_size = 67108864 > > > -62> 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option compaction_readahead_size = 2MB > > > -61> 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option compaction_style = kCompactionStyleLevel > > > -60> 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option compaction_threads = 32 > > > -59> 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option compression = kNoCompression > > > -58> 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option flusher_threads = 8 > > > -57> 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option level0_file_num_compaction_trigger = 8 > > > -56> 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option level0_slowdown_writes_trigger = 32 > > > -55> 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option level0_stop_writes_trigger = 64 > > > -54> 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option max_background_compactions = 31 > > > -53> 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option max_bytes_for_level_base = 536870912 > > > -52> 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option max_bytes_for_level_multiplier = 8 > > > 
-51> 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option max_write_buffer_number = 32 > > > -50> 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option min_write_buffer_number_to_merge = 2 > > > -49> 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option recycle_log_file_num = 32 > > > -48> 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option target_file_size_base = 67108864 > > > -47> 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option write_buffer_size = 67108864 > > > -46> 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option compaction_readahead_size = 2MB > > > -45> 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option compaction_style = kCompactionStyleLevel > > > -44> 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option compaction_threads = 32 > > > -43> 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option compression = kNoCompression > > > -42> 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option flusher_threads = 8 > > > -41> 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option level0_file_num_compaction_trigger = 8 > > > -40> 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option level0_slowdown_writes_trigger = 32 > > > -39> 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option level0_stop_writes_trigger = 64 > > > -38> 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option max_background_compactions = 31 > > > -37> 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option max_bytes_for_level_base = 536870912 > > > -36> 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option max_bytes_for_level_multiplier = 8 > > > -35> 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option max_write_buffer_number = 32 > > > -34> 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option min_write_buffer_number_to_merge = 2 > > > -33> 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option recycle_log_file_num = 32 > > > -32> 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option target_file_size_base 
= 67108864 > > > -31> 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option write_buffer_size = 67108864 > > > -30> 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option compaction_readahead_size = 2MB > > > -29> 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option compaction_style = kCompactionStyleLevel > > > -28> 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option compaction_threads = 32 > > > -27> 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option compression = kNoCompression > > > -26> 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option flusher_threads = 8 > > > -25> 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option level0_file_num_compaction_trigger = 8 > > > -24> 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option level0_slowdown_writes_trigger = 32 > > > -23> 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option level0_stop_writes_trigger = 64 > > > -22> 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option max_background_compactions = 31 > > > -21> 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option max_bytes_for_level_base = 536870912 > > > -20> 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option max_bytes_for_level_multiplier = 8 > > > -19> 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option max_write_buffer_number = 32 > > > -18> 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option min_write_buffer_number_to_merge = 2 > > > -17> 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option recycle_log_file_num = 32 > > > -16> 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option target_file_size_base = 67108864 > > > -15> 2020-04-20 16:26:17.731 7f818ec8cc00 0 set rocksdb option write_buffer_size = 67108864 > > > -14> 2020-04-20 16:26:18.131 7f818ec8cc00 0 _get_class not permitted to load lua > > > -13> 2020-04-20 16:26:18.131 7f818ec8cc00 0 _get_class not permitted to load kvs > > > -12> 2020-04-20 16:26:18.131 7f818ec8cc00 0 _get_class not permitted to load sdk > > > -11> 
2020-04-20 16:26:18.131 7f818ec8cc00 0 <cls> /build/ceph-14.2.9/src/cls/cephfs/cls_cephfs.cc:197: loading cephfs > > > -10> 2020-04-20 16:26:18.131 7f818ec8cc00 0 <cls> /build/ceph-14.2.9/src/cls/hello/cls_hello.cc:296: loading cls_hello > > > -9> 2020-04-20 16:26:18.131 7f818ec8cc00 0 osd.42 6008 crush map has features 288514051259236352, adjusting msgr requires for clients > > > -8> 2020-04-20 16:26:18.131 7f818ec8cc00 0 osd.42 6008 crush map has features 288514051259236352 was 8705, adjusting msgr requires for mons > > > -7> 2020-04-20 16:26:18.131 7f818ec8cc00 0 osd.42 6008 crush map has features 3314933000852226048, adjusting msgr requires for osds > > > -6> 2020-04-20 16:26:22.023 7f818ec8cc00 0 osd.42 6008 load_pgs > > > -5> 2020-04-20 16:26:22.499 7f818ec8cc00 0 osd.42 6008 load_pgs opened 109 pgs > > > -4> 2020-04-20 16:26:22.499 7f818ec8cc00 0 osd.42 6008 using weightedpriority op queue with priority op cut off at 64. > > > -3> 2020-04-20 16:26:22.499 7f818ec8cc00 -1 osd.42 6008 log_to_monitors {default=true} > > > -2> 2020-04-20 16:26:22.511 7f818ec8cc00 0 osd.42 6008 done with init, starting boot process > > > -1> 2020-04-20 16:26:23.883 7f815331c700 -1 /build/ceph-14.2.9/src/osd/PGLog.cc: In function 'void PGLog::merge_log(pg_info_t&, pg_log_t&, pg_shard_t, pg_info_t&, PGLog::LogEntryHandler*, bool&, bool&)' thread 7f815331c700 time 2020-04-20 16:26:23.884183 > > > /build/ceph-14.2.9/src/osd/PGLog.cc: 368: FAILED ceph_assert(log.head >= olog.tail && olog.head >= log.tail) > > > > > > ceph version 14.2.9 (581f22da52345dba46ee232b73b990f06029a2a0) nautilus (stable) > > > 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x152) [0x564fbd9349d2] > > > 2: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x564fbd934bad] > > > 3: (PGLog::merge_log(pg_info_t&, pg_log_t&, pg_shard_t, pg_info_t&, PGLog::LogEntryHandler*, bool&, bool&)+0x1cc0) [0x564fbdaff930] > > > 4: 
(PG::merge_log(ObjectStore::Transaction&, pg_info_t&, pg_log_t&, pg_shard_t)+0x64) [0x564fbda4eca4] > > > 5: (PG::proc_master_log(ObjectStore::Transaction&, pg_info_t&, pg_log_t&, pg_missing_set<false>&, pg_shard_t)+0x97) [0x564fbda7fe47] > > > 6: (PG::RecoveryState::GetLog::react(PG::RecoveryState::GotLog const&)+0xa6) [0x564fbda9d4f6] > > > 7: (boost::statechart::simple_state<PG::RecoveryState::GetLog, PG::RecoveryState::Peering, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0x191) [0x564fbdaf0e21] > > > 8: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<boost::statechart::none>, boost::statechart::null_exception_translator>::process_queued_events()+0xb3) [0x564fbdabdfc3] > > > 9: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<boost::statechart::none>, boost::statechart::null_exception_translator>::process_event(boost::statechart::event_base const&)+0x87) [0x564fbdabe227] > > > 10: (PG::do_peering_event(std::shared_ptr<PGPeeringEvent>, PG::RecoveryCtx*)+0x122) [0x564fbdaada12] > > > 11: (OSD::dequeue_peering_evt(OSDShard*, PG*, std::shared_ptr<PGPeeringEvent>, ThreadPool::TPHandle&)+0x1b4) [0x564fbd9e1f54] > > > 12: (PGPeeringItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x50) [0x564fbdc710c0] > > > 13: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0xbf5) [0x564fbd9d5995] > > > 14: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x4ac) [0x564fbdfdb8cc] > > > 15: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x564fbdfdea90] > > > 16: (()+0x76db) [0x7f818c8666db] > > > 17: (clone()+0x3f) 
[0x7f818b60688f]
> > >
> > >      0> 2020-04-20 16:26:23.887 7f815331c700 -1 *** Caught signal (Aborted) **
> > >  in thread 7f815331c700 thread_name:tp_osd_tp
> > >
> > >  ceph version 14.2.9 (581f22da52345dba46ee232b73b990f06029a2a0) nautilus (stable)
> > >  1: (()+0x12890) [0x7f818c871890]
> > >  2: (gsignal()+0xc7) [0x7f818b523e97]
> > >  3: (abort()+0x141) [0x7f818b525801]
> > >  4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a3) [0x564fbd934a23]
> > >  5: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x564fbd934bad]
> > >  6: (PGLog::merge_log(pg_info_t&, pg_log_t&, pg_shard_t, pg_info_t&, PGLog::LogEntryHandler*, bool&, bool&)+0x1cc0) [0x564fbdaff930]
> > >  7: (PG::merge_log(ObjectStore::Transaction&, pg_info_t&, pg_log_t&, pg_shard_t)+0x64) [0x564fbda4eca4]
> > >  8: (PG::proc_master_log(ObjectStore::Transaction&, pg_info_t&, pg_log_t&, pg_missing_set<false>&, pg_shard_t)+0x97) [0x564fbda7fe47]
> > >  9: (PG::RecoveryState::GetLog::react(PG::RecoveryState::GotLog const&)+0xa6) [0x564fbda9d4f6]
> > >  10: (boost::statechart::simple_state<PG::RecoveryState::GetLog, PG::RecoveryState::Peering, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0x191) [0x564fbdaf0e21]
> > >  11: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<boost::statechart::none>, boost::statechart::null_exception_translator>::process_queued_events()+0xb3) [0x564fbdabdfc3]
> > >  12: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<boost::statechart::none>, boost::statechart::null_exception_translator>::process_event(boost::statechart::event_base const&)+0x87) [0x564fbdabe227]
> > >  13: (PG::do_peering_event(std::shared_ptr<PGPeeringEvent>, PG::RecoveryCtx*)+0x122) [0x564fbdaada12]
> > >  14: (OSD::dequeue_peering_evt(OSDShard*, PG*, std::shared_ptr<PGPeeringEvent>, ThreadPool::TPHandle&)+0x1b4) [0x564fbd9e1f54]
> > >  15: (PGPeeringItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x50) [0x564fbdc710c0]
> > >  16: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0xbf5) [0x564fbd9d5995]
> > >  17: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x4ac) [0x564fbdfdb8cc]
> > >  18: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x564fbdfdea90]
> > >  19: (()+0x76db) [0x7f818c8666db]
> > >  20: (clone()+0x3f) [0x7f818b60688f]
> > >  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
> > >
> > > --- logging levels ---
> > >    0/ 0 none
> > >    0/ 0 lockdep
> > >    0/ 0 context
> > >    0/ 0 crush
> > >    0/ 0 mds
> > >    0/ 0 mds_balancer
> > >    0/ 0 mds_locker
> > >    0/ 0 mds_log
> > >    0/ 0 mds_log_expire
> > >    0/ 0 mds_migrator
> > >    0/ 0 buffer
> > >    0/ 0 timer
> > >    0/ 0 filer
> > >    0/ 0 striper
> > >    0/ 0 objecter
> > >    0/ 0 rados
> > >    0/ 0 rbd
> > >    0/ 0 rbd_mirror
> > >    0/ 0 rbd_replay
> > >    0/ 0 journaler
> > >    0/ 0 objectcacher
> > >    0/ 0 client
> > >    0/ 0 osd
> > >    0/ 0 optracker
> > >    0/ 0 objclass
> > >    0/ 0 filestore
> > >    0/ 0 journal
> > >    0/ 0 ms
> > >    0/ 0 mon
> > >    0/ 0 monc
> > >    0/ 0 paxos
> > >    0/ 0 tp
> > >    0/ 0 auth
> > >    0/ 0 crypto
> > >    0/ 0 finisher
> > >    0/ 0 reserver
> > >    0/ 0 heartbeatmap
> > >    0/ 0 perfcounter
> > >    0/ 0 rgw
> > >    1/ 5 rgw_sync
> > >    0/ 0 civetweb
> > >    0/ 0 javaclient
> > >    0/ 0 asok
> > >    0/ 0 throttle
> > >    0/ 0 refs
> > >    0/ 0 xio
> > >    0/ 0 compressor
> > >    0/ 0 bluestore
> > >    0/ 0 bluefs
> > >    0/ 0 bdev
> > >    0/ 0 kstore
> > >    0/ 0 rocksdb
> > >    0/ 0 leveldb
> > >    0/ 0 memdb
> > >    0/ 0 kinetic
> > >    0/ 0 fuse
> > >    0/ 0 mgr
> > >    0/ 0 mgrc
> > >    0/ 0 dpdk
> > >    0/ 0 eventtrace
> > >    1/ 5 prioritycache
> > >   -2/-2 (syslog threshold)
> > >   -1/-1 (stderr threshold)
> > >   max_recent 10000
> > >   max_new 1000
> > >   log_file /var/log/ceph/ceph-osd.42.log
> > > --- end dump of recent events ---
> > >
> > > It would be nice if anybody could give me a hint on where to look further.
> > >
> > > Regards
> > > --
> > > Robert Sander
> > > Heinlein Support GmbH
> > > Schwedter Str. 8/9b, 10119 Berlin
> > >
> > > http://www.heinlein-support.de
> > >
> > > Tel: 030 / 405051-43
> > > Fax: 030 / 405051-19
> > >
> > > Mandatory disclosures per §35a GmbHG:
> > > HRB 93818 B / Amtsgericht Berlin-Charlottenburg,
> > > Managing Director: Peer Heinlein -- Registered office: Berlin
> > >
> > > _______________________________________________
> > > ceph-users mailing list -- ceph-users@xxxxxxx
> > > To unsubscribe send an email to ceph-users-leave@xxxxxxx
> > >
> > >
> > --
> > Cheers,
> > Brad
> > _______________________________________________
> > ceph-users mailing list -- ceph-users@xxxxxxx
> > To unsubscribe send an email to ceph-users-leave@xxxxxxx
>

--
Cheers,
Brad
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx