Re: Nautilus cluster damaged + crashing OSDs

On Tue, Apr 21, 2020 at 6:35 PM Paul Emmerich <paul.emmerich@xxxxxxxx> wrote:
>
> On Tue, Apr 21, 2020 at 3:20 AM Brad Hubbard <bhubbard@xxxxxxxxxx> wrote:
> >
> > Wait for recovery to finish so you know whether any data from the down
> > OSDs is required. If not, just reprovision them.
>
> Recovery will not finish from this state as several PGs are down and/or stale.

What I meant was let recovery get as far as it can.
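Something like the following should show how far it gets and which PGs
stay stuck (a rough sketch, not a full procedure):

    # watch cluster status while recovery progresses
    ceph -w
    # once it stalls, list the PGs that remain inactive or stale
    ceph pg dump_stuck inactive
    ceph pg dump_stuck stale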

>
>
> Paul
>
> >
> > If data is required from the down OSDs, you will need to run a query on
> > the PG(s) to find out which OSDs have the required copies of the
> > PG/object. You can then export the PG from the down OSD using
> > ceph-objectstore-tool, back it up, then import it back into the
> > cluster, roughly as sketched below.
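> >
> > A minimal sketch (PG 1.2f and the OSD numbers are placeholders; the
> > OSDs involved must be stopped while ceph-objectstore-tool runs):
> >
> >     # find which OSDs hold copies of the PG; if the query hangs because
> >     # the PG is down, "ceph pg map 1.2f" still lists its up/acting OSDs
> >     ceph pg 1.2f query
> >     # export the PG from the stopped, down OSD and keep the file as a backup
> >     ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-42 \
> >         --pgid 1.2f --op export --file /root/pg1.2f.export
> >     # import it into another healthy (stopped) OSD, then start that OSD
> >     ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-57 \
> >         --pgid 1.2f --op import --file /root/pg1.2f.export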
> >
> > On Tue, Apr 21, 2020 at 1:05 AM Robert Sander
> > <r.sander@xxxxxxxxxxxxxxxxxxx> wrote:
> > >
> > > Hi,
> > >
> > > one of our customers had his Ceph cluster crash due to a power or network outage (they are still trying to figure out what happened).
> > >
> > > The cluster is very unhealthy but recovering:
> > >
> > > # ceph -s
> > >   cluster:
> > >     id:     1c95ca5d-948b-4113-9246-14761cb9a82a
> > >     health: HEALTH_ERR
> > >             1 filesystem is degraded
> > >             1 mds daemon damaged
> > >             1 osds down
> > >             1 pools have many more objects per pg than average
> > >             1/115117480 objects unfound (0.000%)
> > >             Reduced data availability: 71 pgs inactive, 53 pgs down, 18 pgs peering, 27 pgs stale
> > >             Possible data damage: 1 pg recovery_unfound
> > >             Degraded data redundancy: 7303464/230234960 objects degraded (3.172%), 693 pgs degraded, 945 pgs undersized
> > >             14 daemons have recently crashed
> > >
> > >   services:
> > >     mon: 3 daemons, quorum maslxlabstore01,maslxlabstore02,maslxlabstore04 (age 64m)
> > >     mgr: maslxlabstore01(active, since 69m), standbys: maslxlabstore03, maslxlabstore02, maslxlabstore04
> > >     mds: cephfs:2/3 {0=maslxlabstore03=up:resolve,1=maslxlabstore01=up:resolve} 2 up:standby, 1 damaged
> > >     osd: 140 osds: 130 up (since 4m), 131 in (since 4m); 847 remapped pgs
> > >     rgw: 4 daemons active (maslxlabstore01.rgw0, maslxlabstore02.rgw0, maslxlabstore03.rgw0, maslxlabstore04.rgw0)
> > >
> > >   data:
> > >     pools:   6 pools, 8328 pgs
> > >     objects: 115.12M objects, 218 TiB
> > >     usage:   425 TiB used, 290 TiB / 715 TiB avail
> > >     pgs:     0.853% pgs not active
> > >              7303464/230234960 objects degraded (3.172%)
> > >              13486/230234960 objects misplaced (0.006%)
> > >              1/115117480 objects unfound (0.000%)
> > >              7311 active+clean
> > >              338  active+undersized+degraded+remapped+backfill_wait
> > >              255  active+undersized+degraded+remapped+backfilling
> > >              215  active+undersized+remapped+backfilling
> > >              99   active+undersized+degraded
> > >              44   down
> > >              37   active+undersized+remapped+backfill_wait
> > >              13   stale+peering
> > >              9    stale+down
> > >              5    stale+remapped+peering
> > >              1    active+recovery_unfound+undersized+degraded+remapped
> > >              1    active+clean+remapped
> > >
> > >   io:
> > >     client:   168 B/s rd, 0 B/s wr, 0 op/s rd, 0 op/s wr
> > >     recovery: 1.9 GiB/s, 15 keys/s, 948 objects/s
> > >
> > >
> > > The MDS cluster is unable to start because one of the MDS daemons is damaged.
> > >
> > > 10 of the OSDs do not start. They crash very early in the boot process:
> > >
> > > 2020-04-20 16:26:14.935 7f818ec8cc00  0 set uid:gid to 64045:64045 (ceph:ceph)
> > > 2020-04-20 16:26:14.935 7f818ec8cc00  0 ceph version 14.2.9 (581f22da52345dba46ee232b73b990f06029a2a0) nautilus (stable), process ceph-osd, pid 69463
> > > 2020-04-20 16:26:14.935 7f818ec8cc00  0 pidfile_write: ignore empty --pid-file
> > > 2020-04-20 16:26:15.503 7f818ec8cc00  0 starting osd.42 osd_data /var/lib/ceph/osd/ceph-42 /var/lib/ceph/osd/ceph-42/journal
> > > 2020-04-20 16:26:15.523 7f818ec8cc00  0 load: jerasure load: lrc load: isa
> > > 2020-04-20 16:26:16.339 7f818ec8cc00  0  set rocksdb option compaction_readahead_size = 2MB
> > > 2020-04-20 16:26:16.339 7f818ec8cc00  0  set rocksdb option compaction_style = kCompactionStyleLevel
> > > 2020-04-20 16:26:16.339 7f818ec8cc00  0  set rocksdb option compaction_threads = 32
> > > 2020-04-20 16:26:16.339 7f818ec8cc00  0  set rocksdb option compression = kNoCompression
> > > 2020-04-20 16:26:16.339 7f818ec8cc00  0  set rocksdb option flusher_threads = 8
> > > 2020-04-20 16:26:16.339 7f818ec8cc00  0  set rocksdb option level0_file_num_compaction_trigger = 8
> > > 2020-04-20 16:26:16.339 7f818ec8cc00  0  set rocksdb option level0_slowdown_writes_trigger = 32
> > > 2020-04-20 16:26:16.339 7f818ec8cc00  0  set rocksdb option level0_stop_writes_trigger = 64
> > > 2020-04-20 16:26:16.339 7f818ec8cc00  0  set rocksdb option max_background_compactions = 31
> > > 2020-04-20 16:26:16.339 7f818ec8cc00  0  set rocksdb option max_bytes_for_level_base = 536870912
> > > 2020-04-20 16:26:16.339 7f818ec8cc00  0  set rocksdb option max_bytes_for_level_multiplier = 8
> > > 2020-04-20 16:26:16.339 7f818ec8cc00  0  set rocksdb option max_write_buffer_number = 32
> > > 2020-04-20 16:26:16.339 7f818ec8cc00  0  set rocksdb option min_write_buffer_number_to_merge = 2
> > > 2020-04-20 16:26:16.339 7f818ec8cc00  0  set rocksdb option recycle_log_file_num = 32
> > > 2020-04-20 16:26:16.339 7f818ec8cc00  0  set rocksdb option target_file_size_base = 67108864
> > > 2020-04-20 16:26:16.339 7f818ec8cc00  0  set rocksdb option write_buffer_size = 67108864
> > > 2020-04-20 16:26:16.339 7f818ec8cc00  0  set rocksdb option compaction_readahead_size = 2MB
> > > 2020-04-20 16:26:16.339 7f818ec8cc00  0  set rocksdb option compaction_style = kCompactionStyleLevel
> > > 2020-04-20 16:26:16.339 7f818ec8cc00  0  set rocksdb option compaction_threads = 32
> > > 2020-04-20 16:26:16.339 7f818ec8cc00  0  set rocksdb option compression = kNoCompression
> > > 2020-04-20 16:26:16.339 7f818ec8cc00  0  set rocksdb option flusher_threads = 8
> > > 2020-04-20 16:26:16.339 7f818ec8cc00  0  set rocksdb option level0_file_num_compaction_trigger = 8
> > > 2020-04-20 16:26:16.339 7f818ec8cc00  0  set rocksdb option level0_slowdown_writes_trigger = 32
> > > 2020-04-20 16:26:16.339 7f818ec8cc00  0  set rocksdb option level0_stop_writes_trigger = 64
> > > 2020-04-20 16:26:16.339 7f818ec8cc00  0  set rocksdb option max_background_compactions = 31
> > > 2020-04-20 16:26:16.339 7f818ec8cc00  0  set rocksdb option max_bytes_for_level_base = 536870912
> > > 2020-04-20 16:26:16.339 7f818ec8cc00  0  set rocksdb option max_bytes_for_level_multiplier = 8
> > > 2020-04-20 16:26:16.339 7f818ec8cc00  0  set rocksdb option max_write_buffer_number = 32
> > > 2020-04-20 16:26:16.339 7f818ec8cc00  0  set rocksdb option min_write_buffer_number_to_merge = 2
> > > 2020-04-20 16:26:16.339 7f818ec8cc00  0  set rocksdb option recycle_log_file_num = 32
> > > 2020-04-20 16:26:16.339 7f818ec8cc00  0  set rocksdb option target_file_size_base = 67108864
> > > 2020-04-20 16:26:16.339 7f818ec8cc00  0  set rocksdb option write_buffer_size = 67108864
> > > 2020-04-20 16:26:17.731 7f818ec8cc00  0  set rocksdb option compaction_readahead_size = 2MB
> > > 2020-04-20 16:26:17.731 7f818ec8cc00  0  set rocksdb option compaction_style = kCompactionStyleLevel
> > > 2020-04-20 16:26:17.731 7f818ec8cc00  0  set rocksdb option compaction_threads = 32
> > > 2020-04-20 16:26:17.731 7f818ec8cc00  0  set rocksdb option compression = kNoCompression
> > > 2020-04-20 16:26:17.731 7f818ec8cc00  0  set rocksdb option flusher_threads = 8
> > > 2020-04-20 16:26:17.731 7f818ec8cc00  0  set rocksdb option level0_file_num_compaction_trigger = 8
> > > 2020-04-20 16:26:17.731 7f818ec8cc00  0  set rocksdb option level0_slowdown_writes_trigger = 32
> > > 2020-04-20 16:26:17.731 7f818ec8cc00  0  set rocksdb option level0_stop_writes_trigger = 64
> > > 2020-04-20 16:26:17.731 7f818ec8cc00  0  set rocksdb option max_background_compactions = 31
> > > 2020-04-20 16:26:17.731 7f818ec8cc00  0  set rocksdb option max_bytes_for_level_base = 536870912
> > > 2020-04-20 16:26:17.731 7f818ec8cc00  0  set rocksdb option max_bytes_for_level_multiplier = 8
> > > 2020-04-20 16:26:17.731 7f818ec8cc00  0  set rocksdb option max_write_buffer_number = 32
> > > 2020-04-20 16:26:17.731 7f818ec8cc00  0  set rocksdb option min_write_buffer_number_to_merge = 2
> > > 2020-04-20 16:26:17.731 7f818ec8cc00  0  set rocksdb option recycle_log_file_num = 32
> > > 2020-04-20 16:26:17.731 7f818ec8cc00  0  set rocksdb option target_file_size_base = 67108864
> > > 2020-04-20 16:26:17.731 7f818ec8cc00  0  set rocksdb option write_buffer_size = 67108864
> > > 2020-04-20 16:26:17.731 7f818ec8cc00  0  set rocksdb option compaction_readahead_size = 2MB
> > > 2020-04-20 16:26:17.731 7f818ec8cc00  0  set rocksdb option compaction_style = kCompactionStyleLevel
> > > 2020-04-20 16:26:17.731 7f818ec8cc00  0  set rocksdb option compaction_threads = 32
> > > 2020-04-20 16:26:17.731 7f818ec8cc00  0  set rocksdb option compression = kNoCompression
> > > 2020-04-20 16:26:17.731 7f818ec8cc00  0  set rocksdb option flusher_threads = 8
> > > 2020-04-20 16:26:17.731 7f818ec8cc00  0  set rocksdb option level0_file_num_compaction_trigger = 8
> > > 2020-04-20 16:26:17.731 7f818ec8cc00  0  set rocksdb option level0_slowdown_writes_trigger = 32
> > > 2020-04-20 16:26:17.731 7f818ec8cc00  0  set rocksdb option level0_stop_writes_trigger = 64
> > > 2020-04-20 16:26:17.731 7f818ec8cc00  0  set rocksdb option max_background_compactions = 31
> > > 2020-04-20 16:26:17.731 7f818ec8cc00  0  set rocksdb option max_bytes_for_level_base = 536870912
> > > 2020-04-20 16:26:17.731 7f818ec8cc00  0  set rocksdb option max_bytes_for_level_multiplier = 8
> > > 2020-04-20 16:26:17.731 7f818ec8cc00  0  set rocksdb option max_write_buffer_number = 32
> > > 2020-04-20 16:26:17.731 7f818ec8cc00  0  set rocksdb option min_write_buffer_number_to_merge = 2
> > > 2020-04-20 16:26:17.731 7f818ec8cc00  0  set rocksdb option recycle_log_file_num = 32
> > > 2020-04-20 16:26:17.731 7f818ec8cc00  0  set rocksdb option target_file_size_base = 67108864
> > > 2020-04-20 16:26:17.731 7f818ec8cc00  0  set rocksdb option write_buffer_size = 67108864
> > > 2020-04-20 16:26:18.131 7f818ec8cc00  0 _get_class not permitted to load lua
> > > 2020-04-20 16:26:18.131 7f818ec8cc00  0 _get_class not permitted to load kvs
> > > 2020-04-20 16:26:18.131 7f818ec8cc00  0 _get_class not permitted to load sdk
> > > 2020-04-20 16:26:18.131 7f818ec8cc00  0 <cls> /build/ceph-14.2.9/src/cls/cephfs/cls_cephfs.cc:197: loading cephfs
> > > 2020-04-20 16:26:18.131 7f818ec8cc00  0 <cls> /build/ceph-14.2.9/src/cls/hello/cls_hello.cc:296: loading cls_hello
> > > 2020-04-20 16:26:18.131 7f818ec8cc00  0 osd.42 6008 crush map has features 288514051259236352, adjusting msgr requires for clients
> > > 2020-04-20 16:26:18.131 7f818ec8cc00  0 osd.42 6008 crush map has features 288514051259236352 was 8705, adjusting msgr requires for mons
> > > 2020-04-20 16:26:18.131 7f818ec8cc00  0 osd.42 6008 crush map has features 3314933000852226048, adjusting msgr requires for osds
> > > 2020-04-20 16:26:22.023 7f818ec8cc00  0 osd.42 6008 load_pgs
> > > 2020-04-20 16:26:22.499 7f818ec8cc00  0 osd.42 6008 load_pgs opened 109 pgs
> > > 2020-04-20 16:26:22.499 7f818ec8cc00  0 osd.42 6008 using weightedpriority op queue with priority op cut off at 64.
> > > 2020-04-20 16:26:22.499 7f818ec8cc00 -1 osd.42 6008 log_to_monitors {default=true}
> > > 2020-04-20 16:26:22.511 7f818ec8cc00  0 osd.42 6008 done with init, starting boot process
> > > 2020-04-20 16:26:23.883 7f815331c700 -1 /build/ceph-14.2.9/src/osd/PGLog.cc: In function 'void PGLog::merge_log(pg_info_t&, pg_log_t&, pg_shard_t, pg_info_t&, PGLog::LogEntryHandler*, bool&, bool&)' thread 7f815331c700 time 2020-04-20 16:26:23.884183
> > > /build/ceph-14.2.9/src/osd/PGLog.cc: 368: FAILED ceph_assert(log.head >= olog.tail && olog.head >= log.tail)
> > >
> > >  ceph version 14.2.9 (581f22da52345dba46ee232b73b990f06029a2a0) nautilus (stable)
> > >  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x152) [0x564fbd9349d2]
> > >  2: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x564fbd934bad]
> > >  3: (PGLog::merge_log(pg_info_t&, pg_log_t&, pg_shard_t, pg_info_t&, PGLog::LogEntryHandler*, bool&, bool&)+0x1cc0) [0x564fbdaff930]
> > >  4: (PG::merge_log(ObjectStore::Transaction&, pg_info_t&, pg_log_t&, pg_shard_t)+0x64) [0x564fbda4eca4]
> > >  5: (PG::proc_master_log(ObjectStore::Transaction&, pg_info_t&, pg_log_t&, pg_missing_set<false>&, pg_shard_t)+0x97) [0x564fbda7fe47]
> > >  6: (PG::RecoveryState::GetLog::react(PG::RecoveryState::GotLog const&)+0xa6) [0x564fbda9d4f6]
> > >  7: (boost::statechart::simple_state<PG::RecoveryState::GetLog, PG::RecoveryState::Peering, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0x191) [0x564fbdaf0e21]
> > >  8: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<boost::statechart::none>, boost::statechart::null_exception_translator>::process_queued_events()+0xb3) [0x564fbdabdfc3]
> > >  9: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<boost::statechart::none>, boost::statechart::null_exception_translator>::process_event(boost::statechart::event_base const&)+0x87) [0x564fbdabe227]
> > >  10: (PG::do_peering_event(std::shared_ptr<PGPeeringEvent>, PG::RecoveryCtx*)+0x122) [0x564fbdaada12]
> > >  11: (OSD::dequeue_peering_evt(OSDShard*, PG*, std::shared_ptr<PGPeeringEvent>, ThreadPool::TPHandle&)+0x1b4) [0x564fbd9e1f54]
> > >  12: (PGPeeringItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x50) [0x564fbdc710c0]
> > >  13: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0xbf5) [0x564fbd9d5995]
> > >  14: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x4ac) [0x564fbdfdb8cc]
> > >  15: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x564fbdfdea90]
> > >  16: (()+0x76db) [0x7f818c8666db]
> > >  17: (clone()+0x3f) [0x7f818b60688f]
> > >
> > > 2020-04-20 16:26:23.887 7f815331c700 -1 *** Caught signal (Aborted) **
> > >  in thread 7f815331c700 thread_name:tp_osd_tp
> > >
> > >  ceph version 14.2.9 (581f22da52345dba46ee232b73b990f06029a2a0) nautilus (stable)
> > >  1: (()+0x12890) [0x7f818c871890]
> > >  2: (gsignal()+0xc7) [0x7f818b523e97]
> > >  3: (abort()+0x141) [0x7f818b525801]
> > >  4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a3) [0x564fbd934a23]
> > >  5: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x564fbd934bad]
> > >  6: (PGLog::merge_log(pg_info_t&, pg_log_t&, pg_shard_t, pg_info_t&, PGLog::LogEntryHandler*, bool&, bool&)+0x1cc0) [0x564fbdaff930]
> > >  7: (PG::merge_log(ObjectStore::Transaction&, pg_info_t&, pg_log_t&, pg_shard_t)+0x64) [0x564fbda4eca4]
> > >  8: (PG::proc_master_log(ObjectStore::Transaction&, pg_info_t&, pg_log_t&, pg_missing_set<false>&, pg_shard_t)+0x97) [0x564fbda7fe47]
> > >  9: (PG::RecoveryState::GetLog::react(PG::RecoveryState::GotLog const&)+0xa6) [0x564fbda9d4f6]
> > >  10: (boost::statechart::simple_state<PG::RecoveryState::GetLog, PG::RecoveryState::Peering, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0x191) [0x564fbdaf0e21]
> > >  11: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<boost::statechart::none>, boost::statechart::null_exception_translator>::process_queued_events()+0xb3) [0x564fbdabdfc3]
> > >  12: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<boost::statechart::none>, boost::statechart::null_exception_translator>::process_event(boost::statechart::event_base const&)+0x87) [0x564fbdabe227]
> > >  13: (PG::do_peering_event(std::shared_ptr<PGPeeringEvent>, PG::RecoveryCtx*)+0x122) [0x564fbdaada12]
> > >  14: (OSD::dequeue_peering_evt(OSDShard*, PG*, std::shared_ptr<PGPeeringEvent>, ThreadPool::TPHandle&)+0x1b4) [0x564fbd9e1f54]
> > >  15: (PGPeeringItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x50) [0x564fbdc710c0]
> > >  16: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0xbf5) [0x564fbd9d5995]
> > >  17: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x4ac) [0x564fbdfdb8cc]
> > >  18: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x564fbdfdea90]
> > >  19: (()+0x76db) [0x7f818c8666db]
> > >  20: (clone()+0x3f) [0x7f818b60688f]
> > >  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
> > >
> > > --- begin dump of recent events ---
> > >   -105> 2020-04-20 16:26:14.923 7f818ec8cc00  5 asok(0x564fc8472000) register_command assert hook 0x564fc83ae510
> > >   -104> 2020-04-20 16:26:14.923 7f818ec8cc00  5 asok(0x564fc8472000) register_command abort hook 0x564fc83ae510
> > >   -103> 2020-04-20 16:26:14.923 7f818ec8cc00  5 asok(0x564fc8472000) register_command perfcounters_dump hook 0x564fc83ae510
> > >   -102> 2020-04-20 16:26:14.923 7f818ec8cc00  5 asok(0x564fc8472000) register_command 1 hook 0x564fc83ae510
> > >   -101> 2020-04-20 16:26:14.923 7f818ec8cc00  5 asok(0x564fc8472000) register_command perf dump hook 0x564fc83ae510
> > >   -100> 2020-04-20 16:26:14.923 7f818ec8cc00  5 asok(0x564fc8472000) register_command perfcounters_schema hook 0x564fc83ae510
> > >    -99> 2020-04-20 16:26:14.923 7f818ec8cc00  5 asok(0x564fc8472000) register_command perf histogram dump hook 0x564fc83ae510
> > >    -98> 2020-04-20 16:26:14.923 7f818ec8cc00  5 asok(0x564fc8472000) register_command 2 hook 0x564fc83ae510
> > >    -97> 2020-04-20 16:26:14.923 7f818ec8cc00  5 asok(0x564fc8472000) register_command perf schema hook 0x564fc83ae510
> > >    -96> 2020-04-20 16:26:14.923 7f818ec8cc00  5 asok(0x564fc8472000) register_command perf histogram schema hook 0x564fc83ae510
> > >    -95> 2020-04-20 16:26:14.923 7f818ec8cc00  5 asok(0x564fc8472000) register_command perf reset hook 0x564fc83ae510
> > >    -94> 2020-04-20 16:26:14.923 7f818ec8cc00  5 asok(0x564fc8472000) register_command config show hook 0x564fc83ae510
> > >    -93> 2020-04-20 16:26:14.923 7f818ec8cc00  5 asok(0x564fc8472000) register_command config help hook 0x564fc83ae510
> > >    -92> 2020-04-20 16:26:14.923 7f818ec8cc00  5 asok(0x564fc8472000) register_command config set hook 0x564fc83ae510
> > >    -91> 2020-04-20 16:26:14.923 7f818ec8cc00  5 asok(0x564fc8472000) register_command config unset hook 0x564fc83ae510
> > >    -90> 2020-04-20 16:26:14.923 7f818ec8cc00  5 asok(0x564fc8472000) register_command config get hook 0x564fc83ae510
> > >    -89> 2020-04-20 16:26:14.923 7f818ec8cc00  5 asok(0x564fc8472000) register_command config diff hook 0x564fc83ae510
> > >    -88> 2020-04-20 16:26:14.923 7f818ec8cc00  5 asok(0x564fc8472000) register_command config diff get hook 0x564fc83ae510
> > >    -87> 2020-04-20 16:26:14.923 7f818ec8cc00  5 asok(0x564fc8472000) register_command log flush hook 0x564fc83ae510
> > >    -86> 2020-04-20 16:26:14.923 7f818ec8cc00  5 asok(0x564fc8472000) register_command log dump hook 0x564fc83ae510
> > >    -85> 2020-04-20 16:26:14.923 7f818ec8cc00  5 asok(0x564fc8472000) register_command log reopen hook 0x564fc83ae510
> > >    -84> 2020-04-20 16:26:14.923 7f818ec8cc00  5 asok(0x564fc8472000) register_command dump_mempools hook 0x564fc9076068
> > >    -83> 2020-04-20 16:26:14.935 7f818ec8cc00  0 set uid:gid to 64045:64045 (ceph:ceph)
> > >    -82> 2020-04-20 16:26:14.935 7f818ec8cc00  0 ceph version 14.2.9 (581f22da52345dba46ee232b73b990f06029a2a0) nautilus (stable), process ceph-osd, pid 69463
> > >    -81> 2020-04-20 16:26:14.935 7f818ec8cc00  0 pidfile_write: ignore empty --pid-file
> > >    -80> 2020-04-20 16:26:15.503 7f818ec8cc00  0 starting osd.42 osd_data /var/lib/ceph/osd/ceph-42 /var/lib/ceph/osd/ceph-42/journal
> > >    -79> 2020-04-20 16:26:15.523 7f818ec8cc00  0 load: jerasure load: lrc load: isa
> > >    -78> 2020-04-20 16:26:16.339 7f818ec8cc00  0  set rocksdb option compaction_readahead_size = 2MB
> > >    -77> 2020-04-20 16:26:16.339 7f818ec8cc00  0  set rocksdb option compaction_style = kCompactionStyleLevel
> > >    -76> 2020-04-20 16:26:16.339 7f818ec8cc00  0  set rocksdb option compaction_threads = 32
> > >    -75> 2020-04-20 16:26:16.339 7f818ec8cc00  0  set rocksdb option compression = kNoCompression
> > >    -74> 2020-04-20 16:26:16.339 7f818ec8cc00  0  set rocksdb option flusher_threads = 8
> > >    -73> 2020-04-20 16:26:16.339 7f818ec8cc00  0  set rocksdb option level0_file_num_compaction_trigger = 8
> > >    -72> 2020-04-20 16:26:16.339 7f818ec8cc00  0  set rocksdb option level0_slowdown_writes_trigger = 32
> > >    -71> 2020-04-20 16:26:16.339 7f818ec8cc00  0  set rocksdb option level0_stop_writes_trigger = 64
> > >    -70> 2020-04-20 16:26:16.339 7f818ec8cc00  0  set rocksdb option max_background_compactions = 31
> > >    -69> 2020-04-20 16:26:16.339 7f818ec8cc00  0  set rocksdb option max_bytes_for_level_base = 536870912
> > >    -68> 2020-04-20 16:26:16.339 7f818ec8cc00  0  set rocksdb option max_bytes_for_level_multiplier = 8
> > >    -67> 2020-04-20 16:26:16.339 7f818ec8cc00  0  set rocksdb option max_write_buffer_number = 32
> > >    -66> 2020-04-20 16:26:16.339 7f818ec8cc00  0  set rocksdb option min_write_buffer_number_to_merge = 2
> > >    -65> 2020-04-20 16:26:16.339 7f818ec8cc00  0  set rocksdb option recycle_log_file_num = 32
> > >    -64> 2020-04-20 16:26:16.339 7f818ec8cc00  0  set rocksdb option target_file_size_base = 67108864
> > >    -63> 2020-04-20 16:26:16.339 7f818ec8cc00  0  set rocksdb option write_buffer_size = 67108864
> > >    -62> 2020-04-20 16:26:16.339 7f818ec8cc00  0  set rocksdb option compaction_readahead_size = 2MB
> > >    -61> 2020-04-20 16:26:16.339 7f818ec8cc00  0  set rocksdb option compaction_style = kCompactionStyleLevel
> > >    -60> 2020-04-20 16:26:16.339 7f818ec8cc00  0  set rocksdb option compaction_threads = 32
> > >    -59> 2020-04-20 16:26:16.339 7f818ec8cc00  0  set rocksdb option compression = kNoCompression
> > >    -58> 2020-04-20 16:26:16.339 7f818ec8cc00  0  set rocksdb option flusher_threads = 8
> > >    -57> 2020-04-20 16:26:16.339 7f818ec8cc00  0  set rocksdb option level0_file_num_compaction_trigger = 8
> > >    -56> 2020-04-20 16:26:16.339 7f818ec8cc00  0  set rocksdb option level0_slowdown_writes_trigger = 32
> > >    -55> 2020-04-20 16:26:16.339 7f818ec8cc00  0  set rocksdb option level0_stop_writes_trigger = 64
> > >    -54> 2020-04-20 16:26:16.339 7f818ec8cc00  0  set rocksdb option max_background_compactions = 31
> > >    -53> 2020-04-20 16:26:16.339 7f818ec8cc00  0  set rocksdb option max_bytes_for_level_base = 536870912
> > >    -52> 2020-04-20 16:26:16.339 7f818ec8cc00  0  set rocksdb option max_bytes_for_level_multiplier = 8
> > >    -51> 2020-04-20 16:26:16.339 7f818ec8cc00  0  set rocksdb option max_write_buffer_number = 32
> > >    -50> 2020-04-20 16:26:16.339 7f818ec8cc00  0  set rocksdb option min_write_buffer_number_to_merge = 2
> > >    -49> 2020-04-20 16:26:16.339 7f818ec8cc00  0  set rocksdb option recycle_log_file_num = 32
> > >    -48> 2020-04-20 16:26:16.339 7f818ec8cc00  0  set rocksdb option target_file_size_base = 67108864
> > >    -47> 2020-04-20 16:26:16.339 7f818ec8cc00  0  set rocksdb option write_buffer_size = 67108864
> > >    -46> 2020-04-20 16:26:17.731 7f818ec8cc00  0  set rocksdb option compaction_readahead_size = 2MB
> > >    -45> 2020-04-20 16:26:17.731 7f818ec8cc00  0  set rocksdb option compaction_style = kCompactionStyleLevel
> > >    -44> 2020-04-20 16:26:17.731 7f818ec8cc00  0  set rocksdb option compaction_threads = 32
> > >    -43> 2020-04-20 16:26:17.731 7f818ec8cc00  0  set rocksdb option compression = kNoCompression
> > >    -42> 2020-04-20 16:26:17.731 7f818ec8cc00  0  set rocksdb option flusher_threads = 8
> > >    -41> 2020-04-20 16:26:17.731 7f818ec8cc00  0  set rocksdb option level0_file_num_compaction_trigger = 8
> > >    -40> 2020-04-20 16:26:17.731 7f818ec8cc00  0  set rocksdb option level0_slowdown_writes_trigger = 32
> > >    -39> 2020-04-20 16:26:17.731 7f818ec8cc00  0  set rocksdb option level0_stop_writes_trigger = 64
> > >    -38> 2020-04-20 16:26:17.731 7f818ec8cc00  0  set rocksdb option max_background_compactions = 31
> > >    -37> 2020-04-20 16:26:17.731 7f818ec8cc00  0  set rocksdb option max_bytes_for_level_base = 536870912
> > >    -36> 2020-04-20 16:26:17.731 7f818ec8cc00  0  set rocksdb option max_bytes_for_level_multiplier = 8
> > >    -35> 2020-04-20 16:26:17.731 7f818ec8cc00  0  set rocksdb option max_write_buffer_number = 32
> > >    -34> 2020-04-20 16:26:17.731 7f818ec8cc00  0  set rocksdb option min_write_buffer_number_to_merge = 2
> > >    -33> 2020-04-20 16:26:17.731 7f818ec8cc00  0  set rocksdb option recycle_log_file_num = 32
> > >    -32> 2020-04-20 16:26:17.731 7f818ec8cc00  0  set rocksdb option target_file_size_base = 67108864
> > >    -31> 2020-04-20 16:26:17.731 7f818ec8cc00  0  set rocksdb option write_buffer_size = 67108864
> > >    -30> 2020-04-20 16:26:17.731 7f818ec8cc00  0  set rocksdb option compaction_readahead_size = 2MB
> > >    -29> 2020-04-20 16:26:17.731 7f818ec8cc00  0  set rocksdb option compaction_style = kCompactionStyleLevel
> > >    -28> 2020-04-20 16:26:17.731 7f818ec8cc00  0  set rocksdb option compaction_threads = 32
> > >    -27> 2020-04-20 16:26:17.731 7f818ec8cc00  0  set rocksdb option compression = kNoCompression
> > >    -26> 2020-04-20 16:26:17.731 7f818ec8cc00  0  set rocksdb option flusher_threads = 8
> > >    -25> 2020-04-20 16:26:17.731 7f818ec8cc00  0  set rocksdb option level0_file_num_compaction_trigger = 8
> > >    -24> 2020-04-20 16:26:17.731 7f818ec8cc00  0  set rocksdb option level0_slowdown_writes_trigger = 32
> > >    -23> 2020-04-20 16:26:17.731 7f818ec8cc00  0  set rocksdb option level0_stop_writes_trigger = 64
> > >    -22> 2020-04-20 16:26:17.731 7f818ec8cc00  0  set rocksdb option max_background_compactions = 31
> > >    -21> 2020-04-20 16:26:17.731 7f818ec8cc00  0  set rocksdb option max_bytes_for_level_base = 536870912
> > >    -20> 2020-04-20 16:26:17.731 7f818ec8cc00  0  set rocksdb option max_bytes_for_level_multiplier = 8
> > >    -19> 2020-04-20 16:26:17.731 7f818ec8cc00  0  set rocksdb option max_write_buffer_number = 32
> > >    -18> 2020-04-20 16:26:17.731 7f818ec8cc00  0  set rocksdb option min_write_buffer_number_to_merge = 2
> > >    -17> 2020-04-20 16:26:17.731 7f818ec8cc00  0  set rocksdb option recycle_log_file_num = 32
> > >    -16> 2020-04-20 16:26:17.731 7f818ec8cc00  0  set rocksdb option target_file_size_base = 67108864
> > >    -15> 2020-04-20 16:26:17.731 7f818ec8cc00  0  set rocksdb option write_buffer_size = 67108864
> > >    -14> 2020-04-20 16:26:18.131 7f818ec8cc00  0 _get_class not permitted to load lua
> > >    -13> 2020-04-20 16:26:18.131 7f818ec8cc00  0 _get_class not permitted to load kvs
> > >    -12> 2020-04-20 16:26:18.131 7f818ec8cc00  0 _get_class not permitted to load sdk
> > >    -11> 2020-04-20 16:26:18.131 7f818ec8cc00  0 <cls> /build/ceph-14.2.9/src/cls/cephfs/cls_cephfs.cc:197: loading cephfs
> > >    -10> 2020-04-20 16:26:18.131 7f818ec8cc00  0 <cls> /build/ceph-14.2.9/src/cls/hello/cls_hello.cc:296: loading cls_hello
> > >     -9> 2020-04-20 16:26:18.131 7f818ec8cc00  0 osd.42 6008 crush map has features 288514051259236352, adjusting msgr requires for clients
> > >     -8> 2020-04-20 16:26:18.131 7f818ec8cc00  0 osd.42 6008 crush map has features 288514051259236352 was 8705, adjusting msgr requires for mons
> > >     -7> 2020-04-20 16:26:18.131 7f818ec8cc00  0 osd.42 6008 crush map has features 3314933000852226048, adjusting msgr requires for osds
> > >     -6> 2020-04-20 16:26:22.023 7f818ec8cc00  0 osd.42 6008 load_pgs
> > >     -5> 2020-04-20 16:26:22.499 7f818ec8cc00  0 osd.42 6008 load_pgs opened 109 pgs
> > >     -4> 2020-04-20 16:26:22.499 7f818ec8cc00  0 osd.42 6008 using weightedpriority op queue with priority op cut off at 64.
> > >     -3> 2020-04-20 16:26:22.499 7f818ec8cc00 -1 osd.42 6008 log_to_monitors {default=true}
> > >     -2> 2020-04-20 16:26:22.511 7f818ec8cc00  0 osd.42 6008 done with init, starting boot process
> > >     -1> 2020-04-20 16:26:23.883 7f815331c700 -1 /build/ceph-14.2.9/src/osd/PGLog.cc: In function 'void PGLog::merge_log(pg_info_t&, pg_log_t&, pg_shard_t, pg_info_t&, PGLog::LogEntryHandler*, bool&, bool&)' thread 7f815331c700 time 2020-04-20 16:26:23.884183
> > > /build/ceph-14.2.9/src/osd/PGLog.cc: 368: FAILED ceph_assert(log.head >= olog.tail && olog.head >= log.tail)
> > >
> > >  ceph version 14.2.9 (581f22da52345dba46ee232b73b990f06029a2a0) nautilus (stable)
> > >  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x152) [0x564fbd9349d2]
> > >  2: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x564fbd934bad]
> > >  3: (PGLog::merge_log(pg_info_t&, pg_log_t&, pg_shard_t, pg_info_t&, PGLog::LogEntryHandler*, bool&, bool&)+0x1cc0) [0x564fbdaff930]
> > >  4: (PG::merge_log(ObjectStore::Transaction&, pg_info_t&, pg_log_t&, pg_shard_t)+0x64) [0x564fbda4eca4]
> > >  5: (PG::proc_master_log(ObjectStore::Transaction&, pg_info_t&, pg_log_t&, pg_missing_set<false>&, pg_shard_t)+0x97) [0x564fbda7fe47]
> > >  6: (PG::RecoveryState::GetLog::react(PG::RecoveryState::GotLog const&)+0xa6) [0x564fbda9d4f6]
> > >  7: (boost::statechart::simple_state<PG::RecoveryState::GetLog, PG::RecoveryState::Peering, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0x191) [0x564fbdaf0e21]
> > >  8: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<boost::statechart::none>, boost::statechart::null_exception_translator>::process_queued_events()+0xb3) [0x564fbdabdfc3]
> > >  9: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<boost::statechart::none>, boost::statechart::null_exception_translator>::process_event(boost::statechart::event_base const&)+0x87) [0x564fbdabe227]
> > >  10: (PG::do_peering_event(std::shared_ptr<PGPeeringEvent>, PG::RecoveryCtx*)+0x122) [0x564fbdaada12]
> > >  11: (OSD::dequeue_peering_evt(OSDShard*, PG*, std::shared_ptr<PGPeeringEvent>, ThreadPool::TPHandle&)+0x1b4) [0x564fbd9e1f54]
> > >  12: (PGPeeringItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x50) [0x564fbdc710c0]
> > >  13: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0xbf5) [0x564fbd9d5995]
> > >  14: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x4ac) [0x564fbdfdb8cc]
> > >  15: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x564fbdfdea90]
> > >  16: (()+0x76db) [0x7f818c8666db]
> > >  17: (clone()+0x3f) [0x7f818b60688f]
> > >
> > >      0> 2020-04-20 16:26:23.887 7f815331c700 -1 *** Caught signal (Aborted) **
> > >  in thread 7f815331c700 thread_name:tp_osd_tp
> > >
> > >  ceph version 14.2.9 (581f22da52345dba46ee232b73b990f06029a2a0) nautilus (stable)
> > >  1: (()+0x12890) [0x7f818c871890]
> > >  2: (gsignal()+0xc7) [0x7f818b523e97]
> > >  3: (abort()+0x141) [0x7f818b525801]
> > >  4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a3) [0x564fbd934a23]
> > >  5: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x564fbd934bad]
> > >  6: (PGLog::merge_log(pg_info_t&, pg_log_t&, pg_shard_t, pg_info_t&, PGLog::LogEntryHandler*, bool&, bool&)+0x1cc0) [0x564fbdaff930]
> > >  7: (PG::merge_log(ObjectStore::Transaction&, pg_info_t&, pg_log_t&, pg_shard_t)+0x64) [0x564fbda4eca4]
> > >  8: (PG::proc_master_log(ObjectStore::Transaction&, pg_info_t&, pg_log_t&, pg_missing_set<false>&, pg_shard_t)+0x97) [0x564fbda7fe47]
> > >  9: (PG::RecoveryState::GetLog::react(PG::RecoveryState::GotLog const&)+0xa6) [0x564fbda9d4f6]
> > >  10: (boost::statechart::simple_state<PG::RecoveryState::GetLog, PG::RecoveryState::Peering, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0x191) [0x564fbdaf0e21]
> > >  11: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<boost::statechart::none>, boost::statechart::null_exception_translator>::process_queued_events()+0xb3) [0x564fbdabdfc3]
> > >  12: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<boost::statechart::none>, boost::statechart::null_exception_translator>::process_event(boost::statechart::event_base const&)+0x87) [0x564fbdabe227]
> > >  13: (PG::do_peering_event(std::shared_ptr<PGPeeringEvent>, PG::RecoveryCtx*)+0x122) [0x564fbdaada12]
> > >  14: (OSD::dequeue_peering_evt(OSDShard*, PG*, std::shared_ptr<PGPeeringEvent>, ThreadPool::TPHandle&)+0x1b4) [0x564fbd9e1f54]
> > >  15: (PGPeeringItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x50) [0x564fbdc710c0]
> > >  16: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0xbf5) [0x564fbd9d5995]
> > >  17: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x4ac) [0x564fbdfdb8cc]
> > >  18: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x564fbdfdea90]
> > >  19: (()+0x76db) [0x7f818c8666db]
> > >  20: (clone()+0x3f) [0x7f818b60688f]
> > >  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
> > >
> > > --- logging levels ---
> > >    0/ 0 none
> > >    0/ 0 lockdep
> > >    0/ 0 context
> > >    0/ 0 crush
> > >    0/ 0 mds
> > >    0/ 0 mds_balancer
> > >    0/ 0 mds_locker
> > >    0/ 0 mds_log
> > >    0/ 0 mds_log_expire
> > >    0/ 0 mds_migrator
> > >    0/ 0 buffer
> > >    0/ 0 timer
> > >    0/ 0 filer
> > >    0/ 0 striper
> > >    0/ 0 objecter
> > >    0/ 0 rados
> > >    0/ 0 rbd
> > >    0/ 0 rbd_mirror
> > >    0/ 0 rbd_replay
> > >    0/ 0 journaler
> > >    0/ 0 objectcacher
> > >    0/ 0 client
> > >    0/ 0 osd
> > >    0/ 0 optracker
> > >    0/ 0 objclass
> > >    0/ 0 filestore
> > >    0/ 0 journal
> > >    0/ 0 ms
> > >    0/ 0 mon
> > >    0/ 0 monc
> > >    0/ 0 paxos
> > >    0/ 0 tp
> > >    0/ 0 auth
> > >    0/ 0 crypto
> > >    0/ 0 finisher
> > >    0/ 0 reserver
> > >    0/ 0 heartbeatmap
> > >    0/ 0 perfcounter
> > >    0/ 0 rgw
> > >    1/ 5 rgw_sync
> > >    0/ 0 civetweb
> > >    0/ 0 javaclient
> > >    0/ 0 asok
> > >    0/ 0 throttle
> > >    0/ 0 refs
> > >    0/ 0 xio
> > >    0/ 0 compressor
> > >    0/ 0 bluestore
> > >    0/ 0 bluefs
> > >    0/ 0 bdev
> > >    0/ 0 kstore
> > >    0/ 0 rocksdb
> > >    0/ 0 leveldb
> > >    0/ 0 memdb
> > >    0/ 0 kinetic
> > >    0/ 0 fuse
> > >    0/ 0 mgr
> > >    0/ 0 mgrc
> > >    0/ 0 dpdk
> > >    0/ 0 eventtrace
> > >    1/ 5 prioritycache
> > >   -2/-2 (syslog threshold)
> > >   -1/-1 (stderr threshold)
> > >   max_recent     10000
> > >   max_new         1000
> > >   log_file /var/log/ceph/ceph-osd.42.log
> > > --- end dump of recent events ---
> > >
> > > It would be nice if anybody could give me a hint on where to look further.
> > >
> > > Regards
> > > --
> > > Robert Sander
> > > Heinlein Support GmbH
> > > Schwedter Str. 8/9b, 10119 Berlin
> > >
> > > http://www.heinlein-support.de
> > >
> > > Tel: 030 / 405051-43
> > > Fax: 030 / 405051-19
> > >
> > > Mandatory disclosures per §35a GmbHG:
> > > HRB 93818 B / Amtsgericht Berlin-Charlottenburg,
> > > Managing Director: Peer Heinlein -- Registered office: Berlin
> > >
> > > _______________________________________________
> > > ceph-users mailing list -- ceph-users@xxxxxxx
> > > To unsubscribe send an email to ceph-users-leave@xxxxxxx
> >
> >
> >
> > --
> > Cheers,
> > Brad
> > _______________________________________________
> > ceph-users mailing list -- ceph-users@xxxxxxx
> > To unsubscribe send an email to ceph-users-leave@xxxxxxx
>


-- 
Cheers,
Brad
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx




