Hi Sage I find the osd assert due to the wrongly generated past_intervals, could you give me some advice and solutions to this problem? Here is the detail: Ceph version: 0.94.5 HOST-A HOST-B HOST-C osd 7 osd 21
osd11 1. osdmap epoch95, pg 1.20f on osd acting set [11,7]/ up set[11,7],then shutdown HOST-C
2. for a long time, cluster has only HOST A and HOST B, write data
3. shutdown HOST-A , then start HOST-C, restart HOST-B about 4 times 4. start HOST-A, osd 21 assert Analysis: when osd 11 start, it generate past_intervals wrongly, make [92~1000] in the same interval pg map 1673,osd11 become the primary,and pg 1.20f change from peering to activating+undersized+degraded , modified last_epoch_start; osd7 start, find_best_info will choose out bigger last_epoch_start,althought osd7 has the latest data; past_intervals on osd 7: ~95 [11,7]/[11,7] 96~100 [7]/[7] 101 [7,21]/[7,21] 102~178 [7,21]/[7] 179~1663 [7,21]/[7,21] 1664~1672 [21]/[21] 1673~1692 [11]/[11] past_intervals on osd11: 92~1000 [11,7]/[11,7]----------------the wrong pi 1001~1663 [7,21]/[7,21] no rw 1664~1672 [21]/[21] no rw 1673~1692 [11]/[11] Logs: Assert on osd7: 2017-07-10 16:08:29.836722 7f4fac24a700 -1 osd/PGLog.cc: In function 'void PGLog::rewind_divergent_log(ObjectStore::Transaction&, eversion_t, pg_info_t&, PGLog::LogEntryHandler*, bool&, bool&)' thread 7f4fac24a700 time
2017-07-10 16:08:29.833699 osd/PGLog.cc: 503: FAILED assert(newhead >= log.tail) ceph version 0.94.5 (664cc0b54fdb496233a81ab19d42df3f46dcda50) 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0xbd1ebb] 2: (PGLog::rewind_divergent_log(ObjectStore::Transaction&, eversion_t, pg_info_t&, PGLog::LogEntryHandler*, bool&, bool&)+0x60b) [0x7840fb] 3: (PG::rewind_divergent_log(ObjectStore::Transaction&, eversion_t)+0x97) [0x7df4b7] 4: (PG::RecoveryState::Stray::react(PG::MInfoRec const&)+0x22f) [0x80109f] 5: (boost::statechart::simple_state<PG::RecoveryState::Stray, PG::RecoveryState::Started, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na,
mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0x216) [0x83c8d6] 6: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator>::send_event(boost::statechart::event_base const&)+0x5b)
[0x827edb] 7: (PG::handle_peering_event(std::tr1::shared_ptr<PG::CephPeeringEvt>, PG::RecoveryCtx*)+0x1ce) [0x7d5dce] 8: (OSD::process_peering_events(std::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x2c0) [0x6b5930] 9: (OSD::PeeringWQ::_process(std::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x18) [0x70ef18] 10: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa56) [0xbc2aa6] 11: (ThreadPool::WorkThread::entry()+0x10) [0xbc3b50] 12: (()+0x8182) [0x7f4fc6458182] 13: (clone()+0x6d) [0x7f4fc49c347d] osd 11 log: 2017-07-10 15:40:43.742547 7f8b1567a740 10 osd.11 95 pgid 1.20f coll 1.20f_head 2017-07-10 15:40:43.742580 7f8b1567a740 10 osd.11 95 _open_lock_pg 1.20f 2017-07-10 15:40:43.742592 7f8b1567a740 5 osd.11 pg_epoch: 95 pg[1.20f(unlocked)] enter Initial 2017-07-10 15:40:43.743207 7f8b1567a740 10 osd.11 pg_epoch: 95 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 92/92/92) [11,7] r=0 lpr=0 crt=95'394 lcod 0'0 mlcod 0'0 inactive] handle_loaded 2017-07-10 15:40:43.743214 7f8b1567a740 5 osd.11 pg_epoch: 95 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 92/92/92) [11,7] r=0 lpr=0 crt=95'394 lcod 0'0 mlcod 0'0 inactive] exit Initial 0.000622
0 0.000000 2017-07-10 15:40:43.743220 7f8b1567a740 5 osd.11 pg_epoch: 95 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 92/92/92) [11,7] r=0 lpr=0 crt=95'394 lcod 0'0 mlcod 0'0 inactive] enter Reset 2017-07-10 15:40:43.743224 7f8b1567a740 10 osd.11 pg_epoch: 95 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 92/92/92) [11,7] r=0 lpr=95 crt=95'394 lcod 0'0 mlcod 0'0 inactive] Clearing blocked
outgoing recovery messages 2017-07-10 15:40:43.743228 7f8b1567a740 10 osd.11 pg_epoch: 95 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 92/92/92) [11,7] r=0 lpr=95 crt=95'394 lcod 0'0 mlcod 0'0 inactive] Not blocking outgoing
recovery messages 2017-07-10 15:40:43.743232 7f8b1567a740 10 osd.11 95 load_pgs loaded pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 92/92/92) [11,7] r=0 lpr=95 crt=95'394 lcod 0'0 mlcod 0'0 inactive] log((95'300,95'397],
crt=95'394) 2017-07-10 15:40:43.829867 7f8b1567a740 10 osd.11 pg_epoch: 95 pg[1.20f(unlocked)] _calc_past_interval_range start epoch 93 >= end epoch 92, nothing to do 2017-07-10 15:42:40.899157 7f8b1567a740 10 osd.11 pg_epoch: 95 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 92/92/92) [11,7] r=0 lpr=95 crt=95'394 lcod 0'0 mlcod 0'0 inactive] null 2017-07-10 15:42:40.902520 7f8afa1bd700 10 osd.11 pg_epoch: 95 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 92/92/92) [11,7] r=0 lpr=95 crt=95'394 lcod 0'0 mlcod 0'0 inactive] handle_peering_event:
epoch_sent: 95 epoch_requested: 95 NullEvt 2017-07-10 15:42:40.917668 7f8b031cf700 10 osd.11 pg_epoch: 95 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 92/92/92) [11,7] r=0 lpr=95 crt=95'394 lcod 0'0 mlcod 0'0 inactive] null 2017-07-10 15:42:41.846755 7f8af89ba700 10 osd.11 pg_epoch: 95 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 92/92/92) [11,7] r=0 lpr=95 crt=95'394 lcod 0'0 mlcod 0'0 inactive] handle_advance_map
[7,21]/[7,21] -- 7/7 2017-07-10 15:42:41.846771 7f8af89ba700 10 osd.11 pg_epoch: 1001 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 92/92/92) [11,7] r=0 lpr=95 crt=95'394 lcod 0'0 mlcod 0'0 inactive] state<Reset>:
Reset advmap 2017-07-10 15:42:41.846781 7f8af89ba700 10 osd.11 pg_epoch: 1001 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 92/92/92) [11,7] r=0 lpr=95 crt=95'394 lcod 0'0 mlcod 0'0 inactive] _calc_past_interval_range
start epoch 1001 >= end epoch 92, nothing to do 2017-07-10 15:42:41.846790 7f8af89ba700 10 osd.11 pg_epoch: 1001 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 92/92/92) [11,7] r=0 lpr=95 crt=95'394 lcod 0'0 mlcod 0'0 inactive] state<Reset>:
should restart peering, calling start_peering_interval again 2017-07-10 15:42:41.846798 7f8af89ba700 10 osd.11 pg_epoch: 1001 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 92/92/92) [11,7] r=0 lpr=1001 crt=95'394 lcod 0'0 mlcod 0'0 inactive] Clearing blocked
outgoing recovery messages 2017-07-10 15:42:41.846807 7f8af89ba700 10 osd.11 pg_epoch: 1001 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 92/92/92) [11,7] r=0 lpr=1001 crt=95'394 lcod 0'0 mlcod 0'0 inactive] Not blocking
outgoing recovery messages 2017-07-10 15:42:41.846829 7f8af89ba700 10 osd.11 pg_epoch: 1001 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 92/92/92) [7,21] r=-1 lpr=1001 pi=92-1000/1 crt=95'394 lcod 0'0 inactive] start_peering_interval:
check_new_interval output: generate_past_intervals interval(92-1000 up [11,7](11) acting [11,7](11)): not rw, up_thru 94 up_from 92 last_epoch_clean 93 2017-07-10 15:42:41.846839 7f8af89ba700 10 osd.11 pg_epoch: 1001 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 92/92/92) [7,21] r=-1 lpr=1001 pi=92-1000/1 crt=95'394 lcod 0'0 inactive] noting
past interval(92-1000 up [11,7](11) acting [11,7](11) maybe_went_rw) ... 2017-07-10 15:42:52.600376 7f8af91bb700 5 osd.11 pg_epoch: 1673 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 inactive]
enter Started 2017-07-10 15:42:52.600387 7f8af91bb700 5 osd.11 pg_epoch: 1673 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 inactive]
enter Start 2017-07-10 15:42:52.600397 7f8af91bb700 1 osd.11 pg_epoch: 1673 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 inactive]
state<Start>: transitioning to Primary 2017-07-10 15:42:52.600410 7f8af91bb700 5 osd.11 pg_epoch: 1673 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 inactive]
exit Start 0.000023 0 0.000000 2017-07-10 15:42:52.600424 7f8af91bb700 5 osd.11 pg_epoch: 1673 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 inactive]
enter Started/Primary 2017-07-10 15:42:52.600435 7f8af91bb700 5 osd.11 pg_epoch: 1673 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 inactive]
enter Started/Primary/Peering 2017-07-10 15:42:52.600446 7f8af91bb700 5 osd.11 pg_epoch: 1673 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering]
enter Started/Primary/Peering/GetInfo 2017-07-10 15:42:52.600460 7f8af91bb700 10 osd.11 pg_epoch: 1673 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering]
_calc_past_interval_range: already have past intervals back to 93 2017-07-10 15:42:52.600474 7f8af91bb700 10 osd.11 pg_epoch: 1673 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering]
PriorSet: build_prior interval(1664-1672 up [21](21) acting [21](21)) 2017-07-10 15:42:52.608241 7f8af91bb700 10 osd.11 pg_epoch: 1673 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering]
PriorSet: build_prior interval(1001-1663 up [7,21](7) acting [7,21](7)) 2017-07-10 15:42:52.608256 7f8af91bb700 10 osd.11 pg_epoch: 1673 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering]
PriorSet: build_prior interval(92-1000 up [11,7](11) acting [11,7](11) maybe_went_rw) 2017-07-10 15:42:52.608265 7f8af91bb700 10 osd.11 pg_epoch: 1673 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering]
PriorSet: build_prior prior osd.7 is down 2017-07-10 15:42:52.608273 7f8af91bb700 10 osd.11 pg_epoch: 1673 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering]
PriorSet: build_prior final: probe 11 down 7 blocked_by {} 2017-07-10 15:42:52.608281 7f8af91bb700 10 osd.11 pg_epoch: 1673 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering]
up_thru 94 < same_since 1673, must notify monitor 2017-07-10 15:42:52.608294 7f8af91bb700 5 osd.11 pg_epoch: 1673 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering]
exit Started/Primary/Peering/GetInfo 0.007847 0 0.000000 2017-07-10 15:42:52.608306 7f8af91bb700 5 osd.11 pg_epoch: 1673 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering]
enter Started/Primary/Peering/GetLog 2017-07-10 15:42:52.608318 7f8af91bb700 10 osd.11 pg_epoch: 1673 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering]
calc_acting osd.11 1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 1673/1673/1673) 2017-07-10 15:42:52.608338 7f8af91bb700 10 osd.11 pg_epoch: 1673 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering]
calc_acting newest update on osd.11 with 1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 1673/1673/1673) calc_acting primary is osd.11 with 1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 1673/1673/1673) 2017-07-10 15:42:52.608348 7f8af91bb700 10 osd.11 pg_epoch: 1673 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering]
actingbackfill is 11 2017-07-10 15:42:52.608357 7f8af91bb700 10 osd.11 pg_epoch: 1673 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering]
choose_acting want [11] (== acting) backfill_targets 2017-07-10 15:42:52.608367 7f8af91bb700 10 osd.11 pg_epoch: 1673 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering]
state<Started/Primary/Peering/GetLog>: leaving GetLog 2017-07-10 15:42:52.608380 7f8af91bb700 5 osd.11 pg_epoch: 1673 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering]
exit Started/Primary/Peering/GetLog 0.000073 0 0.000000 2017-07-10 15:42:52.608391 7f8af91bb700 5 osd.11 pg_epoch: 1673 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering]
enter Started/Primary/Peering/GetMissing 2017-07-10 15:42:52.608400 7f8af91bb700 10 osd.11 pg_epoch: 1673 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering]
state<Started/Primary/Peering/GetMissing>: still need up_thru update before going active 2017-07-10 15:42:52.608409 7f8af91bb700 5 osd.11 pg_epoch: 1673 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering]
exit Started/Primary/Peering/GetMissing 0.000018 0 0.000000 2017-07-10 15:42:52.608417 7f8af91bb700 5 osd.11 pg_epoch: 1673 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering]
enter Started/Primary/Peering/WaitUpThru 2017-07-10 15:42:52.608424 7f8af91bb700 10 osd.11 pg_epoch: 1673 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering]
handle_peering_event: epoch_sent: 1673 epoch_requested: 1673 NullEvt 2017-07-10 15:42:52.610190 7f8b06334700 10 osd.11 pg_epoch: 1673 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering]
flushed 2017-07-10 15:42:52.640813 7f8af91bb700 10 osd.11 pg_epoch: 1673 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering]
handle_peering_event: epoch_sent: 1673 epoch_requested: 1673 MNotifyRec from 21 notify: (query_epoch:1673, epoch_sent:1673, info:1.20f( v 1663'655 (1663'600,1663'655] local-les=180 n=69 ec=75 les/c 180/180 1673/1673/1673)) features: 0x483ffffffffffff 2017-07-10 15:42:52.640821 7f8af91bb700 7 osd.11 pg_epoch: 1673 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering]
state<Started/Primary>: handle_pg_notify from osd.21 2017-07-10 15:42:52.640829 7f8af91bb700 10 osd.11 pg_epoch: 1673 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering]
got osd.21 1.20f( v 1663'655 (1663'600,1663'655] local-les=180 n=69 ec=75 les/c 180/180 1673/1673/1673) 2017-07-10 15:42:52.640840 7f8af91bb700 10 osd.11 pg_epoch: 1673 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 180/180 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering]
osd.21 has stray content: 1.20f( v 1663'655 (1663'600,1663'655] local-les=180 n=69 ec=75 les/c 180/180 1673/1673/1673) 2017-07-10 15:42:52.640852 7f8af91bb700 10 osd.11 pg_epoch: 1673 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 180/180 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering]
update_heartbeat_peers 11 -> 11,21 2017-07-10 15:42:52.642542 7f8af89ba700 10 osd.11 pg_epoch: 1673 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 180/180 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering]
handle_peering_event: epoch_sent: 1673 epoch_requested: 1673 FlushedEvt 2017-07-10 15:42:53.549732 7f8b031cf700 10 osd.11 pg_epoch: 1673 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 180/180 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering]
null 2017-07-10 15:42:53.556132 7f8af99bc700 10 osd.11 pg_epoch: 1673 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 180/180 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering]
handle_advance_map [11]/[11] -- 11/11 2017-07-10 15:42:53.556142 7f8af99bc700 10 osd.11 pg_epoch: 1674 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 180/180 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering]
state<Started/Primary/Peering>: Peering advmap 2017-07-10 15:42:53.556149 7f8af99bc700 10 osd.11 pg_epoch: 1674 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 180/180 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering]
adjust_need_up_thru now 1673, need_up_thru now false 2017-07-10 15:42:53.556157 7f8af99bc700 10 osd.11 pg_epoch: 1674 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 180/180 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering]
state<Started>: Started advmap 2017-07-10 15:42:53.556164 7f8af99bc700 10 osd.11 pg_epoch: 1674 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 180/180 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering]
check_recovery_sources no source osds () went down 2017-07-10 15:42:53.556173 7f8af99bc700 10 osd.11 pg_epoch: 1674 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 180/180 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering]
handle_activate_map 2017-07-10 15:42:53.556179 7f8af99bc700 7 osd.11 pg_epoch: 1674 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 180/180 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering]
state<Started/Primary>: handle ActMap primary 2017-07-10 15:42:53.556187 7f8af99bc700 10 osd.11 pg_epoch: 1674 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 180/180 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering]
take_waiters 2017-07-10 15:42:53.556194 7f8af99bc700 5 osd.11 pg_epoch: 1674 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 180/180 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering]
exit Started/Primary/Peering/WaitUpThru 0.947776 5 0.008159 2017-07-10 15:42:53.556203 7f8af99bc700 10 osd.11 pg_epoch: 1674 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 180/180 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering]
state<Started/Primary/Peering>: Leaving Peering 2017-07-10 15:42:53.556211 7f8af99bc700 5 osd.11 pg_epoch: 1674 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 180/180 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering]
exit Started/Primary/Peering 0.955776 0 0.000000 2017-07-10 15:42:53.556221 7f8af99bc700 5 osd.11 pg_epoch: 1674 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 180/180 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 inactive]
enter Started/Primary/Active 2017-07-10 15:42:53.556232 7f8af99bc700 10 osd.11 pg_epoch: 1674 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 180/180 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 inactive]
state<Started/Primary/Active>: In Active, about to call activate 2017-07-10 15:42:53.556241 7f8af99bc700 10 osd.11 pg_epoch: 1674 pg[1.20f( v 95'397 (95'300,95'397] local-les=1674 n=69 ec=75 les/c 180/180 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0
inactive] activate - snap_trimq [] 2017-07-10 15:42:53.556249 7f8af99bc700 10 osd.11 pg_epoch: 1674 pg[1.20f( v 95'397 (95'300,95'397] local-les=1674 n=69 ec=75 les/c 180/180 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0
inactive] activate - no missing, moving last_complete 95'397 -> 95'397 2017-07-10 15:42:53.556261 7f8af99bc700 10 osd.11 pg_epoch: 1674 pg[1.20f( v 95'397 (95'300,95'397] local-les=1674 n=69 ec=75 les/c 180/180 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0
inactive] 1.20f primary missing_loc.add_active_missing peer 11 missing: {} 2017-07-10 15:42:53.556268 7f8af99bc700 10 osd.11 pg_epoch: 1674 pg[1.20f( v 95'397 (95'300,95'397] local-les=1674 n=69 ec=75 les/c 180/180 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0
inactive] needs_recovery is recovered 2017-07-10 15:42:53.556275 7f8af99bc700 10 osd.11 pg_epoch: 1674 pg[1.20f( v 95'397 (95'300,95'397] local-les=1674 n=69 ec=75 les/c 180/180 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0
activating+undersized+degraded] state<Started/Primary/Active>: Activate Finished 2017-07-10 15:42:53.556283 7f8af99bc700 5 osd.11 pg_epoch: 1674 pg[1.20f( v 95'397 (95'300,95'397] local-les=1674 n=69 ec=75 les/c 180/180 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0
activating+undersized+degraded] enter Started/Primary/Active/Activating 2017-07-10 15:42:53.556290 7f8af99bc700 10 osd.11 pg_epoch: 1674 pg[1.20f( v 95'397 (95'300,95'397] local-les=1674 n=69 ec=75 les/c 180/180 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0
activating+undersized+degraded] handle_peering_event: epoch_sent: 1674 epoch_requested: 1674 NullEvt 2017-07-10 15:42:53.564228 7f8b06334700 10 osd.11 pg_epoch: 1674 pg[1.20f( v 95'397 (95'300,95'397] local-les=1674 n=69 ec=75 les/c 180/180 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0
activating+undersized+degraded] _activate_committed 1674 peer_activated now 11 last_epoch_started 180 same_interval_since 1673 2017-07-10 15:42:53.564250 7f8b06334700 10 osd.11 pg_epoch: 1674 pg[1.20f( v 95'397 (95'300,95'397] local-les=1674 n=69 ec=75 les/c 180/180 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0
activating+undersized+degraded] all_activated_and_committed osd 7 log: 2017-07-10 16:08:29.812470 7f4fb425a700 7 osd.7 2123 handle_pg_info pg_info(1 pgs e2123:1.20f(6)) v4 from osd.11 2017-07-10 16:08:29.833669 7f4fac24a700 10 osd.7 pg_epoch: 2123 pg[1.20f( v 1663'655 (1663'600,1663'655] local-les=180 n=69 ec=75 les/c 1898/1898 2122/2122/1673) [11,7] r=1 lpr=2122 pi=179-2121/15 crt=1663'652 lcod 0'0
inactive NOTIFY] handle_peering_event: epoch_sent: 2123 epoch_requested: 2123 MInfoRec from 11 info: 1.20f( v 95'397 (95'300,95'397] local-les=2123 n=69 ec=75 les/c 1898/1898 2122/2122/1673) 2017-07-10 16:08:29.833684 7f4fac24a700 10 osd.7 pg_epoch: 2123 pg[1.20f( v 1663'655 (1663'600,1663'655] local-les=180 n=69 ec=75 les/c 1898/1898 2122/2122/1673) [11,7] r=1 lpr=2122 pi=179-2121/15 crt=1663'652 lcod 0'0
inactive NOTIFY] state<Started/Stray>: got info from osd.11 1.20f( v 95'397 (95'300,95'397] local-les=2123 n=69 ec=75 les/c 1898/1898 2122/2122/1673) 本邮件及其附件含有新华三技术有限公司的保密信息,仅限于发送给上面地址中列出 的个人或群组。禁止任何其他人以任何形式使用(包括但不限于全部或部分地泄露、复制、 或散发)本邮件中的信息。如果您错收了本邮件,请您立即电话或邮件通知发件人并删除本 邮件! This e-mail and its attachments contain confidential information from New H3C, which is intended only for the person or entity whose address is listed above. Any use of the information contained herein in any way (including, but not limited to, total or partial disclosure, reproduction, or dissemination) by persons other than the intended recipient(s) is prohibited. If you receive this e-mail in error, please notify the sender by phone or email immediately and delete it! |
_______________________________________________ ceph-users mailing list ceph-users@xxxxxxxxxxxxxx http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com