calculate past_intervals wrong, lead to choose wrong authority osd, then osd assert(newhead >= log.tail)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Sage

I find  the osd assert due to the  wrongly  generated past_intervals, could you give me some advice and solutions to this problem?

 

Here is the detail:

 

Ceph version: 0.94.5

 

HOST-A           HOST-B        HOST-C

osd 7                osd 21           osd11

1. osdmap epoch95, pg 1.20f on osd acting set [11,7]/ up set[11,7],then shutdown HOST-C 

2. for a long time, cluster has only HOST A and HOST B, write data

3. shutdown HOST-A , then start HOST-C, restart HOST-B about 4 times

4. start HOST-A, osd 21 assert

 

Analysis:

when osd 11 start, it generate past_intervals wrongly, make [92~1000] in the same interval

pg map 1673,osd11 become the primary,and pg 1.20f change from peering to activating+undersized+degraded , modified last_epoch_start;

osd7 start, find_best_info will choose out bigger last_epoch_startalthought osd7 has the latest data;

past_intervals on osd 7:

~95         [11,7]/[11,7]

96~100        [7]/[7]

101         [7,21]/[7,21]

102~178     [7,21]/[7]

179~1663          [7,21]/[7,21]

1664~1672  [21]/[21]

1673~1692  [11]/[11]

 

past_intervals on osd11:

92~1000         [11,7]/[11,7]----------------the wrong pi

1001~1663       [7,21]/[7,21] no rw

1664~1672       [21]/[21]     no rw

1673~1692    [11]/[11]

 

 

 

Logs:

Assert on osd7:

2017-07-10 16:08:29.836722 7f4fac24a700 -1 osd/PGLog.cc: In function 'void PGLog::rewind_divergent_log(ObjectStore::Transaction&, eversion_t, pg_info_t&, PGLog::LogEntryHandler*, bool&, bool&)' thread 7f4fac24a700 time 2017-07-10 16:08:29.833699

osd/PGLog.cc: 503: FAILED assert(newhead >= log.tail)

ceph version 0.94.5 (664cc0b54fdb496233a81ab19d42df3f46dcda50)

1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0xbd1ebb]

2: (PGLog::rewind_divergent_log(ObjectStore::Transaction&, eversion_t, pg_info_t&, PGLog::LogEntryHandler*, bool&, bool&)+0x60b) [0x7840fb]

3: (PG::rewind_divergent_log(ObjectStore::Transaction&, eversion_t)+0x97) [0x7df4b7]

4: (PG::RecoveryState::Stray::react(PG::MInfoRec const&)+0x22f) [0x80109f]

5: (boost::statechart::simple_state<PG::RecoveryState::Stray, PG::RecoveryState::Started, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0x216) [0x83c8d6]

6: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator>::send_event(boost::statechart::event_base const&)+0x5b) [0x827edb]

7: (PG::handle_peering_event(std::tr1::shared_ptr<PG::CephPeeringEvt>, PG::RecoveryCtx*)+0x1ce) [0x7d5dce]

8: (OSD::process_peering_events(std::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x2c0) [0x6b5930]

9: (OSD::PeeringWQ::_process(std::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x18) [0x70ef18]

10: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa56) [0xbc2aa6]

11: (ThreadPool::WorkThread::entry()+0x10) [0xbc3b50]

12: (()+0x8182) [0x7f4fc6458182]

13: (clone()+0x6d) [0x7f4fc49c347d]

 

 

osd 11 log:

2017-07-10 15:40:43.742547 7f8b1567a740 10 osd.11 95 pgid 1.20f coll 1.20f_head

2017-07-10 15:40:43.742580 7f8b1567a740 10 osd.11 95 _open_lock_pg 1.20f

2017-07-10 15:40:43.742592 7f8b1567a740  5 osd.11 pg_epoch: 95 pg[1.20f(unlocked)] enter Initial

2017-07-10 15:40:43.743207 7f8b1567a740 10 osd.11 pg_epoch: 95 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 92/92/92) [11,7] r=0 lpr=0 crt=95'394 lcod 0'0 mlcod 0'0 inactive] handle_loaded

2017-07-10 15:40:43.743214 7f8b1567a740  5 osd.11 pg_epoch: 95 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 92/92/92) [11,7] r=0 lpr=0 crt=95'394 lcod 0'0 mlcod 0'0 inactive] exit Initial 0.000622 0 0.000000

2017-07-10 15:40:43.743220 7f8b1567a740  5 osd.11 pg_epoch: 95 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 92/92/92) [11,7] r=0 lpr=0 crt=95'394 lcod 0'0 mlcod 0'0 inactive] enter Reset

2017-07-10 15:40:43.743224 7f8b1567a740 10 osd.11 pg_epoch: 95 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 92/92/92) [11,7] r=0 lpr=95 crt=95'394 lcod 0'0 mlcod 0'0 inactive] Clearing blocked outgoing recovery messages

2017-07-10 15:40:43.743228 7f8b1567a740 10 osd.11 pg_epoch: 95 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 92/92/92) [11,7] r=0 lpr=95 crt=95'394 lcod 0'0 mlcod 0'0 inactive] Not blocking outgoing recovery messages

2017-07-10 15:40:43.743232 7f8b1567a740 10 osd.11 95 load_pgs loaded pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 92/92/92) [11,7] r=0 lpr=95 crt=95'394 lcod 0'0 mlcod 0'0 inactive] log((95'300,95'397], crt=95'394)

2017-07-10 15:40:43.829867 7f8b1567a740 10 osd.11 pg_epoch: 95 pg[1.20f(unlocked)] _calc_past_interval_range start epoch 93 >= end epoch 92, nothing to do

2017-07-10 15:42:40.899157 7f8b1567a740 10 osd.11 pg_epoch: 95 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 92/92/92) [11,7] r=0 lpr=95 crt=95'394 lcod 0'0 mlcod 0'0 inactive] null

2017-07-10 15:42:40.902520 7f8afa1bd700 10 osd.11 pg_epoch: 95 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 92/92/92) [11,7] r=0 lpr=95 crt=95'394 lcod 0'0 mlcod 0'0 inactive] handle_peering_event: epoch_sent: 95 epoch_requested: 95 NullEvt

2017-07-10 15:42:40.917668 7f8b031cf700 10 osd.11 pg_epoch: 95 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 92/92/92) [11,7] r=0 lpr=95 crt=95'394 lcod 0'0 mlcod 0'0 inactive] null

2017-07-10 15:42:41.846755 7f8af89ba700 10 osd.11 pg_epoch: 95 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 92/92/92) [11,7] r=0 lpr=95 crt=95'394 lcod 0'0 mlcod 0'0 inactive] handle_advance_map [7,21]/[7,21] -- 7/7

2017-07-10 15:42:41.846771 7f8af89ba700 10 osd.11 pg_epoch: 1001 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 92/92/92) [11,7] r=0 lpr=95 crt=95'394 lcod 0'0 mlcod 0'0 inactive] state<Reset>: Reset advmap

2017-07-10 15:42:41.846781 7f8af89ba700 10 osd.11 pg_epoch: 1001 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 92/92/92) [11,7] r=0 lpr=95 crt=95'394 lcod 0'0 mlcod 0'0 inactive] _calc_past_interval_range start epoch 1001 >= end epoch 92, nothing to do

2017-07-10 15:42:41.846790 7f8af89ba700 10 osd.11 pg_epoch: 1001 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 92/92/92) [11,7] r=0 lpr=95 crt=95'394 lcod 0'0 mlcod 0'0 inactive] state<Reset>: should restart peering, calling start_peering_interval again

2017-07-10 15:42:41.846798 7f8af89ba700 10 osd.11 pg_epoch: 1001 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 92/92/92) [11,7] r=0 lpr=1001 crt=95'394 lcod 0'0 mlcod 0'0 inactive] Clearing blocked outgoing recovery messages

2017-07-10 15:42:41.846807 7f8af89ba700 10 osd.11 pg_epoch: 1001 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 92/92/92) [11,7] r=0 lpr=1001 crt=95'394 lcod 0'0 mlcod 0'0 inactive] Not blocking outgoing recovery messages

2017-07-10 15:42:41.846829 7f8af89ba700 10 osd.11 pg_epoch: 1001 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 92/92/92) [7,21] r=-1 lpr=1001 pi=92-1000/1 crt=95'394 lcod 0'0 inactive] start_peering_interval: check_new_interval output: generate_past_intervals interval(92-1000 up [11,7](11) acting [11,7](11)): not rw, up_thru 94 up_from 92 last_epoch_clean 93

2017-07-10 15:42:41.846839 7f8af89ba700 10 osd.11 pg_epoch: 1001 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 92/92/92) [7,21] r=-1 lpr=1001 pi=92-1000/1 crt=95'394 lcod 0'0 inactive]  noting past interval(92-1000 up [11,7](11) acting [11,7](11) maybe_went_rw)

...

2017-07-10 15:42:52.600376 7f8af91bb700  5 osd.11 pg_epoch: 1673 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 inactive] enter Started

2017-07-10 15:42:52.600387 7f8af91bb700  5 osd.11 pg_epoch: 1673 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 inactive] enter Start

2017-07-10 15:42:52.600397 7f8af91bb700  1 osd.11 pg_epoch: 1673 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 inactive] state<Start>: transitioning to Primary

2017-07-10 15:42:52.600410 7f8af91bb700  5 osd.11 pg_epoch: 1673 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 inactive] exit Start 0.000023 0 0.000000

2017-07-10 15:42:52.600424 7f8af91bb700  5 osd.11 pg_epoch: 1673 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 inactive] enter Started/Primary

2017-07-10 15:42:52.600435 7f8af91bb700  5 osd.11 pg_epoch: 1673 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 inactive] enter Started/Primary/Peering

2017-07-10 15:42:52.600446 7f8af91bb700  5 osd.11 pg_epoch: 1673 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering] enter Started/Primary/Peering/GetInfo

2017-07-10 15:42:52.600460 7f8af91bb700 10 osd.11 pg_epoch: 1673 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering] _calc_past_interval_range: already have past intervals back to 93

2017-07-10 15:42:52.600474 7f8af91bb700 10 osd.11 pg_epoch: 1673 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering]  PriorSet: build_prior interval(1664-1672 up [21](21) acting [21](21))

2017-07-10 15:42:52.608241 7f8af91bb700 10 osd.11 pg_epoch: 1673 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering]  PriorSet: build_prior interval(1001-1663 up [7,21](7) acting [7,21](7))

2017-07-10 15:42:52.608256 7f8af91bb700 10 osd.11 pg_epoch: 1673 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering]  PriorSet: build_prior interval(92-1000 up [11,7](11) acting [11,7](11) maybe_went_rw)

2017-07-10 15:42:52.608265 7f8af91bb700 10 osd.11 pg_epoch: 1673 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering]  PriorSet: build_prior  prior osd.7 is down

2017-07-10 15:42:52.608273 7f8af91bb700 10 osd.11 pg_epoch: 1673 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering]  PriorSet: build_prior final: probe 11 down 7 blocked_by {}

2017-07-10 15:42:52.608281 7f8af91bb700 10 osd.11 pg_epoch: 1673 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering] up_thru 94 < same_since 1673, must notify monitor

2017-07-10 15:42:52.608294 7f8af91bb700  5 osd.11 pg_epoch: 1673 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering] exit Started/Primary/Peering/GetInfo 0.007847 0 0.000000

2017-07-10 15:42:52.608306 7f8af91bb700  5 osd.11 pg_epoch: 1673 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering] enter Started/Primary/Peering/GetLog

2017-07-10 15:42:52.608318 7f8af91bb700 10 osd.11 pg_epoch: 1673 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering] calc_acting osd.11 1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 1673/1673/1673)

2017-07-10 15:42:52.608338 7f8af91bb700 10 osd.11 pg_epoch: 1673 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering] calc_acting newest update on osd.11 with 1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 1673/1673/1673)

calc_acting primary is osd.11 with 1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 1673/1673/1673)

2017-07-10 15:42:52.608348 7f8af91bb700 10 osd.11 pg_epoch: 1673 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering] actingbackfill is 11

2017-07-10 15:42:52.608357 7f8af91bb700 10 osd.11 pg_epoch: 1673 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering] choose_acting want [11] (== acting) backfill_targets

2017-07-10 15:42:52.608367 7f8af91bb700 10 osd.11 pg_epoch: 1673 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering] state<Started/Primary/Peering/GetLog>: leaving GetLog

2017-07-10 15:42:52.608380 7f8af91bb700  5 osd.11 pg_epoch: 1673 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering] exit Started/Primary/Peering/GetLog 0.000073 0 0.000000

2017-07-10 15:42:52.608391 7f8af91bb700  5 osd.11 pg_epoch: 1673 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering] enter Started/Primary/Peering/GetMissing

2017-07-10 15:42:52.608400 7f8af91bb700 10 osd.11 pg_epoch: 1673 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering] state<Started/Primary/Peering/GetMissing>:  still need up_thru update before going active

2017-07-10 15:42:52.608409 7f8af91bb700  5 osd.11 pg_epoch: 1673 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering] exit Started/Primary/Peering/GetMissing 0.000018 0 0.000000

2017-07-10 15:42:52.608417 7f8af91bb700  5 osd.11 pg_epoch: 1673 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering] enter Started/Primary/Peering/WaitUpThru

2017-07-10 15:42:52.608424 7f8af91bb700 10 osd.11 pg_epoch: 1673 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering] handle_peering_event: epoch_sent: 1673 epoch_requested: 1673 NullEvt

2017-07-10 15:42:52.610190 7f8b06334700 10 osd.11 pg_epoch: 1673 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering] flushed

2017-07-10 15:42:52.640813 7f8af91bb700 10 osd.11 pg_epoch: 1673 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering] handle_peering_event: epoch_sent: 1673 epoch_requested: 1673 MNotifyRec from 21 notify: (query_epoch:1673, epoch_sent:1673, info:1.20f( v 1663'655 (1663'600,1663'655] local-les=180 n=69 ec=75 les/c 180/180 1673/1673/1673)) features: 0x483ffffffffffff

2017-07-10 15:42:52.640821 7f8af91bb700  7 osd.11 pg_epoch: 1673 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering] state<Started/Primary>: handle_pg_notify from osd.21

2017-07-10 15:42:52.640829 7f8af91bb700 10 osd.11 pg_epoch: 1673 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering]  got osd.21 1.20f( v 1663'655 (1663'600,1663'655] local-les=180 n=69 ec=75 les/c 180/180 1673/1673/1673)

2017-07-10 15:42:52.640840 7f8af91bb700 10 osd.11 pg_epoch: 1673 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 180/180 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering]  osd.21 has stray content: 1.20f( v 1663'655 (1663'600,1663'655] local-les=180 n=69 ec=75 les/c 180/180 1673/1673/1673)

2017-07-10 15:42:52.640852 7f8af91bb700 10 osd.11 pg_epoch: 1673 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 180/180 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering] update_heartbeat_peers 11 -> 11,21

2017-07-10 15:42:52.642542 7f8af89ba700 10 osd.11 pg_epoch: 1673 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 180/180 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering] handle_peering_event: epoch_sent: 1673 epoch_requested: 1673 FlushedEvt

2017-07-10 15:42:53.549732 7f8b031cf700 10 osd.11 pg_epoch: 1673 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 180/180 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering] null

2017-07-10 15:42:53.556132 7f8af99bc700 10 osd.11 pg_epoch: 1673 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 180/180 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering] handle_advance_map [11]/[11] -- 11/11

2017-07-10 15:42:53.556142 7f8af99bc700 10 osd.11 pg_epoch: 1674 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 180/180 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering] state<Started/Primary/Peering>: Peering advmap

2017-07-10 15:42:53.556149 7f8af99bc700 10 osd.11 pg_epoch: 1674 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 180/180 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering] adjust_need_up_thru now 1673, need_up_thru now false

2017-07-10 15:42:53.556157 7f8af99bc700 10 osd.11 pg_epoch: 1674 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 180/180 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering] state<Started>: Started advmap

2017-07-10 15:42:53.556164 7f8af99bc700 10 osd.11 pg_epoch: 1674 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 180/180 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering] check_recovery_sources no source osds () went down

2017-07-10 15:42:53.556173 7f8af99bc700 10 osd.11 pg_epoch: 1674 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 180/180 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering] handle_activate_map

2017-07-10 15:42:53.556179 7f8af99bc700  7 osd.11 pg_epoch: 1674 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 180/180 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering] state<Started/Primary>: handle ActMap primary

2017-07-10 15:42:53.556187 7f8af99bc700 10 osd.11 pg_epoch: 1674 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 180/180 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering] take_waiters

2017-07-10 15:42:53.556194 7f8af99bc700  5 osd.11 pg_epoch: 1674 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 180/180 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering] exit Started/Primary/Peering/WaitUpThru 0.947776 5 0.008159

2017-07-10 15:42:53.556203 7f8af99bc700 10 osd.11 pg_epoch: 1674 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 180/180 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering] state<Started/Primary/Peering>: Leaving Peering

2017-07-10 15:42:53.556211 7f8af99bc700  5 osd.11 pg_epoch: 1674 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 180/180 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering] exit Started/Primary/Peering 0.955776 0 0.000000

2017-07-10 15:42:53.556221 7f8af99bc700  5 osd.11 pg_epoch: 1674 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 180/180 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 inactive] enter Started/Primary/Active

2017-07-10 15:42:53.556232 7f8af99bc700 10 osd.11 pg_epoch: 1674 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 180/180 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 inactive] state<Started/Primary/Active>: In Active, about to call activate

2017-07-10 15:42:53.556241 7f8af99bc700 10 osd.11 pg_epoch: 1674 pg[1.20f( v 95'397 (95'300,95'397] local-les=1674 n=69 ec=75 les/c 180/180 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 inactive] activate - snap_trimq []

2017-07-10 15:42:53.556249 7f8af99bc700 10 osd.11 pg_epoch: 1674 pg[1.20f( v 95'397 (95'300,95'397] local-les=1674 n=69 ec=75 les/c 180/180 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 inactive] activate - no missing, moving last_complete 95'397 -> 95'397

2017-07-10 15:42:53.556261 7f8af99bc700 10 osd.11 pg_epoch: 1674 pg[1.20f( v 95'397 (95'300,95'397] local-les=1674 n=69 ec=75 les/c 180/180 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 inactive] 1.20f primary missing_loc.add_active_missing peer 11 missing: {}

2017-07-10 15:42:53.556268 7f8af99bc700 10 osd.11 pg_epoch: 1674 pg[1.20f( v 95'397 (95'300,95'397] local-les=1674 n=69 ec=75 les/c 180/180 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 inactive] needs_recovery is recovered

2017-07-10 15:42:53.556275 7f8af99bc700 10 osd.11 pg_epoch: 1674 pg[1.20f( v 95'397 (95'300,95'397] local-les=1674 n=69 ec=75 les/c 180/180 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 activating+undersized+degraded] state<Started/Primary/Active>: Activate Finished

2017-07-10 15:42:53.556283 7f8af99bc700  5 osd.11 pg_epoch: 1674 pg[1.20f( v 95'397 (95'300,95'397] local-les=1674 n=69 ec=75 les/c 180/180 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 activating+undersized+degraded] enter Started/Primary/Active/Activating

2017-07-10 15:42:53.556290 7f8af99bc700 10 osd.11 pg_epoch: 1674 pg[1.20f( v 95'397 (95'300,95'397] local-les=1674 n=69 ec=75 les/c 180/180 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 activating+undersized+degraded] handle_peering_event: epoch_sent: 1674 epoch_requested: 1674 NullEvt

2017-07-10 15:42:53.564228 7f8b06334700 10 osd.11 pg_epoch: 1674 pg[1.20f( v 95'397 (95'300,95'397] local-les=1674 n=69 ec=75 les/c 180/180 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 activating+undersized+degraded] _activate_committed 1674 peer_activated now 11 last_epoch_started 180 same_interval_since 1673

2017-07-10 15:42:53.564250 7f8b06334700 10 osd.11 pg_epoch: 1674 pg[1.20f( v 95'397 (95'300,95'397] local-les=1674 n=69 ec=75 les/c 180/180 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 activating+undersized+degraded] all_activated_and_committed

 

osd 7 log:

2017-07-10 16:08:29.812470 7f4fb425a700  7 osd.7 2123 handle_pg_info pg_info(1 pgs e2123:1.20f(6)) v4 from osd.11

2017-07-10 16:08:29.833669 7f4fac24a700 10 osd.7 pg_epoch: 2123 pg[1.20f( v 1663'655 (1663'600,1663'655] local-les=180 n=69 ec=75 les/c 1898/1898 2122/2122/1673) [11,7] r=1 lpr=2122 pi=179-2121/15 crt=1663'652 lcod 0'0 inactive NOTIFY] handle_peering_event: epoch_sent: 2123 epoch_requested: 2123 MInfoRec from 11 info: 1.20f( v 95'397 (95'300,95'397] local-les=2123 n=69 ec=75 les/c 1898/1898 2122/2122/1673)

2017-07-10 16:08:29.833684 7f4fac24a700 10 osd.7 pg_epoch: 2123 pg[1.20f( v 1663'655 (1663'600,1663'655] local-les=180 n=69 ec=75 les/c 1898/1898 2122/2122/1673) [11,7] r=1 lpr=2122 pi=179-2121/15 crt=1663'652 lcod 0'0 inactive NOTIFY] state<Started/Stray>: got info from osd.11 1.20f( v 95'397 (95'300,95'397] local-les=2123 n=69 ec=75 les/c 1898/1898 2122/2122/1673)

-------------------------------------------------------------------------------------------------------------------------------------
本邮件及其附件含有新华三技术有限公司的保密信息,仅限于发送给上面地址中列出
的个人或群组。禁止任何其他人以任何形式使用(包括但不限于全部或部分地泄露、复制、
或散发)本邮件中的信息。如果您错收了本邮件,请您立即电话或邮件通知发件人并删除本
邮件!
This e-mail and its attachments contain confidential information from New H3C, which is
intended only for the person or entity whose address is listed above. Any use of the
information contained herein in any way (including, but not limited to, total or partial
disclosure, reproduction, or dissemination) by persons other than the intended
recipient(s) is prohibited. If you receive this e-mail in error, please notify the sender
by phone or email immediately and delete it!
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

[Index of Archives]     [Information on CEPH]     [Linux Filesystem Development]     [Ceph Development]     [Ceph Large]     [Ceph Dev]     [Linux USB Development]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [xfs]


  Powered by Linux