Hi, everyone. Recently, while I was running a stress test, one of the monitors in my Ceph cluster was marked down, all the monitors repeatedly called new elections, and I/O could not complete. There are three monitors in my cluster:
rg3-ceph36, rg3-ceph40, rg3-ceph45. It was rg3-ceph40 that was always marked down, but it was running. Here is rg3-ceph40’s monitor log with debug_mon and debug_paxos set to 20/0. 2017-02-03 16:33:25.738290 7f4f301ca700 5 mon.rg3-ceph40@1(electing).elector(7009) handle_ack from mon.2 2017-02-03 16:33:25.738294 7f4f301ca700 5 mon.rg3-ceph40@1(electing).elector(7009) so far i have {1=37154696925806591,2=37154696925806591} 2017-02-03 16:33:28.033563 7f4f309cb700 11 mon.rg3-ceph40@1(electing) e1 tick 2017-02-03 16:33:28.033584 7f4f309cb700 20 mon.rg3-ceph40@1(electing) e1 sync_trim_providers 2017-02-03 16:33:30.737928 7f4f309cb700 5 mon.rg3-ceph40@1(electing).elector(7009) election timer expired 2017-02-03 16:33:30.737953 7f4f309cb700 10 mon.rg3-ceph40@1(electing).elector(7009) bump_epoch 7009 to 7010 2017-02-03 16:33:30.740784 7f4f309cb700 10 mon.rg3-ceph40@1(electing) e1 join_election 2017-02-03 16:33:30.740802 7f4f309cb700 10 mon.rg3-ceph40@1(electing) e1 _reset 2017-02-03 16:33:30.740805 7f4f309cb700 10 mon.rg3-ceph40@1(electing) e1 cancel_probe_timeout (none scheduled) 2017-02-03 16:33:30.740807 7f4f309cb700 10 mon.rg3-ceph40@1(electing) e1 timecheck_finish 2017-02-03 16:33:30.740810 7f4f309cb700 15 mon.rg3-ceph40@1(electing) e1 health_tick_stop 2017-02-03 16:33:30.740812 7f4f309cb700 15 mon.rg3-ceph40@1(electing) e1 health_interval_stop 2017-02-03 16:33:30.740814 7f4f309cb700 10 mon.rg3-ceph40@1(electing) e1 scrub_reset 2017-02-03 16:33:30.740816 7f4f309cb700 10 mon.rg3-ceph40@1(electing).paxos(paxos recovering c 200550..201284) restart -- canceling timeouts 2017-02-03 16:33:30.740823 7f4f309cb700 10 mon.rg3-ceph40@1(electing).paxosservice(pgmap 85794..86329) restart 2017-02-03 16:33:30.740827 7f4f309cb700 10 mon.rg3-ceph40@1(electing).paxosservice(mdsmap 1..1) restart 2017-02-03 16:33:30.740830 7f4f309cb700 10 mon.rg3-ceph40@1(electing).paxosservice(osdmap 25125..25724) restart 2017-02-03 16:33:30.740832 7f4f309cb700 10 
mon.rg3-ceph40@1(electing).paxosservice(logm 98223..98787) restart 2017-02-03 16:33:30.740834 7f4f309cb700 10 mon.rg3-ceph40@1(electing).paxosservice(monmap 1..1) restart 2017-02-03 16:33:30.740836 7f4f309cb700 10 mon.rg3-ceph40@1(electing).paxosservice(auth 501..623) restart 2017-02-03 16:33:30.740872 7f4f309cb700 10 mon.rg3-ceph40@1(electing) e1 win_election epoch 7010 quorum 1,2 features 37154696925806591 2017-02-03 16:33:30.740889 7f4f309cb700 0 log_channel(cluster) log [INF] : mon.rg3-ceph40@1 won leader election with quorum 1,2 2017-02-03 16:33:30.741119 7f4f309cb700 10 mon.rg3-ceph40@1(leader).paxos(paxos recovering c 200550..201284) leader_init -- starting paxos recovery 2017-02-03 16:33:30.742301 7f4f309cb700 10 mon.rg3-ceph40@1(leader).paxos(paxos recovering c 200550..201284) get_new_proposal_number = 350501 2017-02-03 16:33:30.742315 7f4f309cb700 10 mon.rg3-ceph40@1(leader).paxos(paxos recovering c 200550..201284) collect with pn 350501 2017-02-03 16:33:30.742328 7f4f309cb700 10 mon.rg3-ceph40@1(leader).paxosservice(monmap 1..1) election_finished 2017-02-03 16:33:30.742332 7f4f309cb700 10 mon.rg3-ceph40@1(leader).paxosservice(monmap 1..1) _active - not active 2017-02-03 16:33:30.742334 7f4f309cb700 10 mon.rg3-ceph40@1(leader).paxosservice(pgmap 85794..86329) election_finished 2017-02-03 16:33:30.742336 7f4f309cb700 10 mon.rg3-ceph40@1(leader).paxosservice(pgmap 85794..86329) _active - not active 2017-02-03 16:33:30.742338 7f4f309cb700 10 mon.rg3-ceph40@1(leader).paxosservice(mdsmap 1..1) election_finished 2017-02-03 16:33:30.742340 7f4f309cb700 10 mon.rg3-ceph40@1(leader).paxosservice(mdsmap 1..1) _active - not active 2017-02-03 16:33:30.742341 7f4f309cb700 10 mon.rg3-ceph40@1(leader).paxosservice(osdmap 25125..25724) election_finished 2017-02-03 16:33:30.742343 7f4f309cb700 10 mon.rg3-ceph40@1(leader).paxosservice(osdmap 25125..25724) _active - not active 2017-02-03 16:33:30.742345 7f4f309cb700 10 mon.rg3-ceph40@1(leader).paxosservice(logm 98223..98787) 
election_finished 2017-02-03 16:33:30.742346 7f4f309cb700 10 mon.rg3-ceph40@1(leader).paxosservice(logm 98223..98787) _active - not active 2017-02-03 16:33:30.742348 7f4f309cb700 10 mon.rg3-ceph40@1(leader).paxosservice(auth 501..623) election_finished 2017-02-03 16:33:30.742350 7f4f309cb700 10 mon.rg3-ceph40@1(leader).paxosservice(auth 501..623) _active - not active 2017-02-03 16:33:30.742352 7f4f309cb700 10 mon.rg3-ceph40@1(leader).data_health(7010) start_epoch epoch 7010 2017-02-03 16:33:30.742361 7f4f309cb700 10 mon.rg3-ceph40@1(leader) e1 timecheck_finish 2017-02-03 16:33:30.742363 7f4f309cb700 10 mon.rg3-ceph40@1(leader) e1 resend_routed_requests 2017-02-03 16:33:30.742377 7f4f309cb700 10 mon.rg3-ceph40@1(leader) e1 requeue for self tid 3383 log(1 entries) v1 2017-02-03 16:33:30.742383 7f4f309cb700 20 mon.rg3-ceph40@1(leader) e1 have connection 2017-02-03 16:33:30.742386 7f4f309cb700 20 mon.rg3-ceph40@1(leader) e1 ms_dispatch existing session MonSession: mon.1 10.205.198.85:6789/0 is openallow * for mon.1 10.205.198.85:6789/0 2017-02-03 16:33:30.742396 7f4f309cb700 20 mon.rg3-ceph40@1(leader) e1 caps allow * 2017-02-03 16:33:30.742400 7f4f309cb700 10 mon.rg3-ceph40@1(leader).paxosservice(logm 98223..98787) dispatch log(1 entries) v1 from mon.1 10.205.198.85:6789/0 2017-02-03 16:33:30.742407 7f4f309cb700 5 mon.rg3-ceph40@1(leader).paxos(paxos recovering c 200550..201284) is_readable = 0 - now=2017-02-03 16:33:30.742408 lease_expire=0.000000 has v0 lc 201284 2017-02-03 16:33:30.742420 7f4f309cb700 10 mon.rg3-ceph40@1(leader).paxosservice(logm 98223..98787) waiting for paxos -> readable (v0) 2017-02-03 16:33:30.742423 7f4f309cb700 5 mon.rg3-ceph40@1(leader).paxos(paxos recovering c 200550..201284) is_readable = 0 - now=2017-02-03 16:33:30.742424 lease_expire=0.000000 has v0 lc 201284 2017-02-03 16:33:30.742432 7f4f309cb700 10 mon.rg3-ceph40@1(leader) e1 register_cluster_logger 2017-02-03 16:33:30.742438 7f4f309cb700 10 mon.rg3-ceph40@1(leader) e1 timecheck_start 
2017-02-03 16:33:30.742440 7f4f309cb700 10 mon.rg3-ceph40@1(leader) e1 timecheck_start_round curr 0 2017-02-03 16:33:30.742442 7f4f309cb700 10 mon.rg3-ceph40@1(leader) e1 timecheck_start_round new 1 2017-02-03 16:33:30.742444 7f4f309cb700 10 mon.rg3-ceph40@1(leader) e1 timecheck 2017-02-03 16:33:30.742445 7f4f309cb700 10 mon.rg3-ceph40@1(leader) e1 timecheck start timecheck epoch 7010 round 1 2017-02-03 16:33:30.742452 7f4f309cb700 10 mon.rg3-ceph40@1(leader) e1 timecheck send time_check( ping e 7010 r 1 ) v1 to mon.2 10.205.198.149:6789/0 2017-02-03 16:33:30.742462 7f4f309cb700 10 mon.rg3-ceph40@1(leader) e1 timecheck_start_round setting up next event 2017-02-03 16:33:30.742467 7f4f309cb700 15 mon.rg3-ceph40@1(leader) e1 health_tick_start 2017-02-03 16:33:30.742469 7f4f309cb700 15 mon.rg3-ceph40@1(leader) e1 health_tick_stop 2017-02-03 16:33:30.742472 7f4f309cb700 10 mon.rg3-ceph40@1(leader) e1 do_health_to_clog_interval 2017-02-03 16:33:30.742474 7f4f309cb700 10 mon.rg3-ceph40@1(leader) e1 do_health_to_clog (force) 2017-02-03 16:33:30.743637 7f4f309cb700 10 mon.rg3-ceph40@1(leader).data_health(7010) get_health 2017-02-03 16:33:30.743661 7f4f309cb700 0 log_channel(cluster) log [INF] : HEALTH_WARN; 1 mons down, quorum 1,2 rg3-ceph40,rg3-ceph45 2017-02-03 16:33:30.743701 7f4f309cb700 15 mon.rg3-ceph40@1(leader) e1 health_interval_start 2017-02-03 16:33:30.743704 7f4f309cb700 15 mon.rg3-ceph40@1(leader) e1 health_interval_stop 2017-02-03 16:33:30.743705 7f4f309cb700 20 mon.rg3-ceph40@1(leader) e1 health_interval_calc_next_update now: 2017-02-03 16:33:30.743705, next: 2017-02-03 17:00:00.000000, interval: 3600 2017-02-03 16:33:30.743724 7f4f301ca700 20 mon.rg3-ceph40@1(leader) e1 have connection 2017-02-03 16:33:30.743732 7f4f301ca700 20 mon.rg3-ceph40@1(leader) e1 ms_dispatch existing session MonSession: mon.1 10.205.198.85:6789/0 is openallow * for mon.1 10.205.198.85:6789/0 2017-02-03 16:33:30.743744 7f4f301ca700 20 mon.rg3-ceph40@1(leader) e1 caps allow * 
2017-02-03 16:33:30.743747 7f4f301ca700 10 mon.rg3-ceph40@1(leader).paxosservice(logm 98223..98787) dispatch log(1 entries) v1 from mon.1 10.205.198.85:6789/0 2017-02-03 16:33:30.743756 7f4f301ca700 5 mon.rg3-ceph40@1(leader).paxos(paxos recovering c 200550..201284) is_readable = 0 - now=2017-02-03 16:33:30.743756 lease_expire=0.000000 has v0 lc 201284 2017-02-03 16:33:30.743766 7f4f301ca700 10 mon.rg3-ceph40@1(leader).paxosservice(logm 98223..98787) waiting for paxos -> readable (v0) 2017-02-03 16:33:30.743770 7f4f301ca700 5 mon.rg3-ceph40@1(leader).paxos(paxos recovering c 200550..201284) is_readable = 0 - now=2017-02-03 16:33:30.743775 lease_expire=0.000000 has v0 lc 201284 2017-02-03 16:33:30.743786 7f4f301ca700 20 mon.rg3-ceph40@1(leader) e1 have connection 2017-02-03 16:33:30.743789 7f4f301ca700 20 mon.rg3-ceph40@1(leader) e1 ms_dispatch existing session MonSession: mon.1 10.205.198.85:6789/0 is openallow * for mon.1 10.205.198.85:6789/0 2017-02-03 16:33:30.743796 7f4f301ca700 20 mon.rg3-ceph40@1(leader) e1 caps allow * 2017-02-03 16:33:30.743798 7f4f301ca700 10 mon.rg3-ceph40@1(leader).paxosservice(logm 98223..98787) dispatch log(1 entries) v1 from mon.1 10.205.198.85:6789/0 2017-02-03 16:33:30.743803 7f4f301ca700 5 mon.rg3-ceph40@1(leader).paxos(paxos recovering c 200550..201284) is_readable = 0 - now=2017-02-03 16:33:30.743804 lease_expire=0.000000 has v0 lc 201284 2017-02-03 16:33:30.743809 7f4f301ca700 10 mon.rg3-ceph40@1(leader).paxosservice(logm 98223..98787) waiting for paxos -> readable (v0) 2017-02-03 16:33:30.743812 7f4f301ca700 5 mon.rg3-ceph40@1(leader).paxos(paxos recovering c 200550..201284) is_readable = 0 - now=2017-02-03 16:33:30.743814 lease_expire=0.000000 has v0 lc 201284 2017-02-03 16:33:30.789898 7f4f301ca700 20 mon.rg3-ceph40@1(leader) e1 have connection 2017-02-03 16:33:30.789910 7f4f301ca700 20 mon.rg3-ceph40@1(leader) e1 ms_dispatch existing session MonSession: mon.2 10.205.198.149:6789/0 is openallow * for mon.2 
10.205.198.149:6789/0
I then stopped the monitor on rg3-ceph40 with /etc/init.d/ceph stop mon. The other two monitors stopped calling new elections, but I/O was still stuck. The following is the monitor log of rg3-ceph36:
2017-02-03 17:03:02.279230 7fe0be819700 10 mon.rg3-ceph36@0(leader).pg v86564 encode_pending v 86565 2017-02-03 17:03:02.337531 7fe0be819700 10 mon.rg3-ceph36@0(leader).log v99039 encode_full log v 99039 2017-02-03 17:03:02.337612 7fe0be819700 10 mon.rg3-ceph36@0(leader).log v99039 encode_pending v99040 2017-02-03 17:03:02.354173 7fe0bfc1f700 10 mon.rg3-ceph36@0(leader) e1 refresh_from_paxos 2017-02-03 17:03:02.354261 7fe0bfc1f700 10 mon.rg3-ceph36@0(leader).pg v86564 update_from_paxos read_incremental 2017-02-03 17:03:02.354766 7fe0bfc1f700 10 mon.rg3-ceph36@0(leader).pg v86565 read_pgmap_meta 2017-02-03 17:03:02.354874 7fe0bfc1f700 10 mon.rg3-ceph36@0(leader).pg v86565 map_pg_creates to 0 pgs -- no change 2017-02-03 17:03:02.354881 7fe0bfc1f700 10 mon.rg3-ceph36@0(leader).pg v86565 send_pg_creates to 0 pgs 2017-02-03 17:03:02.354885 7fe0bfc1f700 10 mon.rg3-ceph36@0(leader).pg v86565 update_logger 2017-02-03 17:03:02.355051 7fe0bfc1f700 10 mon.rg3-ceph36@0(leader).log v99039 update_from_paxos 2017-02-03 17:03:02.355061 7fe0bfc1f700 10 mon.rg3-ceph36@0(leader).log v99039 update_from_paxos version 99039 summary v 99039 2017-02-03 17:03:02.355180 7fe0bfc1f700 10 mon.rg3-ceph36@0(leader).auth v625 update_from_paxos 2017-02-03 17:03:02.355188 7fe0bfc1f700 10 mon.rg3-ceph36@0(leader).pg v86565 map_pg_creates to 0 pgs -- no change 2017-02-03 17:03:02.355190 7fe0bfc1f700 10 mon.rg3-ceph36@0(leader).pg v86565 send_pg_creates to 0 pgs 2017-02-03 17:03:02.355234 7fe0bfc1f700 10 mon.rg3-ceph36@0(leader).pg v86565 create_pending v 86566 2017-02-03 17:03:02.355238 7fe0bfc1f700 7 mon.rg3-ceph36@0(leader).pg v86565 _updated_stats for osd.67 10.205.198.148:6812/99322 2017-02-03 17:03:02.355259 7fe0bfc1f700 7 mon.rg3-ceph36@0(leader).pg v86565 _updated_stats for osd.12
10.205.198.147:6802/96650 2017-02-03 17:03:02.355272 7fe0bfc1f700 7 mon.rg3-ceph36@0(leader).pg v86565 _updated_stats for osd.80 10.205.198.147:6816/99711 2017-02-03 17:03:02.355283 7fe0bfc1f700 7 mon.rg3-ceph36@0(leader).pg v86565 _updated_stats for osd.72 10.205.198.82:6804/2609 2017-02-03 17:03:02.355292 7fe0bfc1f700 7 mon.rg3-ceph36@0(leader).pg v86565 _updated_stats for osd.17 10.205.198.82:6802/2275 2017-02-03 17:03:02.355302 7fe0bfc1f700 7 mon.rg3-ceph36@0(leader).pg v86565 _updated_stats for osd.22 10.205.198.149:6802/11553 2017-02-03 17:03:02.355324 7fe0bfc1f700 7 mon.rg3-ceph36@0(leader).pg v86565 _updated_stats for osd.83 10.205.198.145:6816/83116 2017-02-03 17:03:02.355335 7fe0bfc1f700 7 mon.rg3-ceph36@0(leader).pg v86565 _updated_stats for osd.86 10.205.198.146:6806/92691 2017-02-03 17:03:02.355362 7fe0bfc1f700 7 mon.rg3-ceph36@0(leader).pg v86565 _updated_stats for osd.71 10.205.198.82:6808/3461 2017-02-03 17:03:02.355377 7fe0bfc1f700 7 mon.rg3-ceph36@0(leader).pg v86565 _updated_stats for osd.32 10.205.198.81:6806/48193 2017-02-03 17:03:02.355395 7fe0bfc1f700 7 mon.rg3-ceph36@0(leader).pg v86565 _updated_stats for osd.30 10.205.198.83:6812/1188 2017-02-03 17:03:02.355418 7fe0bfc1f700 7 mon.rg3-ceph36@0(leader).pg v86565 _updated_stats for osd.65 10.205.198.145:6810/81841 2017-02-03 17:03:02.355448 7fe0bfc1f700 7 mon.rg3-ceph36@0(leader).pg v86565 _updated_stats for osd.33 10.205.198.149:6808/12810 2017-02-03 17:03:02.355466 7fe0bfc1f700 7 mon.rg3-ceph36@0(leader).pg v86565 _updated_stats for osd.34 10.205.198.149:6816/14873 2017-02-03 17:03:02.355477 7fe0bfc1f700 7 mon.rg3-ceph36@0(leader).pg v86565 _updated_stats for osd.3 10.205.198.81:6802/47424 2017-02-03 17:03:02.355488 7fe0bfc1f700 7 mon.rg3-ceph36@0(leader).pg v86565 _updated_stats for osd.20 10.205.198.83:6806/101823 2017-02-03 17:03:02.355498 7fe0bfc1f700 7 mon.rg3-ceph36@0(leader).pg v86565 _updated_stats for osd.55 10.205.198.148:6816/100258 2017-02-03 17:03:02.355509 7fe0bfc1f700 7 
mon.rg3-ceph36@0(leader).pg v86565 _updated_stats for osd.9 10.205.198.145:6806/80779 2017-02-03 17:03:02.355519 7fe0bfc1f700 7 mon.rg3-ceph36@0(leader).pg v86565 _updated_stats for osd.84 10.205.198.146:6804/92333 2017-02-03 17:03:02.355528 7fe0bfc1f700 7 mon.rg3-ceph36@0(leader).pg v86565 _updated_stats for osd.54 10.205.198.145:6814/82758 2017-02-03 17:03:02.355538 7fe0bfc1f700 7 mon.rg3-ceph36@0(leader).pg v86565 _updated_stats for osd.87 10.205.198.146:6812/94054 2017-02-03 17:03:02.355563 7fe0bfc1f700 7 mon.rg3-ceph36@0(leader).pg v86565 _updated_stats for osd.68 10.205.198.148:6802/97249 2017-02-03 17:03:02.355603 7fe0bfc1f700 10 mon.rg3-ceph36@0(leader).pg v86565 check_osd_map already seen 25728 >= 25728 2017-02-03 17:03:02.355611 7fe0bfc1f700 10 mon.rg3-ceph36@0(leader).pg v86565 update_logger 2017-02-03 17:03:02.355635 7fe0bfc1f700 0 log_channel(cluster) log [INF] : pgmap v86565: 4096 pgs: 4096 active+clean; 296 MB data, 14135 MB used, 290 TB / 306 TB avail; 5268 B/s rd, 6 op/s 2017-02-03 17:03:02.379079 7fe0be018700 10 mon.rg3-ceph36@0(leader).log v99039 preprocess_query log(1 entries) v1 from mon.0 10.205.198.81:6789/0 2017-02-03 17:03:02.379089 7fe0be018700 10 mon.rg3-ceph36@0(leader).log v99039 preprocess_log log(1 entries) v1 from mon.0 2017-02-03 17:03:02.412442 7fe0bfc1f700 10 mon.rg3-ceph36@0(leader) e1 refresh_from_paxos 2017-02-03 17:03:02.412681 7fe0bfc1f700 10 mon.rg3-ceph36@0(leader).log v99040 update_from_paxos 2017-02-03 17:03:02.412689 7fe0bfc1f700 10 mon.rg3-ceph36@0(leader).log v99040 update_from_paxos version 99040 summary v 99039 2017-02-03 17:03:02.412709 7fe0bfc1f700 10 mon.rg3-ceph36@0(leader).log v99040 update_from_paxos latest full 99039 2017-02-03 17:03:02.412743 7fe0bfc1f700 7 mon.rg3-ceph36@0(leader).log v99040 update_from_paxos applying incremental log 99040 2017-02-03 17:03:01.280909 mon.0 10.205.198.81:6789/0 13620 : cluster [INF] pgmap v86564:
4096 pgs: 4096 active+clean; 296 MB data, 13947 MB used, 290 TB / 306 TB avail 2017-02-03 17:03:02.412822 7fe0bfc1f700 10 mon.rg3-ceph36@0(leader).log v99040 check_subs 2017-02-03 17:03:02.412962 7fe0bfc1f700 10 mon.rg3-ceph36@0(leader).auth v625 update_from_paxos 2017-02-03 17:03:02.412969 7fe0bfc1f700 10 mon.rg3-ceph36@0(leader).pg v86565 map_pg_creates to 0 pgs -- no change 2017-02-03 17:03:02.412972 7fe0bfc1f700 10 mon.rg3-ceph36@0(leader).pg v86565 send_pg_creates to 0 pgs 2017-02-03 17:03:02.413011 7fe0bfc1f700 10 mon.rg3-ceph36@0(leader).log v99040 create_pending v 99041 2017-02-03 17:03:02.413015 7fe0bfc1f700 7 mon.rg3-ceph36@0(leader).log v99040 _updated_log for mon.0 10.205.198.81:6789/0 2017-02-03 17:03:02.413042 7fe0bfc1f700 10 mon.rg3-ceph36@0(leader).log v99040 preprocess_query log(1 entries) v1 from mon.0 10.205.198.81:6789/0 2017-02-03 17:03:02.413048 7fe0bfc1f700 10 mon.rg3-ceph36@0(leader).log v99040 preprocess_log log(1 entries) v1 from mon.0 2017-02-03 17:03:02.413055 7fe0bfc1f700 10 mon.rg3-ceph36@0(leader).log v99040 prepare_update log(1 entries) v1 from mon.0 10.205.198.81:6789/0 2017-02-03 17:03:02.413059 7fe0bfc1f700 10 mon.rg3-ceph36@0(leader).log v99040 prepare_log log(1 entries) v1 from mon.0 2017-02-03 17:03:02.413063 7fe0bfc1f700 10 mon.rg3-ceph36@0(leader).log v99040 logging 2017-02-03 17:03:02.355639 mon.0 10.205.198.81:6789/0 13621 : cluster [INF] pgmap v86565: 4096 pgs: 4096 active+clean; 296 MB data,
14135 MB used, 290 TB / 306 TB avail; 5268 B/s rd, 6 op/s 2017-02-03 17:03:03.412444 7fe0be819700 10 mon.rg3-ceph36@0(leader).log v99040 encode_full log v 99040 2017-02-03 17:03:03.412548 7fe0be819700 10 mon.rg3-ceph36@0(leader).log v99040 encode_pending v99041 2017-02-03 17:03:03.479045 7fe0bfc1f700 10 mon.rg3-ceph36@0(leader) e1 refresh_from_paxos 2017-02-03 17:03:03.479292 7fe0bfc1f700 10 mon.rg3-ceph36@0(leader).log v99041 update_from_paxos 2017-02-03 17:03:03.479299 7fe0bfc1f700 10 mon.rg3-ceph36@0(leader).log v99041 update_from_paxos version 99041 summary v 99040 2017-02-03 17:03:03.479320 7fe0bfc1f700 10 mon.rg3-ceph36@0(leader).log v99041 update_from_paxos latest full 99040 2017-02-03 17:03:03.479348 7fe0bfc1f700 7 mon.rg3-ceph36@0(leader).log v99041 update_from_paxos applying incremental log 99041 2017-02-03 17:03:02.355639 mon.0 10.205.198.81:6789/0 13621 : cluster [INF] pgmap v86565:
4096 pgs: 4096 active+clean; 296 MB data, 14135 MB used, 290 TB / 306 TB avail; 5268 B/s rd, 6 op/s 2017-02-03 17:03:03.479423 7fe0bfc1f700 10 mon.rg3-ceph36@0(leader).log v99041 check_subs 2017-02-03 17:03:03.479572 7fe0bfc1f700 10 mon.rg3-ceph36@0(leader).auth v625 update_from_paxos 2017-02-03 17:03:03.479581 7fe0bfc1f700 10 mon.rg3-ceph36@0(leader).pg v86565 map_pg_creates to 0 pgs -- no change 2017-02-03 17:03:03.479584 7fe0bfc1f700 10 mon.rg3-ceph36@0(leader).pg v86565 send_pg_creates to 0 pgs 2017-02-03 17:03:03.479631 7fe0bfc1f700 10 mon.rg3-ceph36@0(leader).log v99041 create_pending v 99042 2017-02-03 17:03:03.479638 7fe0bfc1f700 7 mon.rg3-ceph36@0(leader).log v99041 _updated_log for mon.0 10.205.198.81:6789/0 2017-02-03 17:03:04.489266 7fe0be819700 10 mon.rg3-ceph36@0(leader).pg v86565 check_down_pgs 2017-02-03 17:03:04.489440 7fe0be819700 10 mon.rg3-ceph36@0(leader).pg v86565 v86565: 4096 pgs: 4096 active+clean; 296 MB data, 14135 MB used, 290 TB / 306 TB avail; 5268 B/s rd, 6 op/s 2017-02-03 17:03:04.489473 7fe0be819700 10 mon.rg3-ceph36@0(leader).osd e25728 e25728: 90 osds: 90 up, 90 in 2017-02-03 17:03:04.489530 7fe0be819700 10 mon.rg3-ceph36@0(leader).osd e25728 min_last_epoch_clean 25728 2017-02-03 17:03:04.489533 7fe0be819700 10 mon.rg3-ceph36@0(leader).log v99041 log 2017-02-03 17:03:04.489538 7fe0be819700 10 mon.rg3-ceph36@0(leader).auth v625 auth 2017-02-03 17:03:05.814732 7fe0be018700 10 mon.rg3-ceph36@0(leader) e1 handle_subscribe mon_subscribe({monmap=2+,osd_pg_creates=0}) v2 2017-02-03 17:03:05.814760 7fe0be018700 10 mon.rg3-ceph36@0(leader) e1 check_sub monmap next 2 have 1 2017-02-03 17:03:08.567850 7fe0be018700 10 mon.rg3-ceph36@0(leader) e1 handle_subscribe mon_subscribe({monmap=2+,osd_pg_creates=0}) v2 2017-02-03 17:03:08.567880 7fe0be018700 10 mon.rg3-ceph36@0(leader) e1 check_sub monmap next 2 have 1 2017-02-03 17:03:09.489665 7fe0be819700 10 mon.rg3-ceph36@0(leader).pg v86565 v86565: 4096 pgs: 4096 active+clean; 296 MB data, 14135 MB 
used, 290 TB / 306 TB avail; 5268 B/s rd, 6 op/s 2017-02-03 17:03:09.489708 7fe0be819700 10 mon.rg3-ceph36@0(leader).osd e25728 e25728: 90 osds: 90 up, 90 in 2017-02-03 17:03:09.489750 7fe0be819700 10 mon.rg3-ceph36@0(leader).osd e25728 min_last_epoch_clean 25728 2017-02-03 17:03:09.489753 7fe0be819700 10 mon.rg3-ceph36@0(leader).log v99041 log 2017-02-03 17:03:09.489758 7fe0be819700 10 mon.rg3-ceph36@0(leader).auth v625 auth 2017-02-03 17:03:10.940004 7fe0be018700 10 mon.rg3-ceph36@0(leader) e1 received forwarded message from osd.29 10.205.198.82:6812/4405 via mon.2 10.205.198.149:6789/0 2017-02-03 17:03:10.940036 7fe0be018700 10 mon.rg3-ceph36@0(leader) e1 caps are allow * 2017-02-03 17:03:10.940040 7fe0be018700 10 mon.rg3-ceph36@0(leader) e1 entity name 'osd.29' type 4 2017-02-03 17:03:10.940042 7fe0be018700 10 mon.rg3-ceph36@0(leader) e1 mesg 0x7fe0bb1a4c00 from 10.205.198.149:6789/0 2017-02-03 17:03:10.940065 7fe0be018700 10 mon.rg3-ceph36@0(leader).pg v86565 preprocess_query pg_stats(0 pgs tid 16094 v 0) v1 from osd.29 10.205.198.82:6812/4405 2017-02-03 17:03:10.940082 7fe0be018700 10 mon.rg3-ceph36@0(leader).pg v86565 prepare_update pg_stats(0 pgs tid 16094 v 0) v1 from osd.29 10.205.198.82:6812/4405 2017-02-03 17:03:10.940088 7fe0be018700 10 mon.rg3-ceph36@0(leader).pg v86565 prepare_pg_stats pg_stats(0 pgs tid 16094 v 0) v1 from osd.29 2017-02-03 17:03:10.940101 7fe0be018700 10 mon.rg3-ceph36@0(leader).pg v86565 message contains no new osd|pg stats 2017-02-03 17:03:10.986247 7fe0be018700 10 mon.rg3-ceph36@0(leader) e1 handle_subscribe mon_subscribe({monmap=2+,osd_pg_creates=0}) v2 2017-02-03 17:03:10.986265 7fe0be018700 10 mon.rg3-ceph36@0(leader) e1 check_sub monmap next 2 have 1 2017-02-03 17:03:11.344862 7fe0be018700 10 mon.rg3-ceph36@0(leader) e1 received forwarded message from osd.77 10.205.198.145:6812/82199 via mon.2 10.205.198.149:6789/0 2017-02-03 17:03:11.344881 7fe0be018700 10 mon.rg3-ceph36@0(leader) e1 caps are allow * 2017-02-03 
17:03:11.344885 7fe0be018700 10 mon.rg3-ceph36@0(leader) e1 entity name 'osd.77' type 4 2017-02-03 17:03:11.344887 7fe0be018700 10 mon.rg3-ceph36@0(leader) e1 mesg 0x7fe0bb1a5b00 from 10.205.198.149:6789/0 2017-02-03 17:03:11.344904 7fe0be018700 10 mon.rg3-ceph36@0(leader).pg v86565 preprocess_query pg_stats(0 pgs tid 16272 v 0) v1 from osd.77 10.205.198.145:6812/82199 2017-02-03 17:03:11.344926 7fe0be018700 10 mon.rg3-ceph36@0(leader).pg v86565 prepare_update pg_stats(0 pgs tid 16272 v 0) v1 from osd.77 10.205.198.145:6812/82199 2017-02-03 17:03:11.344931 7fe0be018700 10 mon.rg3-ceph36@0(leader).pg v86565 prepare_pg_stats pg_stats(0 pgs tid 16272 v 0) v1 from osd.77 2017-02-03 17:03:11.344941 7fe0be018700 10 mon.rg3-ceph36@0(leader).pg v86565 message contains no new osd|pg stats 2017-02-03 17:03:11.833360 7fe0be018700 10 mon.rg3-ceph36@0(leader).pg v86565 preprocess_query pg_stats(0 pgs tid 17368 v 0) v1 from osd.49 10.205.198.85:6812/13922 2017-02-03 17:03:11.833385 7fe0be018700 10 mon.rg3-ceph36@0(leader).pg v86565 prepare_update pg_stats(0 pgs tid 17368 v 0) v1 from osd.49 10.205.198.85:6812/13922 2017-02-03 17:03:11.833390 7fe0be018700 10 mon.rg3-ceph36@0(leader).pg v86565 prepare_pg_stats pg_stats(0 pgs tid 17368 v 0) v1 from osd.49 2017-02-03 17:03:11.833398 7fe0be018700 10 mon.rg3-ceph36@0(leader).pg v86565 message contains no new osd|pg stats
I’m using the Hammer release, version 0.94.5. Please help me, thank you.
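For anyone skimming the logs above, here is a minimal Python sketch that pulls out only the election-related entries (timer expiry, epoch bump, election win) so the repeated elections are easier to follow. It is an illustration, not a Ceph tool: the names `ENTRY_START`, `EVENTS`, and `election_events` are hypothetical, and the patterns assume only the message wording and the timestamp plus 12-hex-digit thread id layout seen in the entries pasted here.

```python
import re

# An entry starts with "<date> <time> <12-hex-digit thread id> ".
# Requiring the thread id skips timestamps embedded inside a message
# (for example "now=2017-02-03 16:33:30.742408 lease_expire=...").
ENTRY_START = re.compile(
    r"(\d{4}-\d{2}-\d{2})\s+(\d{2}:\d{2}:\d{2}\.\d{6})\s+[0-9a-f]{12}\s"
)

# Election-related messages, worded as they appear in the pasted log.
# \s+ tolerates entries that were wrapped across lines by the mail client.
EVENTS = {
    "timer_expired": re.compile(r"election\s+timer\s+expired"),
    "bump_epoch": re.compile(r"bump_epoch\s+(\d+)\s+to\s+(\d+)"),
    "win_election": re.compile(r"win_election\s+epoch\s+(\d+)\s+quorum\s+([\d,]+)"),
}


def election_events(log_text):
    """Return (timestamp, event_name, captured_groups) per election event."""
    heads = list(ENTRY_START.finditer(log_text))
    found = []
    for i, head in enumerate(heads):
        # The entry body runs until the next entry header (or end of text).
        end = heads[i + 1].start() if i + 1 < len(heads) else len(log_text)
        body = log_text[head.end():end]
        timestamp = f"{head.group(1)} {head.group(2)}"
        for name, pattern in EVENTS.items():
            match = pattern.search(body)
            if match:
                found.append((timestamp, name, match.groups()))
    return found
```

Calling `election_events(open(path).read())` on a saved monitor log (where `path` is wherever the log was saved) returns the events in log order, which makes it easy to see how often the election timer expires and the epoch is bumped.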