Re: Help, monitor stuck constantly electing

On Tue, May 17, 2016 at 1:36 AM, Василий Ангапов <angapov@xxxxxxxxx> wrote:
> Hello,
>
> I have a Ceph cluster (10.2.1) with 10 nodes, 3 mons and 290 OSDs. I
> have an instance of RGW with its bucket data in an EC 6+3 pool.
> I recently started testing the cluster's redundancy by powering
> nodes off one by one.
> Suddenly all monitors went crazy, eating 100% CPU; in "perf top"
> roughly 80% of the samples are in ceph-mon [.] crush_hash32_3.
> "ceph -s" still works, slowly, but the monmap election epoch keeps
> increasing constantly, by about 100 every 5 minutes.
>
> So nothing really works, as there is effectively no stable quorum. This
> already happened to me once when I powered off 4 nodes out of 10; the
> only thing that helped was removing all mons except one from the monmap.
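
(For the record, shrinking the monmap down to a single mon on an unhealthy
cluster is usually the extract/edit/inject procedure, roughly like the
sketch below; it assumes all three monitors are stopped first, and the mon
names are the ones from your status output.)

  ceph-mon -i ed-ds-c171 --extract-monmap /tmp/monmap
  monmaptool --print /tmp/monmap
  monmaptool /tmp/monmap --rm ed-ds-c172 --rm ed-ds-c173
  ceph-mon -i ed-ds-c171 --inject-monmap /tmp/monmap
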
>
>     cluster 5ddb8aab-49b4-4a63-918e-33c569e3101e
>      health HEALTH_WARN
>             35 pgs backfill_wait
>             26126 pgs degraded
>             4 pgs recovering
>             2928 pgs recovery_wait
>             26130 pgs stuck unclean
>             26125 pgs undersized
>             recovery 127536/334221 objects degraded (38.159%)
>             recovery 139603/334221 objects misplaced (41.770%)
>             too many PGs per OSD (1325 > max 1000)
>      monmap e6: 3 mons at
> {ed-ds-c171=10.144.66.171:6789/0,ed-ds-c172=10.144.66.172:6789/0,ed-ds-c173=10.144.66.173:6789/0}
>             election epoch 1284, quorum 0,1,2 ed-ds-c171,ed-ds-c172,ed-ds-c173
>      osdmap e3950: 290 osds: 174 up, 174 in; 19439 remapped pgs
>             flags sortbitwise
>       pgmap v241407: 26760 pgs, 16 pools, 143 GB data, 37225 objects
>             258 GB used, 949 TB / 949 TB avail
>             127536/334221 objects degraded (38.159%)
>             139603/334221 objects misplaced (41.770%)
>                11972 active+undersized+degraded
>                11187 active+undersized+degraded+remapped
>                 2612 active+recovery_wait+undersized+degraded+remapped
>                  630 active+clean
>                  315 active+recovery_wait+undersized+degraded
>                   35 active+undersized+degraded+remapped+wait_backfill
>                    3 active+remapped
>                    3 active+recovering+undersized+degraded+remapped
>                    1 active
>                    1 active+recovery_wait+degraded
>                    1 active+recovering+undersized+degraded
>
> Logs from quorum leader with debug_mon=20
>
> 2016-05-16 17:34:41.318132 7f5079bb2700  5
> mon.ed-ds-c171@0(leader).elector(1310) handle_propose from mon.2

So mon.2 wants to be elected while mon.0 is still the leader, but the reason
why mon.2 wants to (re)join the quorum, or why it is not in the quorum in the
first place, is still unknown.
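
To see what each mon thinks is going on while the elections are flapping,
the admin socket might help, since it works even without quorum (the mon
name below is one of the three from your monmap; adjust per host):

  # on each monitor host
  ceph daemon mon.ed-ds-c171 mon_status
  # shows state (probing/electing/leader/peon), election_epoch,
  # quorum membership and any mons seen outside the quorum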

> 2016-05-16 17:34:41.318134 7f5079bb2700 10
> mon.ed-ds-c171@0(leader).elector(1310) handle_propose required
> features 9025616074506240, peer features 576460752032874495
> 2016-05-16 17:34:41.318136 7f5079bb2700 10
> mon.ed-ds-c171@0(leader).elector(1310) bump_epoch 1310 to 1311
> 2016-05-16 17:34:41.318345 7f5079bb2700 10 mon.ed-ds-c171@0(leader) e6
> join_election
> 2016-05-16 17:34:41.318352 7f5079bb2700 10 mon.ed-ds-c171@0(leader) e6 _reset
> 2016-05-16 17:34:41.318353 7f5079bb2700 10 mon.ed-ds-c171@0(leader) e6
> cancel_probe_timeout (none scheduled)
> 2016-05-16 17:34:41.318355 7f5079bb2700 10 mon.ed-ds-c171@0(leader) e6
> timecheck_finish
> 2016-05-16 17:34:41.318358 7f5079bb2700 15 mon.ed-ds-c171@0(leader) e6
> health_tick_stop
> 2016-05-16 17:34:41.318359 7f5079bb2700 15 mon.ed-ds-c171@0(leader) e6
> health_interval_stop
> 2016-05-16 17:34:41.318361 7f5079bb2700 10 mon.ed-ds-c171@0(leader) e6
> scrub_event_cancel
> 2016-05-16 17:34:41.318363 7f5079bb2700 10 mon.ed-ds-c171@0(leader) e6
> scrub_reset
> 2016-05-16 17:34:41.318368 7f5079bb2700 10 mon.ed-ds-c171@0(electing)
> e6 start_election
> 2016-05-16 17:34:41.318371 7f5079bb2700 10 mon.ed-ds-c171@0(electing) e6 _reset
> 2016-05-16 17:34:41.318372 7f5079bb2700 10 mon.ed-ds-c171@0(electing)
> e6 cancel_probe_timeout (none scheduled)
> 2016-05-16 17:34:41.318372 7f5079bb2700 10 mon.ed-ds-c171@0(electing)
> e6 timecheck_finish
> 2016-05-16 17:34:41.318373 7f5079bb2700 15 mon.ed-ds-c171@0(electing)
> e6 health_tick_stop
> 2016-05-16 17:34:41.318374 7f5079bb2700 15 mon.ed-ds-c171@0(electing)
> e6 health_interval_stop
> 2016-05-16 17:34:41.318375 7f5079bb2700 10 mon.ed-ds-c171@0(electing)
> e6 scrub_event_cancel
> 2016-05-16 17:34:41.318376 7f5079bb2700 10 mon.ed-ds-c171@0(electing)
> e6 scrub_reset
> 2016-05-16 17:34:41.318377 7f5079bb2700 10 mon.ed-ds-c171@0(electing)
> e6 cancel_probe_timeout (none scheduled)
> 2016-05-16 17:34:41.318380 7f5079bb2700  0 log_channel(cluster) log
> [INF] : mon.ed-ds-c171 calling new monitor election
> 2016-05-16 17:34:41.318403 7f5079bb2700  5
> mon.ed-ds-c171@0(electing).elector(1311) start -- can i be leader?
> 2016-05-16 17:34:41.318448 7f5079bb2700  1
> mon.ed-ds-c171@0(electing).elector(1311) init, last seen epoch 1311
> 2016-05-16 17:34:41.318677 7f5079bb2700 20 mon.ed-ds-c171@0(electing)
> e6 _ms_dispatch existing session 0x55b390116a80 for mon.1
> 10.144.66.172:6789/0
> 2016-05-16 17:34:41.318681 7f5079bb2700 20 mon.ed-ds-c171@0(electing)
> e6  caps allow *
> 2016-05-16 17:34:41.318686 7f5079bb2700 20 is_capable service=mon
> command= read on cap allow *
> 2016-05-16 17:34:41.318688 7f5079bb2700 20  allow so far , doing grant allow *
> 2016-05-16 17:34:41.318689 7f5079bb2700 20  allow all
> 2016-05-16 17:34:41.318690 7f5079bb2700 10 mon.ed-ds-c171@0(electing)
> e6 received forwarded message from mon.1 10.144.66.172:6789/0 via
> mon.1 10.144.66.172:6789/0
>
> Any help is appreciated!

Vasily, could you post the logs from all 3 monitors with debug-paxos=20,
debug-mon=20 and debug-ms=1, covering 2 or 3 minutes, so we can check at
least a couple of elections from them?
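
If it is easier, the debug levels can be bumped at runtime through the admin
socket on each mon host (no restart needed), or set in ceph.conf and the
monitors restarted; a rough sketch, with the mon name adjusted per host:

  # at runtime, on each monitor host
  ceph daemon mon.ed-ds-c171 config set debug_mon 20/20
  ceph daemon mon.ed-ds-c171 config set debug_paxos 20/20
  ceph daemon mon.ed-ds-c171 config set debug_ms 1/1

  # or persistently in ceph.conf, then restart the monitors
  [mon]
      debug mon = 20
      debug paxos = 20
      debug ms = 1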

thanks.


>
> Best regards, Vasily.



-- 
Regards
Kefu Chai
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



