Re: Monitors stuck in "electing"

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Mar 25, 2014 at 9:24 AM, Travis Rhoden <trhoden@xxxxxxxxx> wrote:
> Okay, last one until I get some guidance.  Sorry for the spam, but wanted to
> paint a full picture.  Here are debug logs from all three mons, capturing
> what looks like an election sequence to me:
>
> ceph0:
> 2014-03-25 16:17:24.324846 7fa5c53fc700  5 mon.ceph0@0(electing).elector(35)
> start -- can i be leader?
> 2014-03-25 16:17:24.324900 7fa5c53fc700  1 mon.ceph0@0(electing).elector(35)
> init, last seen epoch 35
> 2014-03-25 16:17:24.324913 7fa5c53fc700  1 -- 10.10.30.0:6789/0 --> mon.1
> 10.10.30.1:6789/0 -- election(b3f38955-4321-4850-9ddb-3b09940dc951 propose
> 35) v4 -- ?+0 0x263d480
> 2014-03-25 16:17:24.324948 7fa5c53fc700  1 -- 10.10.30.0:6789/0 --> mon.2
> 10.10.30.2:6789/0 -- election(b3f38955-4321-4850-9ddb-3b09940dc951 propose
> 35) v4 -- ?+0 0x263d6c0
> 2014-03-25 16:17:25.353975 7fa5c4bfb700  1 -- 10.10.30.0:6789/0 <== mon.2
> 10.10.30.2:6789/0 493 ==== election(b3f38955-4321-4850-9ddb-3b09940dc951
> propose 35) v4 ==== 537+0+0 (4036841703 0 0) 0x265fd80 con 0x1df0c60
> 2014-03-25 16:17:25.354042 7fa5c4bfb700  5 mon.ceph0@0(electing).elector(35)
> handle_propose from mon.2
> 2014-03-25 16:17:29.325107 7fa5c53fc700  5 mon.ceph0@0(electing).elector(35)
> election timer expired
>
> ceph1:
> 2014-03-25 16:17:24.325529 7ffe48cc1700  5 mon.ceph1@1(electing).elector(35)
> handle_propose from mon.0
> 2014-03-25 16:17:24.325535 7ffe48cc1700  5 mon.ceph1@1(electing).elector(35)
> defer to 0
> 2014-03-25 16:17:24.325546 7ffe48cc1700  1 -- 10.10.30.1:6789/0 --> mon.0
> 10.10.30.0:6789/0 -- election(b3f38955-4321-4850-9ddb-3b09940dc951 ack 35)
> v4 -- ?+0 0x1bbfb40
> 2014-03-25 16:17:25.354038 7ffe48cc1700  1 -- 10.10.30.1:6789/0 <== mon.2
> 10.10.30.2:6789/0 489 ==== election(b3f38955-4321-4850-9ddb-3b09940dc951
> propose 35) v4 ==== 537+0+0 (4036841703 0 0) 0x1bbf6c0 con 0x14d9b00
> 2014-03-25 16:17:25.354102 7ffe48cc1700  5 mon.ceph1@1(electing).elector(35)
> handle_propose from mon.2
> 2014-03-25 16:17:25.354113 7ffe48cc1700  5 mon.ceph1@1(electing).elector(35)
> no, we already acked 0
>
> ceph2:
> 2014-03-25 16:17:20.353135 7f80d0013700  5 mon.ceph2@2(electing).elector(35)
> election timer expired
> 2014-03-25 16:17:20.353154 7f80d0013700  5 mon.ceph2@2(electing).elector(35)
> start -- can i be leader?
> 2014-03-25 16:17:20.353225 7f80d0013700  1 mon.ceph2@2(electing).elector(35)
> init, last seen epoch 35
> 2014-03-25 16:17:20.353238 7f80d0013700  1 -- 10.10.30.2:6789/0 --> mon.0
> 10.10.30.0:6789/0 -- election(b3f38955-4321-4850-9ddb-3b09940dc951 propose
> 35) v4 -- ?+0 0x18e7900
> 2014-03-25 16:17:20.353272 7f80d0013700  1 -- 10.10.30.2:6789/0 --> mon.1
> 10.10.30.1:6789/0 -- election(b3f38955-4321-4850-9ddb-3b09940dc951 propose
> 35) v4 -- ?+0 0x18e7d80
> 2014-03-25 16:17:25.353559 7f80d0013700  5 mon.ceph2@2(electing).elector(35)
> election timer expired
> 2014-03-25 16:17:25.353578 7f80d0013700  5 mon.ceph2@2(electing).elector(35)
> start -- can i be leader?
> 2014-03-25 16:17:25.353647 7f80d0013700  1 mon.ceph2@2(electing).elector(35)
> init, last seen epoch 35
> 2014-03-25 16:17:25.353660 7f80d0013700  1 -- 10.10.30.2:6789/0 --> mon.0
> 10.10.30.0:6789/0 -- election(b3f38955-4321-4850-9ddb-3b09940dc951 propose
> 35) v4 -- ?+0 0x19b7240
> 2014-03-25 16:17:25.353695 7f80d0013700  1 -- 10.10.30.2:6789/0 --> mon.1
> 10.10.30.1:6789/0 -- election(b3f38955-4321-4850-9ddb-3b09940dc951 propose
> 35) v4 -- ?+0 0x19b76c0
> 2014-03-25 16:17:30.354040 7f80d0013700  5 mon.ceph2@2(electing).elector(35)
> election timer expired
>
> Oddly, it looks to me like mon.2 (ceph2) never handles/receives the proposal
> from mon.0 (ceph0).  But I admit I have no clue how monitor election works.

Likewise, mon.1 sends an ack to mon.0 that is never received. I think
you've got a busted firewall that's allowing some one-way
communication, but not two-way.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




[Index of Archives]     [Information on CEPH]     [Linux Filesystem Development]     [Ceph Development]     [Ceph Large]     [Linux USB Development]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [xfs]


  Powered by Linux