Re: Trying to rescue a lost quorum

UPDATE: I have determined that it is the mon sync heartbeat timeout that
is being triggered, since increasing it also increases the duration of
the sync attempts. Could those heartbeats be quorum-related? That would
explain why they aren't being sent. Also, is it safe to temporarily
increase this timeout to, say, an hour or two to give the mons enough
time to sync up?
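
For anyone following along, this is the kind of change I have in mind
(just a sketch - I'm assuming "mon sync heartbeat timeout" is the right
option name for cuttlefish and that the syncing mon has to be restarted
to pick it up; there may be other mon sync * timeouts worth raising at
the same time):

    # ceph.conf on the syncing mon; value is in seconds (2 hours here)
    [mon]
        mon sync heartbeat timeout = 7200

    # then restart just that mon, e.g.
    /etc/init.d/ceph restart mon.g

I would of course revert this once the mon has synced and joined quorum.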

Advice is greatly appreciated.


> Hi,
> 
> I had already figured that out in the meantime, thanks though. So back
> to .61.2 it was. I was then trying to see whether debug logging would
> tell me why the mons won't rejoin the cluster. Their logs look like this:
> 
> (Interesting part at the bottom... I think)
> 
> 
> 2014-03-02 14:25:34.960372 7f7c13a6e700 10 mon.g@3(???) e16 bootstrap
> 2014-03-02 14:25:34.961178 7f7c13a6e700 10 mon.g@3(???) e16
> unregister_cluster_logger - not registered
> 2014-03-02 14:25:34.961721 7f7c13a6e700 10 mon.g@3(???) e16
> cancel_probe_timeout (none scheduled)
> 2014-03-02 14:25:34.962271 7f7c13a6e700 10 mon.g@3(???) e16 reset_sync
> 2014-03-02 14:25:34.962477 7f7c13a6e700 10 mon.g@3(probing) e16 reset
> 2014-03-02 14:25:34.962647 7f7c13a6e700 10 mon.g@3(probing) e16
> timecheck_finish
> 2014-03-02 14:25:34.962827 7f7c13a6e700 10 mon.g@3(probing) e16
> cancel_probe_timeout (none scheduled)
> 2014-03-02 14:25:34.963002 7f7c13a6e700 10 mon.g@3(probing) e16
> reset_probe_timeout 0x12dca90 after 2 seconds
> 2014-03-02 14:25:34.963217 7f7c13a6e700 10 mon.g@3(probing) e16 probing
> other monitors
> 2014-03-02 14:25:34.965352 7f7c1326d700 10 mon.g@3(probing) e16
> handle_sync mon_sync( chunk bl 923776 bytes last_key ( paxos,14626815 ) ) v1
> 2014-03-02 14:25:34.965362 7f7c1326d700 10 mon.g@3(probing) e16
> handle_sync_chunk mon_sync( chunk bl 923776 bytes last_key (
> paxos,14626815 ) ) v1
> 2014-03-02 14:25:34.965367 7f7c1326d700  1 mon.g@3(probing) e16
> handle_sync_chunk stray message -- drop it.
> 2014-03-02 14:25:34.965408 7f7c1326d700 10 mon.g@3(probing) e16
> handle_probe mon_probe(reply b54f9ae1-5638-436d-8652-61aa6ede994d name b
> paxos( fc 14616085 lc 15329444 )) v4
> 2014-03-02 14:25:34.965416 7f7c1326d700 10 mon.g@3(probing) e16
> handle_probe_reply mon.1 XXX.YYY.ZZZ.202:6789/0mon_probe(reply
> b54f9ae1-5638-436d-8652-61aa6ede994d name b paxos( fc 14616085 lc
> 15329444 )) v4
> 2014-03-02 14:25:34.965427 7f7c1326d700 10 mon.g@3(probing) e16  monmap
> is e16: 4 mons at
> {a=XXX.YYY.ZZZ.201:6789/0,b=XXX.YYY.ZZZ.202:6789/0,c=XXX.YYY.ZZZ.203:6789/0,g=XXX.YYY.ZZZ.207:6789/0}
> 2014-03-02 14:25:34.965441 7f7c1326d700 10 mon.g@3(probing) e16  peer
> name is b
> 2014-03-02 14:25:34.965445 7f7c1326d700 10 mon.g@3(probing) e16  mon.b
> is outside the quorum
> 2014-03-02 14:25:34.965448 7f7c1326d700 10 mon.g@3(probing) e16  peer
> paxos version 15329444 vs my version 0 (too far ahead)
> 2014-03-02 14:25:34.965451 7f7c1326d700 10 mon.g@3(probing) e16
> cancel_probe_timeout 0x12dca90
> 2014-03-02 14:25:34.965454 7f7c1326d700 10 mon.g@3(probing) e16
> sync_start entity( mon.1 XXX.YYY.ZZZ.202:6789/0 )
> 2014-03-02 14:32:59.971052 7f7c1326d700 10 mon.g@3(synchronizing sync(
> requester state start )) e16 sync_store_init backup current monmap
> 2014-03-02 14:34:28.346794 7f7c13a6e700 10 mon.g@3(synchronizing sync(
> requester state start )).data_health(0) service_tick
> 2014-03-02 14:34:28.347382 7f7c13a6e700  0 mon.g@3(synchronizing sync(
> requester state start )).data_health(0) update_stats avail 79% total
> 52019796 used 8204628 avail 41165984
> 2014-03-02 14:34:28.348176 7f7c1326d700 10 mon.g@3(synchronizing sync(
> requester state start )) e16 handle_sync mon_sync( start_reply ) v1
> 2014-03-02 14:34:28.348183 7f7c1326d700 10 mon.g@3(synchronizing sync(
> requester state start )) e16 handle_sync_start_reply mon_sync(
> start_reply ) v1
> 2014-03-02 14:34:28.348188 7f7c1326d700 10 mon.g@3(synchronizing sync(
> requester state start )) e16 handle_sync_start_reply synchronizing from
> leader at mon.1 XXX.YYY.ZZZ.202:6789/0
> 2014-03-02 14:34:28.348197 7f7c1326d700 10 mon.g@3(synchronizing sync(
> requester state start )) e16 sync_send_heartbeat mon.1
> XXX.YYY.ZZZ.202:6789/0 reply(0)
> 2014-03-02 14:34:28.348220 7f7c1326d700 10 mon.g@3(synchronizing sync(
> requester state start )) e16 sync_start_chunks provider(mon.1
> XXX.YYY.ZZZ.202:6789/0)
> 2014-03-02 14:34:28.348461 7f7c1326d700 10 mon.g@3(synchronizing sync(
> requester state chunks )) e16 handle_probe mon_probe(reply
> b54f9ae1-5638-436d-8652-61aa6ede994d name a paxos( fc 14616085 lc
> 15329444 )) v4
> 2014-03-02 14:34:28.348469 7f7c1326d700 10 mon.g@3(synchronizing sync(
> requester state chunks )) e16 handle_probe_reply mon.0
> XXX.YYY.ZZZ.201:6789/0mon_probe(reply
> b54f9ae1-5638-436d-8652-61aa6ede994d name a paxos( fc 14616085 lc
> 15329444 )) v4
> 2014-03-02 14:34:28.348477 7f7c1326d700 10 mon.g@3(synchronizing sync(
> requester state chunks )) e16  monmap is e16: 4 mons at
> {a=XXX.YYY.ZZZ.201:6789/0,b=XXX.YYY.ZZZ.202:6789/0,c=XXX.YYY.ZZZ.203:6789/0,g=XXX.YYY.ZZZ.207:6789/0}
> 2014-03-02 14:34:28.591392 7f7c1326d700 10 mon.g@3(synchronizing sync(
> requester state chunks )) e16 handle_sync mon_sync( chunk bl 1047926
> bytes last_key ( logm,full_6382802 ) ) v1
> 
> [...] I cut 30 seconds' worth of handle_sync and handle_sync_chunk messages here
> 
> 2014-03-02 14:34:58.571252 7f7c1326d700 10 mon.g@3(synchronizing sync(
> requester state chunks )) e16 handle_sync mon_sync( chunk bl 981393
> bytes last_key ( paxos,14626365 ) ) v1
> 2014-03-02 14:34:58.571262 7f7c1326d700 10 mon.g@3(synchronizing sync(
> requester state chunks )) e16 handle_sync_chunk mon_sync( chunk bl
> 981393 bytes last_key ( paxos,14626365 ) ) v1
> 2014-03-02 14:34:58.803063 7f7c13a6e700 10 mon.g@3(synchronizing sync(
> requester state chunks )) e16 sync_requester_abort mon.1
> XXX.YYY.ZZZ.202:6789/0 mon.1 XXX.YYY.ZZZ.202:6789/0 clearing potentially
> inconsistent store
> 2014-03-02 14:36:38.999310 7f7c13a6e700  1 mon.g@3(synchronizing sync(
> requester state chunks )) e16 sync_requester_abort no longer a sync
> requester
> 
> 
> What's interesting is that there's no error message whatsoever, but the
> timestamps indicate exactly 30 seconds passing. So I guess this sync is
> somehow hitting the 30s mon sync timeout. Should it do that? Should I
> try increasing the sync timeout? Advice on how to proceed is very welcome.
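> 
> For reference, this is how I have been checking what the mon currently
> thinks those timeouts are (a sketch - the admin socket path is just the
> default location, adjust if yours differs):
> 
>     ceph --admin-daemon /var/run/ceph/ceph-mon.g.asok config show | grep mon_sync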
> 
> Thanks,
> Marc
> 
> On 01/03/2014 17:51, Martin B Nielsen wrote:
>> Hi,
>>
>> You can't form quorum with your monitors on cuttlefish if you're mixing <
>> 0.61.5 with any 0.61.5+ ( https://ceph.com/docs/master/release-notes/ ) =>
>> see the section about 0.61.5.
>>
>> I'd advise installing pre-0.61.5, forming quorum and then upgrading to
>> 0.61.9 (if need be) - and then the latest dumpling on top.
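>>
>> Roughly, per monitor host, one at a time (a sketch only - the package
>> version string below is hypothetical and the init commands depend on
>> your distro):
>>
>>     service ceph stop mon.<id>
>>     # pin an older cuttlefish build; exact version string is hypothetical
>>     apt-get install ceph=0.61.4-1precise ceph-common=0.61.4-1precise
>>     service ceph start mon.<id>
>>     ceph quorum_status   # check quorum has re-formed before the next host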
>>
>> Cheers,
>> Martin
>>
>>
>> On Fri, Feb 28, 2014 at 2:09 AM, Marc <mail@xxxxxxxxxx> wrote:
>>
>>> Hi,
>>>
>>> thanks for the reply. I updated one of the new mons, and after a
>>> reasonably long init phase (inconsistent state), I am now seeing these:
>>>
>>> 2014-02-28 01:05:12.344648 7fe9d05cb700  0 cephx: verify_reply coudln't
>>> decrypt with error: error decoding block for decryption
>>> 2014-02-28 01:05:12.345599 7fe9d05cb700  0 -- X.Y.Z.207:6789/0 >>
>>> X.Y.Z.201:6789/0 pipe(0x14e1400 sd=21 :49082 s=1 pgs=5421935 cs=12
>>> l=0).failed verifying authorize reply
>>>
>>> with .207 being the updated mon and .201 being one of the "old" alive
>>> mons. I guess they don't understand each other? I would rather not try
>>> to update the mons running on servers that also host OSDs, especially
>>> since there seem to be communication issues between those versions... or
>>> am I reading this wrong?
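>>>
>>> For what it's worth, this is how I've been comparing the installed
>>> versions on each node (purely local checks, nothing that needs the
>>> cluster to respond):
>>>
>>>     ceph --version
>>>     ceph-mon --version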
>>>
>>> KR,
>>> Marc
>>>
>>> On 28/02/2014 01:32, Gregory Farnum wrote:
>>>> On Thu, Feb 27, 2014 at 4:25 PM, Marc <mail@xxxxxxxxxx> wrote:
>>>>> Hi,
>>>>>
>>>>> I was handed a Ceph cluster that had just lost quorum due to 2/3 mons
>>>>> (b,c) running out of disk space (using up 15GB each). We were trying to
>>>>> rescue this cluster without service downtime. As such we freed up some
>>>>> space to keep mon b running a while longer, which succeeded; quorum was
>>>>> restored (a,b), but mon c remained offline. Even though we have freed up
>>>>> some space on mon c's disk also, that mon just won't start. Its log
>>>>> file does say
>>>>>
>>>>> ceph version 0.61.2 (fea782543a844bb277ae94d3391788b76c5bee60), process
>>>>> ceph-mon, pid 27846
>>>>>
>>>>> and that's all she wrote - even when starting ceph-mon with -d, mind you.
>>>>>
>>>>> So we had a cluster with 2/3 mons up and wanted to add another mon since
>>>>> it was only a matter of time until mon b failed again due to disk space.
>>>>>
>>>>> As such I added mon.g to the cluster, which took a long while to sync,
>>>>> but now reports running.
>>>>>
>>>>> Then mon.h got added for the same reason. mon.h fails to start much the
>>>>> same as mon.c does.
>>>>>
>>>>> Still, that should leave us with 3/5 mons up. However, running "ceph
>>>>> daemon mon.{g,h} mon_status" on the respective node also blocks. The
>>>>> only output we get from those is fault messages.
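>>>>>
>>>>> For clarity, that is the same as hitting the mon's admin socket directly,
>>>>> something like this (socket path assumed to be the default location):
>>>>>
>>>>>     ceph --admin-daemon /var/run/ceph/ceph-mon.g.asok mon_status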
>>>>>
>>>>> OK, so now mon.g apparently crashed:
>>>>>
>>>>> 2014-02-28 00:11:48.861263 7f4728042700 -1 mon/Monitor.cc: In function
>>>>> 'void Monitor::sync_timeout(entity_inst_t&)' thread 7f4728042700 time
>>>>> 2014-02-28 00:11:48.782305
>>>>> mon/Monitor.cc: 1099: FAILED assert(sync_state == SYNC_STATE_CHUNKS)
>>>>>
>>>>> ... and now blocks trying to start much like c and h.
>>>>>
>>>>> Long story short: is it possible to add .61.9 mons to a cluster running
>>>>> .61.2 on the 2 alive mons and all the OSDs? I'm guessing this is the
>>>>> last shot at trying to rescue the cluster without downtime.
>>>> That should be fine, and is likely (though not guaranteed) to resolve
>>>> your sync issues -- although it's pretty unfortunate that you're that
>>>> far behind on the point releases; they fixed a whole lot of sync
>>>> issues and related things and you might need to upgrade the existing
>>>> monitors too in order to get the fixes you need... :/
>>>> -Greg
>>>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>>>
>>
> 

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



