Re: Trying to rescue a lost quorum

Hi,

I had already figured that out in the meantime, thanks though. So back to
.61.2 it was. I then tried to see whether debug logging would tell me why
the mons won't rejoin the cluster.
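
For reference, this is roughly what I turned up in ceph.conf on the mon
nodes before restarting them (from memory, so treat it as a sketch rather
than the exact lines I used):

[mon]
    # best guess at the levels that produced the output below
    debug mon = 10
    debug paxos = 10
    debug ms = 1

With that in place, their logs look like this: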

(Interesting part at the bottom... I think)


2014-03-02 14:25:34.960372 7f7c13a6e700 10 mon.g@3(???) e16 bootstrap
2014-03-02 14:25:34.961178 7f7c13a6e700 10 mon.g@3(???) e16
unregister_cluster_logger - not registered
2014-03-02 14:25:34.961721 7f7c13a6e700 10 mon.g@3(???) e16
cancel_probe_timeout (none scheduled)
2014-03-02 14:25:34.962271 7f7c13a6e700 10 mon.g@3(???) e16 reset_sync
2014-03-02 14:25:34.962477 7f7c13a6e700 10 mon.g@3(probing) e16 reset
2014-03-02 14:25:34.962647 7f7c13a6e700 10 mon.g@3(probing) e16
timecheck_finish
2014-03-02 14:25:34.962827 7f7c13a6e700 10 mon.g@3(probing) e16
cancel_probe_timeout (none scheduled)
2014-03-02 14:25:34.963002 7f7c13a6e700 10 mon.g@3(probing) e16
reset_probe_timeout 0x12dca90 after 2 seconds
2014-03-02 14:25:34.963217 7f7c13a6e700 10 mon.g@3(probing) e16 probing
other monitors
2014-03-02 14:25:34.965352 7f7c1326d700 10 mon.g@3(probing) e16
handle_sync mon_sync( chunk bl 923776 bytes last_key ( paxos,14626815 ) ) v1
2014-03-02 14:25:34.965362 7f7c1326d700 10 mon.g@3(probing) e16
handle_sync_chunk mon_sync( chunk bl 923776 bytes last_key (
paxos,14626815 ) ) v1
2014-03-02 14:25:34.965367 7f7c1326d700  1 mon.g@3(probing) e16
handle_sync_chunk stray message -- drop it.
2014-03-02 14:25:34.965408 7f7c1326d700 10 mon.g@3(probing) e16
handle_probe mon_probe(reply b54f9ae1-5638-436d-8652-61aa6ede994d name b
paxos( fc 14616085 lc 15329444 )) v4
2014-03-02 14:25:34.965416 7f7c1326d700 10 mon.g@3(probing) e16
handle_probe_reply mon.1 XXX.YYY.ZZZ.202:6789/0mon_probe(reply
b54f9ae1-5638-436d-8652-61aa6ede994d name b paxos( fc 14616085 lc
15329444 )) v4
2014-03-02 14:25:34.965427 7f7c1326d700 10 mon.g@3(probing) e16  monmap
is e16: 4 mons at
{a=XXX.YYY.ZZZ.201:6789/0,b=XXX.YYY.ZZZ.202:6789/0,c=XXX.YYY.ZZZ.203:6789/0,g=XXX.YYY.ZZZ.207:6789/0}
2014-03-02 14:25:34.965441 7f7c1326d700 10 mon.g@3(probing) e16  peer
name is b
2014-03-02 14:25:34.965445 7f7c1326d700 10 mon.g@3(probing) e16  mon.b
is outside the quorum
2014-03-02 14:25:34.965448 7f7c1326d700 10 mon.g@3(probing) e16  peer
paxos version 15329444 vs my version 0 (too far ahead)
2014-03-02 14:25:34.965451 7f7c1326d700 10 mon.g@3(probing) e16
cancel_probe_timeout 0x12dca90
2014-03-02 14:25:34.965454 7f7c1326d700 10 mon.g@3(probing) e16
sync_start entity( mon.1 XXX.YYY.ZZZ.202:6789/0 )
2014-03-02 14:32:59.971052 7f7c1326d700 10 mon.g@3(synchronizing sync(
requester state start )) e16 sync_store_init backup current monmap
2014-03-02 14:34:28.346794 7f7c13a6e700 10 mon.g@3(synchronizing sync(
requester state start )).data_health(0) service_tick
2014-03-02 14:34:28.347382 7f7c13a6e700  0 mon.g@3(synchronizing sync(
requester state start )).data_health(0) update_stats avail 79% total
52019796 used 8204628 avail 41165984
2014-03-02 14:34:28.348176 7f7c1326d700 10 mon.g@3(synchronizing sync(
requester state start )) e16 handle_sync mon_sync( start_reply ) v1
2014-03-02 14:34:28.348183 7f7c1326d700 10 mon.g@3(synchronizing sync(
requester state start )) e16 handle_sync_start_reply mon_sync(
start_reply ) v1
2014-03-02 14:34:28.348188 7f7c1326d700 10 mon.g@3(synchronizing sync(
requester state start )) e16 handle_sync_start_reply synchronizing from
leader at mon.1 XXX.YYY.ZZZ.202:6789/0
2014-03-02 14:34:28.348197 7f7c1326d700 10 mon.g@3(synchronizing sync(
requester state start )) e16 sync_send_heartbeat mon.1
XXX.YYY.ZZZ.202:6789/0 reply(0)
2014-03-02 14:34:28.348220 7f7c1326d700 10 mon.g@3(synchronizing sync(
requester state start )) e16 sync_start_chunks provider(mon.1
XXX.YYY.ZZZ.202:6789/0)
2014-03-02 14:34:28.348461 7f7c1326d700 10 mon.g@3(synchronizing sync(
requester state chunks )) e16 handle_probe mon_probe(reply
b54f9ae1-5638-436d-8652-61aa6ede994d name a paxos( fc 14616085 lc
15329444 )) v4
2014-03-02 14:34:28.348469 7f7c1326d700 10 mon.g@3(synchronizing sync(
requester state chunks )) e16 handle_probe_reply mon.0
XXX.YYY.ZZZ.201:6789/0mon_probe(reply
b54f9ae1-5638-436d-8652-61aa6ede994d name a paxos( fc 14616085 lc
15329444 )) v4
2014-03-02 14:34:28.348477 7f7c1326d700 10 mon.g@3(synchronizing sync(
requester state chunks )) e16  monmap is e16: 4 mons at
{a=XXX.YYY.ZZZ.201:6789/0,b=XXX.YYY.ZZZ.202:6789/0,c=XXX.YYY.ZZZ.203:6789/0,g=XXX.YYY.ZZZ.207:6789/0}
2014-03-02 14:34:28.591392 7f7c1326d700 10 mon.g@3(synchronizing sync(
requester state chunks )) e16 handle_sync mon_sync( chunk bl 1047926
bytes last_key ( logm,full_6382802 ) ) v1

[...] I cut 30 seconds of handle_sync and handle_sync_chunk here

2014-03-02 14:34:58.571252 7f7c1326d700 10 mon.g@3(synchronizing sync(
requester state chunks )) e16 handle_sync mon_sync( chunk bl 981393
bytes last_key ( paxos,14626365 ) ) v1
2014-03-02 14:34:58.571262 7f7c1326d700 10 mon.g@3(synchronizing sync(
requester state chunks )) e16 handle_sync_chunk mon_sync( chunk bl
981393 bytes last_key ( paxos,14626365 ) ) v1
2014-03-02 14:34:58.803063 7f7c13a6e700 10 mon.g@3(synchronizing sync(
requester state chunks )) e16 sync_requester_abort mon.1
XXX.YYY.ZZZ.202:6789/0 mon.1 XXX.YYY.ZZZ.202:6789/0 clearing potentially
inconsistent store
2014-03-02 14:36:38.999310 7f7c13a6e700  1 mon.g@3(synchronizing sync(
requester state chunks )) e16 sync_requester_abort no longer a sync
requester


What's interesting is that there's no error message whatsoever, but the
timestamps indicate exactly 30 seconds passing. So I guess this sync is
somehow triggering the 30s mon sync timeout. Should it do that? Should I
try increasing the sync timeout? Advice on how to proceed is very welcome.
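
If it really is the sync timeout firing, I suppose I could try raising it
on the syncing mon with something like this in ceph.conf plus a restart
(assuming the knob is mon_sync_timeout and that it is what sits behind the
30s abort - I haven't verified either of those on .61.x):

[mon]
    # assumption: mon_sync_timeout drives the 30s abort; default may differ on .61.x
    mon sync timeout = 300

But I'd rather not just paper over a symptom if the sync itself is broken,
hence the question.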

Thanks,
Marc

On 01/03/2014 17:51, Martin B Nielsen wrote:
> Hi,
> 
> You can't form quorum with your monitors on cuttlefish if you're mixing
> < 0.61.5 with any 0.61.5+ ( https://ceph.com/docs/master/release-notes/ )
> => section about 0.61.5.
> 
> I'd advise installing pre-0.61.5, forming quorum and then upgrading to
> 0.61.9 (if need be) - and then the latest dumpling on top.
> 
> Cheers,
> Martin
> 
> 
> On Fri, Feb 28, 2014 at 2:09 AM, Marc <mail@xxxxxxxxxx> wrote:
> 
>> Hi,
>>
>> thanks for the reply. I updated one of the new mons. And after a
>> reasonably long init phase (inconsistent state), I am now seeing these:
>>
>> 2014-02-28 01:05:12.344648 7fe9d05cb700  0 cephx: verify_reply coudln't
>> decrypt with error: error decoding block for decryption
>> 2014-02-28 01:05:12.345599 7fe9d05cb700  0 -- X.Y.Z.207:6789/0 >>
>> X.Y.Z.201:6789/0 pipe(0x14e1400 sd=21 :49082 s=1 pgs=5421935 cs=12
>> l=0).failed verifying authorize reply
>>
>> with .207 being the updated mon and .201 being one of the "old" alive
>> mons. I guess they don't understand each other? I would rather not try
>> to update the mons running on servers that also host OSDs, especially
>> since there seem to be communication issues between those versions... or
>> am I reading this wrong?
>>
>> KR,
>> Marc
>>
>> On 28/02/2014 01:32, Gregory Farnum wrote:
>>> On Thu, Feb 27, 2014 at 4:25 PM, Marc <mail@xxxxxxxxxx> wrote:
>>>> Hi,
>>>>
>>>> I was handed a Ceph cluster that had just lost quorum due to 2/3 mons
>>>> (b,c) running out of disk space (using up 15GB each). We were trying to
>>>> rescue this cluster without service downtime. As such we freed up some
>>>> space to keep mon b running a while longer, which succeeded and quorum
>>>> was restored (a,b); mon c remained offline. Even though we have freed up
>>>> some space on mon c's disk as well, that mon just won't start. Its log
>>>> file does say
>>>>
>>>> ceph version 0.61.2 (fea782543a844bb277ae94d3391788b76c5bee60), process
>>>> ceph-mon, pid 27846
>>>>
>>>> and that's all she wrote, even when starting ceph-mon with -d, mind you.
>>>>
>>>> So we had a cluster with 2/3 mons up and wanted to add another mon since
>>>> it was only a matter of time until mon b failed again due to disk space.
>>>>
>>>> As such I added mon.g to the cluster, which took a long while to sync,
>>>> but now reports running.
>>>>
>>>> Then mon.h got added for the same reason. mon.h fails to start much the
>>>> same as mon.c does.
>>>>
>>>> Still that should leave us with 3/5 mons up. However running "ceph
>>>> daemon mon.{g,h} mon_status" on the respective nodes also blocks. The
>>>> only output we get from those is fault messages.
>>>>
>>>> OK, so now mon.g apparently crashed:
>>>>
>>>> 2014-02-28 00:11:48.861263 7f4728042700 -1 mon/Monitor.cc: In function
>>>> 'void Monitor::sync_timeout(entity_inst_t&)' thread 7f4728042700 time
>>>> 2014-02-28 00:11:48.782305 mon/Monitor.cc: 1099: FAILED
>>>> assert(sync_state == SYNC_STATE_CHUNKS)
>>>>
>>>> ... and now blocks trying to start much like c and h.
>>>>
>>>> Long story short: is it possible to add .61.9 mons to a cluster running
>>>> .61.2 on the 2 alive mons and all the osds? I'm guessing this is the
>>>> last shot at trying to rescue the cluster without downtime.
>>> That should be fine, and is likely (though not guaranteed) to resolve
>>> your sync issues -- although it's pretty unfortunate that you're that
>>> far behind on the point releases; they fixed a whole lot of sync
>>> issues and related things and you might need to upgrade the existing
>>> monitors too in order to get the fixes you need... :/
>>> -Greg
>>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>>
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users@xxxxxxxxxxxxxx
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
> 

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



