Hello all,

I have a Ceph cluster composed of 4 nodes in 2 different rooms:

  room A: osd.1, osd.3, mon.a, mon.c
  room B: osd.2, osd.4, mon.b

My CRUSH rule is written to place replicas across rooms (a sketch of what such a rule looks like is at the end of this message), so normally, if I shut down the whole of room A, my cluster should stay usable.

... but in fact, no. When I switch off room A, mon.b does not succeed in managing the cluster. Here is the log of mon.b:

2013-04-05 11:46:11.842267 7f42e61fc700 0 mon.b@1(peon) e1 handle_command mon_command(status v 0) v1
2013-04-05 11:46:12.746317 7f42e61fc700 0 mon.b@1(peon) e1 handle_command mon_command(status v 0) v1
2013-04-05 11:46:17.684378 7f42e46f3700 0 -- 10.0.3.2:6789/0 >> 10.0.3.1:6789/0 pipe(0x7f42d4002c80 sd=26 :6789 s=2 pgs=47 cs=1 l=0).fault, initiating reconnect
2013-04-05 11:46:17.685624 7f42f0e93700 0 -- 10.0.3.2:6789/0 >> 10.0.3.1:6789/0 pipe(0x7f42d4002c80 sd=19 :35755 s=1 pgs=47 cs=2 l=0).fault
2013-04-05 11:46:17.721214 7f4266eee700 0 -- 10.0.3.2:6789/0 >> 10.0.3.3:6789/0 pipe(0x2b4c480 sd=17 :58791 s=2 pgs=26 cs=1 l=0).fault with nothing to send, going to standby
2013-04-05 11:46:18.453162 7f42e61fc700 0 mon.b@1(peon) e1 handle_command mon_command(status v 0) v1
2013-04-05 11:46:25.638744 7f42ec80d700 0 -- 10.0.3.2:6789/0 >> 10.0.3.3:6789/0 pipe(0x2b4c480 sd=17 :58791 s=1 pgs=26 cs=2 l=0).fault

What I understand from this is that mon.b does know that mon.a and mon.c are down, but it cannot form a quorum. Why?

Thanks for your answers.
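For context, a CRUSH rule that spreads replicas across rooms is typically written along these lines. This is only a sketch: the rule name, ruleset number, and size bounds here are placeholders, not a copy of my deployed rule.

  rule replicated_rooms {
          ruleset 1
          type replicated
          min_size 1
          max_size 10
          # start from the root of the CRUSH hierarchy
          step take default
          # pick one leaf (OSD) under a distinct room bucket for each
          # replica, so a pool of size 2 gets one copy per room
          step chooseleaf firstn 0 type room
          step emit
  }

With a pool of size 2 and a rule like this, losing a whole room should still leave one complete copy of the data, which is why I expected the surviving room to keep working.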
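In case it helps with the diagnosis: while room A is down, mon.b can still be queried through its local admin socket, which answers even when the monitor has no quorum (this assumes the default socket path under /var/run/ceph):

  ceph --admin-daemon /var/run/ceph/ceph-mon.b.asok mon_status

Run on the host carrying mon.b, this prints the monitor's own view of the situation: its current state (probing, electing, peon, leader, ...), the monmap it knows, and which monitors it currently counts in quorum.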