Hi all developers and users,

When I added a new mon to the current mon cluster, it failed with 2 mons out of quorum. There are 5 mons in our ceph cluster:

  epoch 7
  fsid 0dfd2bd5-1896-4712-916b-ec02dcc7b049
  last_changed 2015-02-13 09:11:45.758839
  created 0.000000
  0: 10.117.16.17:6789/0 mon.b
  1: 10.118.32.7:6789/0 mon.c
  2: 10.119.16.11:6789/0 mon.d
  3: 10.122.0.9:6789/0 mon.e
  4: 10.122.48.11:6789/0 mon.f

mon.f is newly added to the monitor cluster, but when mon.f was started, it caused both mon.e and mon.f to drop out of quorum:

  HEALTH_WARN 2 mons down, quorum 0,1,2 b,c,d
  mon.e (rank 3) addr 10.122.0.9:6789/0 is down (out of quorum)
  mon.f (rank 4) addr 10.122.48.11:6789/0 is down (out of quorum)
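For reference, the usual manual sequence for adding a monitor such as mon.f looks roughly like the commands below. This is only a sketch of the standard procedure (the /tmp paths are placeholders), not necessarily the exact commands that were run here:

  # grab the current monmap and the mon. keyring (temporary paths are placeholders)
  ceph mon getmap -o /tmp/monmap
  ceph auth get mon. -o /tmp/mon.keyring

  # initialise the new monitor's data directory
  # (the mon.f log below shows mon_data /osd/ceph/mon, as configured in ceph.conf)
  ceph-mon -i f --mkfs --monmap /tmp/monmap --keyring /tmp/mon.keyring

  # register the new monitor in the monmap and start it
  ceph mon add f 10.122.48.11:6789
  ceph-mon -i f --public-addr 10.122.48.11:6789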
mon.b, mon.c and mon.d are logging lines like the following at a very high rate:

  Feb 13 09:37:34 root ceph-mon: 2015-02-13 09:37:34.063628 7f7b64e14700 1 mon.b@0(leader).paxos(paxos active c 11818589..11819234) is_readable now=2015-02-13 09:37:34.063629 lease_expire=2015-02-13 09:37:38.205219 has v0 lc 11819234
  Feb 13 09:37:34 root ceph-mon: 2015-02-13 09:37:34.090647 7f7b64e14700 1 mon.b@0(leader).paxos(paxos active c 11818589..11819234) is_readable now=2015-02-13 09:37:34.090648 lease_expire=2015-02-13 09:37:38.205219 has v0 lc 11819234
  Feb 13 09:37:34 root ceph-mon: 2015-02-13 09:37:34.090661 7f7b64e14700 1 mon.b@0(leader).paxos(paxos active c 11818589..11819234) is_readable now=2015-02-13 09:37:34.090662 lease_expire=2015-02-13 09:37:38.205219 has v0 lc 11819234
  ......

And the mon.f log shows:

  Feb 13 09:16:26 root ceph-mon: 2015-02-13 09:16:26.526676 7f3931dfd7c0 0 ceph version 0.80.4 (7c241cfaa6c8c068bc9da8578ca00b9f4fc7567f), process ceph-mon, pid 30639
  Feb 13 09:16:26 root ceph-mon: 2015-02-13 09:16:26.607412 7f3931dfd7c0 0 mon.f does not exist in monmap, will attempt to join an existing cluster
  Feb 13 09:16:26 root ceph-mon: 2015-02-13 09:16:26.609838 7f3931dfd7c0 0 starting mon.f rank -1 at 10.122.48.11:6789/0 mon_data /osd/ceph/mon fsid 0dfd2bd5-1896-4712-916b-ec02dcc7b049
  Feb 13 09:16:26 root ceph-mon: 2015-02-13 09:16:26.610076 7f3931dfd7c0 1 mon.f@-1(probing) e0 preinit fsid 0dfd2bd5-1896-4712-916b-ec02dcc7b049
  Feb 13 09:16:26 root ceph-mon: 2015-02-13 09:16:26.636499 7f392a504700 0 -- 10.122.48.11:6789/0 >> 10.119.16.11:6789/0 pipe(0x7f3934ebfb80 sd=26 :6789 s=0 pgs=0 cs=0 l=0 c=0x7f3934ea9ce0).accept connect_seq 0 vs existing 0 state wait
  Feb 13 09:16:26 root ceph-mon: 2015-02-13 09:16:26.636797 7f392a201700 0 -- 10.122.48.11:6789/0 >> 10.122.0.9:6789/0 pipe(0x7f3934ec0800 sd=29 :6789 s=0 pgs=0 cs=0 l=0 c=0x7f3934eaa940).accept connect_seq 0 vs existing 0 state wait
  Feb 13 09:16:26 root ceph-mon: 2015-02-13 09:16:26.636968 7f392a403700 0 -- 10.122.48.11:6789/0 >> 10.118.32.7:6789/0 pipe(0x7f3934ec0080 sd=27 :6789 s=0 pgs=0 cs=0 l=0 c=0x7f3934ea9e40).accept connect_seq 0 vs existing 0 state wait
  Feb 13 09:16:26 root ceph-mon: 2015-02-13 09:16:26.637037 7f392a302700 0 -- 10.122.48.11:6789/0 >> 10.117.16.17:6789/0 pipe(0x7f3934ebfe00 sd=28 :6789 s=0 pgs=0 cs=0 l=0 c=0x7f3934eaa260).accept connect_seq 0 vs existing 0 state wait
  Feb 13 09:16:26 root ceph-mon: 2015-02-13 09:16:26.638854 7f392c00a700 0 mon.f@-1(probing) e7 my rank is now 4 (was -1)
  Feb 13 09:16:26 root ceph-mon: 2015-02-13 09:16:26.639365 7f392c00a700 1 mon.f@4(synchronizing) e7 sync_obtain_latest_monmap
  Feb 13 09:16:26 root ceph-mon: 2015-02-13 09:16:26.639494 7f392b008700 0 -- 10.122.48.11:6789/0 >> 10.122.0.9:6789/0 pipe(0x7f3934ec0580 sd=17 :6789 s=0 pgs=0 cs=0 l=0 c=0x7f3934eaa680).accept connect_seq 2 vs existing 0 state connecting
  Feb 13 09:16:26 root ceph-mon: 2015-02-13 09:16:26.639513 7f392b008700 0 -- 10.122.48.11:6789/0 >> 10.122.0.9:6789/0 pipe(0x7f3934ec0580 sd=17 :6789 s=0 pgs=0 cs=0 l=0 c=0x7f3934eaa680).accept we reset (peer sent cseq 2, 0x7f3934ebf400.cseq = 0), sending RESETSESSION
  ......
  Feb 13 09:16:26 root ceph-mon: 2015-02-13 09:16:26.643159 7f392af07700 0 -- 10.122.48.11:6789/0 >> 10.119.16.11:6789/0 pipe(0x7f3934ec1700 sd=28 :6789 s=0 pgs=0 cs=0 l=0 c=0x7f3934eab2e0).accept connect_seq 0 vs existing 0 state wait
  Feb 13 09:16:26 root ceph-mon: 2015-02-13 09:16:26.643273 7f392c00a700 1 mon.f@4(synchronizing) e7 sync_obtain_latest_monmap obtained monmap e7
  Feb 13 09:17:26 root ceph-mon: 2015-02-13 09:17:26.611550 7f392c80b700 0 mon.f@4(synchronizing).data_health(0) update_stats avail 99% total 911815680 used 33132 avail 911782548
  Feb 13 09:17:26 root ceph-mon: 2015-02-13 09:17:26.708961 7f392c00a700 1 mon.f@4(synchronizing) e7 sync_obtain_latest_monmap
  Feb 13 09:17:26 root ceph-mon: 2015-02-13 09:17:26.709063 7f392c00a700 1 mon.f@4(synchronizing) e7 sync_obtain_latest_monmap obtained monmap e7

Can someone help? Thank you!

minchen
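P.S. In case it is useful for diagnosing this: each monitor's own view of its state and of the quorum can be queried with the commands below (this assumes the default admin socket path under /var/run/ceph; adjust it if your setup differs):

  # on the host running mon.f (which stays in "synchronizing" per the log above)
  ceph --admin-daemon /var/run/ceph/ceph-mon.f.asok mon_status

  # from any node with a client keyring, as seen by the monitors still in quorum
  ceph quorum_status --format json-pretty
  ceph health detail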