Thanks Sage.

-bash-4.1$ sudo ceph --admin-daemon /var/run/ceph/ceph-mon.osd151.asok mon_status
{ "name": "osd151",
  "rank": 2,
  "state": "electing",
  "election_epoch": 85469,
  "quorum": [],
  "outside_quorum": [],
  "extra_probe_peers": [],
  "sync_provider": [],
  "monmap": { "epoch": 1,
      "fsid": "b9cb3ea9-e1de-48b4-9e86-6921e2c537d2",
      "modified": "0.000000",
      "created": "0.000000",
      "mons": [
            { "rank": 0,
              "name": "osd152",
              "addr": "10.193.207.130:6789\/0"},
            { "rank": 1,
              "name": "osd153",
              "addr": "10.193.207.131:6789\/0"},
            { "rank": 2,
              "name": "osd151",
              "addr": "10.194.0.68:6789\/0"}]}}

And:

-bash-4.1$ sudo ceph --admin-daemon /var/run/ceph/ceph-mon.osd151.asok quorum_status
{ "election_epoch": 85480,
  "quorum": [ 0, 1, 2],
  "quorum_names": [ "osd151", "osd152", "osd153"],
  "quorum_leader_name": "osd152",
  "monmap": { "epoch": 1,
      "fsid": "b9cb3ea9-e1de-48b4-9e86-6921e2c537d2",
      "modified": "0.000000",
      "created": "0.000000",
      "mons": [
            { "rank": 0,
              "name": "osd152",
              "addr": "10.193.207.130:6789\/0"},
            { "rank": 1,
              "name": "osd153",
              "addr": "10.193.207.131:6789\/0"},
            { "rank": 2,
              "name": "osd151",
              "addr": "10.194.0.68:6789\/0"}]}}

So by the time of the quorum_status call the election had finished and a leader had been selected.

Thanks,
Guang

On Jan 14, 2014, at 10:55 PM, Sage Weil <sage@xxxxxxxxxxx> wrote:

> On Tue, 14 Jan 2014, GuangYang wrote:
>> Hi ceph-users and ceph-devel,
>> I ran into an issue after restarting the monitors of the cluster: authentication fails, which prevents running any ceph command.
>>
>> After some maintenance work I restarted an OSD, but it would not rejoin the cluster automatically, even though a TCP dump showed it had already sent a message to the monitor asking to be added back into the cluster.
>>
>> I therefore suspected a problem with the monitors and restarted them one by one (3 in total). However, after restarting the monitors, every ceph command fails with an authentication timeout:
>>
>> 2014-01-14 12:00:30.499397 7fc7f195e700 0 monclient(hunting): authenticate timed out after 300
>> 2014-01-14 12:00:30.499440 7fc7f195e700 0 librados: client.admin authentication error (110) Connection timed out
>> Error connecting to cluster: Error
>>
>> Any idea why this error happens (restarting an OSD results in the same error)?
>>
>> I am thinking the authentication information is persisted on the mon's local disk. Is there a chance that data got corrupted?
>
> That sounds unlikely, but you're right that the core problem is with the
> mons. What does
>
>     ceph daemon mon.`hostname` mon_status
>
> say? Perhaps they are not forming a quorum and that is what is preventing
> authentication.
>
> sage
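
For anyone hitting the same symptom, the per-monitor check Sage suggests can be wrapped in a small script. The sketch below is only an illustration: it assumes each mon is named after its host (as with osd151/osd152/osd153 above) and that the admin socket sits at the default /var/run/ceph/ceph-mon.<name>.asok path, and it uses nothing beyond the mon_status and quorum_status admin socket commands already shown in this thread.

#!/bin/bash
# Run on each monitor host in turn. Assumes the mon name matches `hostname`
# and the admin socket is at the default path; adjust MON/SOCK if not.
MON=$(hostname)
SOCK=/var/run/ceph/ceph-mon.$MON.asok

echo "== mon.$MON =="

# "state" should settle to "leader" or "peon"; values such as "electing" or
# "probing" mean the monitors are still trying to form a quorum.
sudo ceph --admin-daemon "$SOCK" mon_status |
    python -c 'import json,sys; s = json.load(sys.stdin); sys.stdout.write("state=%s quorum=%s\n" % (s["state"], s["quorum"]))'

# Once a quorum exists, any member can report who is in it and who leads it.
sudo ceph --admin-daemon "$SOCK" quorum_status |
    python -c 'import json,sys; q = json.load(sys.stdin); sys.stdout.write("quorum_names=%s leader=%s\n" % (q["quorum_names"], q["quorum_leader_name"]))'

If every monitor reports leader or peon and quorum_status lists all three names, the monitors have formed a quorum and an authentication timeout like the one above most likely has some other cause.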