Re: ceph mon unable to reach quorum

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





lee_yiu_chung@xxxxxxxxx 於 18/1/2017 11:17 寫道:
Dear all,

I have a ceph installation (dev site) with two nodes, each running a mon daemon and osd daemon.
(Yes, I know running a cluster of two mon is bad, but I have no choice since I only have two nodes.)

Now, the two nodes are migrated to another datacenter, but after it is booted up the mon daemon are
unable to reach quorum. How can I proceed? (If there is no way to recover, I can accept the loss but
I wish to know how to avoid this to happen again.)

Here is the mon_status output of the two nodes:

-----------------------

root@openstack003:/var/log/ceph# ceph daemon mon.openstack003 mon_status
{
    "name": "openstack003",
    "rank": 0,
    "state": "electing",
    "election_epoch": 45,
    "quorum": [],
    "outside_quorum": [],
    "extra_probe_peers": [],
    "sync_provider": [
        754974721,
        "mon.1 10.41.41.4:6789\/0",
        "2017-01-18 02:53:49.425917",
        5654786,
        ","
    ],
    "monmap": {
        "epoch": 10,
        "fsid": "71861477-db77-4fab-a8f8-10d3b16e1722",
        "modified": "2016-10-19 06:41:29.202924",
        "created": "2016-10-19 06:26:24.911408",
        "mons": [
            {
                "rank": 0,
                "name": "openstack003",
                "addr": "10.41.41.3:6789\/0"
            },
            {
                "rank": 1,
                "name": "openstack004",
                "addr": "10.41.41.4:6789\/0"
            }
        ]
    }
}


root@openstack004:/var/log/ceph# ceph daemon mon.openstack004 mon_status
{
    "name": "openstack004",
    "rank": 1,
    "state": "probing",
    "election_epoch": 0,
    "quorum": [],
    "outside_quorum": [
        "openstack004"
    ],
    "extra_probe_peers": [],
    "sync_provider": [],
    "monmap": {
        "epoch": 10,
        "fsid": "71861477-db77-4fab-a8f8-10d3b16e1722",
        "modified": "2016-10-19 06:41:29.202924",
        "created": "2016-10-19 06:26:24.911408",
        "mons": [
            {
                "rank": 0,
                "name": "openstack003",
                "addr": "10.41.41.3:6789\/0"
            },
            {
                "rank": 1,
                "name": "openstack004",
                "addr": "10.41.41.4:6789\/0"
            }
        ]
    }
}


----------------------------

Here are the logs of the two mons:

2017-01-18 03:15:04.675296 7fc892173700  5 mon.openstack003@0(electing).elector(45) start -- can i
be leader?
2017-01-18 03:15:04.675355 7fc892173700  1 mon.openstack003@0(electing).elector(45) init, last seen
epoch 45
2017-01-18 03:15:04.675932 7fc892173700  1 -- 10.41.41.3:6789/0 --> mon.1 10.41.41.4:6789/0 --
election(71861477-db77-4fab-a8f8-10d3b16e1722 propose 45) v5 -- ?+0 0x55b488984700
2017-01-18 03:15:05.515390 7fc891972700  1 -- 10.41.41.3:6789/0 <== mon.1 10.41.41.4:6789/0 675 ====
mon_probe(probe 71861477-db77-4fab-a8f8-10d3b16e1722 name openstack004) v6 ==== 69+0+0 (72044430 0
0) 0x55b4889bf600 con 0x55b487666900
2017-01-18 03:15:05.515458 7fc891972700 10 mon.openstack003@0(electing) e10 handle_probe
mon_probe(probe 71861477-db77-4fab-a8f8-10d3b16e1722 name openstack004) v6
2017-01-18 03:15:05.515463 7fc891972700 10 mon.openstack003@0(electing) e10 handle_probe_probe mon.1
10.41.41.4:6789/0mon_probe(probe 71861477-db77-4fab-a8f8-10d3b16e1722 name openstack004) v6 features
576460752032874495
2017-01-18 03:15:05.515500 7fc891972700  1 -- 10.41.41.3:6789/0 --> 10.41.41.4:6789/0 --
mon_probe(reply 71861477-db77-4fab-a8f8-10d3b16e1722 name openstack003 paxos( fc 5654529 lc 5655078
)) v6 -- ?+0 0x55b4889bf340 con 0x55b487666900
2017-01-18 03:15:07.515552 7fc891972700  1 -- 10.41.41.3:6789/0 <== mon.1 10.41.41.4:6789/0 676 ====
mon_probe(probe 71861477-db77-4fab-a8f8-10d3b16e1722 name openstack004) v6 ==== 69+0+0 (72044430 0
0) 0x55b4889bf8c0 con 0x55b487666900
2017-01-18 03:15:07.515620 7fc891972700 10 mon.openstack003@0(electing) e10 handle_probe
mon_probe(probe 71861477-db77-4fab-a8f8-10d3b16e1722 name openstack004) v6
2017-01-18 03:15:07.515625 7fc891972700 10 mon.openstack003@0(electing) e10 handle_probe_probe mon.1
10.41.41.4:6789/0mon_probe(probe 71861477-db77-4fab-a8f8-10d3b16e1722 name openstack004) v6 features
576460752032874495
2017-01-18 03:15:07.515652 7fc891972700  1 -- 10.41.41.3:6789/0 --> 10.41.41.4:6789/0 --
mon_probe(reply 71861477-db77-4fab-a8f8-10d3b16e1722 name openstack003 paxos( fc 5654529 lc 5655078
)) v6 -- ?+0 0x55b4889bf600 con 0x55b487666900
2017-01-18 03:15:09.515709 7fc891972700  1 -- 10.41.41.3:6789/0 <== mon.1 10.41.41.4:6789/0 677 ====
mon_probe(probe 71861477-db77-4fab-a8f8-10d3b16e1722 name openstack004) v6 ==== 69+0+0 (72044430 0
0) 0x55b4889bfb80 con 0x55b487666900
2017-01-18 03:15:09.515777 7fc891972700 10 mon.openstack003@0(electing) e10 handle_probe
mon_probe(probe 71861477-db77-4fab-a8f8-10d3b16e1722 name openstack004) v6
2017-01-18 03:15:09.515782 7fc891972700 10 mon.openstack003@0(electing) e10 handle_probe_probe mon.1
10.41.41.4:6789/0mon_probe(probe 71861477-db77-4fab-a8f8-10d3b16e1722 name openstack004) v6 features
576460752032874495
2017-01-18 03:15:09.515797 7fc891972700  1 -- 10.41.41.3:6789/0 --> 10.41.41.4:6789/0 --
mon_probe(reply 71861477-db77-4fab-a8f8-10d3b16e1722 name openstack003 paxos( fc 5654529 lc 5655078
)) v6 -- ?+0 0x55b4889bf8c0 con 0x55b487666900
2017-01-18 03:15:09.676118 7fc892173700  5 mon.openstack003@0(electing).elector(45) election timer
expired

------------------------------

2017-01-18 03:15:04.675296 7fc892173700  5 mon.openstack003@0(electing).elector(45) start -- can i
be leader?
2017-01-18 03:15:04.675355 7fc892173700  1 mon.openstack003@0(electing).elector(45) init, last seen
epoch 45
2017-01-18 03:15:04.675932 7fc892173700  1 -- 10.41.41.3:6789/0 --> mon.1 10.41.41.4:6789/0 --
election(71861477-db77-4fab-a8f8-10d3b16e1722 propose 45) v5 -- ?+0 0x55b488984700
2017-01-18 03:15:05.515390 7fc891972700  1 -- 10.41.41.3:6789/0 <== mon.1 10.41.41.4:6789/0 675 ====
mon_probe(probe 71861477-db77-4fab-a8f8-10d3b16e1722 name openstack004) v6 ==== 69+0+0 (72044430 0
0) 0x55b4889bf600 con 0x55b487666900
2017-01-18 03:15:05.515458 7fc891972700 10 mon.openstack003@0(electing) e10 handle_probe
mon_probe(probe 71861477-db77-4fab-a8f8-10d3b16e1722 name openstack004) v6
2017-01-18 03:15:05.515463 7fc891972700 10 mon.openstack003@0(electing) e10 handle_probe_probe mon.1
10.41.41.4:6789/0mon_probe(probe 71861477-db77-4fab-a8f8-10d3b16e1722 name openstack004) v6 features
576460752032874495
2017-01-18 03:15:05.515500 7fc891972700  1 -- 10.41.41.3:6789/0 --> 10.41.41.4:6789/0 --
mon_probe(reply 71861477-db77-4fab-a8f8-10d3b16e1722 name openstack003 paxos( fc 5654529 lc 5655078
)) v6 -- ?+0 0x55b4889bf340 con 0x55b487666900
2017-01-18 03:15:07.515552 7fc891972700  1 -- 10.41.41.3:6789/0 <== mon.1 10.41.41.4:6789/0 676 ====
mon_probe(probe 71861477-db77-4fab-a8f8-10d3b16e1722 name openstack004) v6 ==== 69+0+0 (72044430 0
0) 0x55b4889bf8c0 con 0x55b487666900
2017-01-18 03:15:07.515620 7fc891972700 10 mon.openstack003@0(electing) e10 handle_probe
mon_probe(probe 71861477-db77-4fab-a8f8-10d3b16e1722 name openstack004) v6
2017-01-18 03:15:07.515625 7fc891972700 10 mon.openstack003@0(electing) e10 handle_probe_probe mon.1
10.41.41.4:6789/0mon_probe(probe 71861477-db77-4fab-a8f8-10d3b16e1722 name openstack004) v6 features
576460752032874495
2017-01-18 03:15:07.515652 7fc891972700  1 -- 10.41.41.3:6789/0 --> 10.41.41.4:6789/0 --
mon_probe(reply 71861477-db77-4fab-a8f8-10d3b16e1722 name openstack003 paxos( fc 5654529 lc 5655078
)) v6 -- ?+0 0x55b4889bf600 con 0x55b487666900
2017-01-18 03:15:09.515709 7fc891972700  1 -- 10.41.41.3:6789/0 <== mon.1 10.41.41.4:6789/0 677 ====
mon_probe(probe 71861477-db77-4fab-a8f8-10d3b16e1722 name openstack004) v6 ==== 69+0+0 (72044430 0
0) 0x55b4889bfb80 con 0x55b487666900
2017-01-18 03:15:09.515777 7fc891972700 10 mon.openstack003@0(electing) e10 handle_probe
mon_probe(probe 71861477-db77-4fab-a8f8-10d3b16e1722 name openstack004) v6
2017-01-18 03:15:09.515782 7fc891972700 10 mon.openstack003@0(electing) e10 handle_probe_probe mon.1
10.41.41.4:6789/0mon_probe(probe 71861477-db77-4fab-a8f8-10d3b16e1722 name openstack004) v6 features
576460752032874495
2017-01-18 03:15:09.515797 7fc891972700  1 -- 10.41.41.3:6789/0 --> 10.41.41.4:6789/0 --
mon_probe(reply 71861477-db77-4fab-a8f8-10d3b16e1722 name openstack003 paxos( fc 5654529 lc 5655078
)) v6 -- ?+0 0x55b4889bf8c0 con 0x55b487666900
2017-01-18 03:15:09.676118 7fc892173700  5 mon.openstack003@0(electing).elector(45) election timer
expired



I tried hard in these few days, but still get stuck. Do anyone have clue on how to rescuing the ceph cluster?
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




[Index of Archives]     [Information on CEPH]     [Linux Filesystem Development]     [Ceph Development]     [Ceph Large]     [Ceph Dev]     [Linux USB Development]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [xfs]


  Powered by Linux