Re: ceph mons stuck in electing state

What change did you make in your ceph.conf?

I'd say it would be a good idea to check and make sure that hasn't caused the issue.
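
If it helps, a quick way to check (just a sketch, with <changed_option> standing in for whichever setting was edited) is to ask the mon over its admin socket what value it is actually running with and compare that against the deployed file:

ceph daemon mon.<mgmt1> config get <changed_option>   # value the running daemon picked up
grep <changed_option> /etc/ceph/ceph.conf             # value in the file you deployed

The admin socket still answers even without quorum, so this works while the normal ceph commands hang.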

,Ashley


---- On Tue, 27 Aug 2019 04:37:15 +0800 nkerns92@xxxxxxxxx wrote ----

Hello,

I have an old Ceph 0.94.10 cluster that had 10 storage nodes, plus an extra management node used for running commands on the cluster. Over time we've had some hardware failures on some of the storage nodes, so we're down to 6, with ceph-mon running on the management server and on 4 of the storage nodes. We attempted to deploy a ceph.conf change and restarted the ceph-mon and ceph-osd services, but the cluster went down on us. We found all the ceph-mons are stuck in the electing state. I can't get any response from any ceph commands, but I found I can contact the daemon directly through its admin socket and get this information (hostnames removed for privacy reasons):

root@<mgmt1>:~# ceph daemon mon.<mgmt1> mon_status
{
    "name": "<mgmt1>",
    "rank": 0,
    "state": "electing",
    "election_epoch": 4327,
    "quorum": [],
    "outside_quorum": [],
    "extra_probe_peers": [],
    "sync_provider": [],
    "monmap": {
        "epoch": 10,
        "fsid": "69611c75-200f-4861-8709-8a0adc64a1c9",
        "modified": "2019-08-23 08:20:57.620147",
        "created": "0.000000",
        "mons": [
            {
                "rank": 0,
                "name": "<mgmt1>",
                "addr": "[fdc4:8570:e14c:132d::15]:6789\/0"
            },
            {
                "rank": 1,
                "name": "<mon1>",
                "addr": "[fdc4:8570:e14c:132d::16]:6789\/0"
            },
            {
                "rank": 2,
                "name": "<mon2>",
                "addr": "[fdc4:8570:e14c:132d::28]:6789\/0"
            },
            {
                "rank": 3,
                "name": "<mon3>",
                "addr": "[fdc4:8570:e14c:132d::29]:6789\/0"
            },
            {
                "rank": 4,
                "name": "<mon4>",
                "addr": "[fdc4:8570:e14c:132d::151]:6789\/0"
            }
        ]
    }
}


Is there any way to force the cluster back into quorum, even if it's just a single mon running, so it can start back up? I've tried exporting the mgmt node's monmap and injecting it into the other nodes, but it didn't make any difference.
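
For reference, the monmap export/inject I attempted was roughly along these lines (a sketch from memory; the path is just illustrative, and the mons were stopped first):

ceph-mon -i <mgmt1> --extract-monmap /tmp/monmap      # grab the monmap from the mgmt node's store
monmaptool --print /tmp/monmap                        # still shows all five mons listed above
ceph-mon -i <mon1> --inject-monmap /tmp/monmap        # repeated on each of the other mon hosts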

Thanks!
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
