Re: Incomplete MON removal

On 09/07/15 00:03, Steve Thompson wrote:
Ceph newbie here; ceph 0.94.2, CentOS 6.6 x86_64. Kernel 2.6.32.

Initial test cluster of five OSD nodes, 3 MON, 1 MDS. Working well. I
was testing the removal of two MONs, just to see how it works. The
second MON was stopped and removed: no problems. The third MON was
stopped and removed: apparently no problems, and ceph told me that only
one MON remained. However, "ceph -s", along with many other commands,
now hangs for 5 minutes and then gives me an authentication timeout. On
the initial MON node, anderson, I get:

# ceph daemon mon.anderson mon_status
{
     "name": "anderson",
     "rank": 1,
     "state": "probing",
     "election_epoch": 0,
     "quorum": [],
     "outside_quorum": [
         "anderson"
     ],
     "extra_probe_peers": [],
     "sync_provider": [],
     "monmap": {
         "epoch": 4,
         "fsid": "b9aeb134-fe63-46b4-a939-152a6c188f6a",
         "modified": "2015-07-07 17:18:02.816853",
         "created": "0.000000",
         "mons": [
             {
                 "rank": 0,
                 "name": "benford",
                 "addr": "10.22.200.13:6789\/0"
             },
             {
                 "rank": 1,
                 "name": "anderson",
                 "addr": "10.22.200.16:6789\/0"
             }
         ]
     }
}

So, no quorum. Here benford is the third MON that was already removed.
This removal, which initially appeared to work, evidently did not
complete fully. However, I cannot start a MON on benford ("mon.benford
not present in monmap"), and I cannot start the OSDs on any node.

How do I recover from this situation?

Steve

How did you go about removing the mons? In general, see http://ceph.com/docs/master/rados/operations/add-or-rm-mons/

Irrespective of whether or not the two mons were removed correctly, I'm thinking that you need to add additional ones as you remove the old ones; otherwise you lose quorum and nothing works (i.e. where you are now).
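
As a rough illustration of the quorum arithmetic (a sketch only, not Ceph code; the function names quorum_needed and has_quorum are made up for this example): the mons need a strict majority of the members listed in the monmap. The monmap in your mon_status output still lists two mons (benford and anderson), so two would have to be up, and only anderson is.

```shell
#!/bin/bash
# Majority needed for a monmap that lists N monitors: floor(N/2) + 1.
quorum_needed() {
    echo $(( $1 / 2 + 1 ))
}

# Succeeds (exit status 0) if the reachable monitors can form a quorum.
# $1 = monitors listed in the monmap, $2 = monitors actually reachable.
has_quorum() {
    local in_monmap=$1 reachable=$2
    [ "$reachable" -ge "$(quorum_needed "$in_monmap")" ]
}

# The monmap lists 2 mons, but only anderson is running.
if has_quorum 2 1; then
    echo "quorum possible"
else
    echo "no quorum"
fi
```

With a monmap of 2 and 1 reachable mon this prints "no quorum"; once the monmap lists only anderson (1 of 1), a quorum is possible again.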

To recover from your current state, I'm thinking you'll need to create and inject a new monmap into your remaining mon (covered in the above link). Also check your ceph.conf on all nodes and remove the nonexistent mons from there too!
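
A minimal sketch of that monmap surgery, following the "removing monitors from an unhealthy cluster" procedure in the linked doc. It only prints the commands so you can review them before running anything. Assumptions on my part: the sysvinit "service ceph" commands (CentOS 6), the /tmp/monmap path, and the helper name print_monmap_recovery, which is hypothetical. The mon names are the ones from your output.

```shell
#!/bin/bash
# Dry-run helper: prints the recovery commands instead of executing them.
# $1 = surviving mon id, $2 = removed mon still listed in the monmap,
# $3 = scratch path for the extracted monmap (assumed default: /tmp/monmap).
print_monmap_recovery() {
    local mon_id=$1 dead_mon=$2 monmap=${3:-/tmp/monmap}
    cat <<EOF
service ceph stop mon.${mon_id}
ceph-mon -i ${mon_id} --extract-monmap ${monmap}
monmaptool --print ${monmap}
monmaptool ${monmap} --rm ${dead_mon}
ceph-mon -i ${mon_id} --inject-monmap ${monmap}
service ceph start mon.${mon_id}
EOF
}

# Values taken from this thread: anderson survives, benford is stale.
print_monmap_recovery anderson benford
```

The mon has to be stopped before the extract and inject steps, and the --print step is there so you can confirm which mons the map actually lists before editing it.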

Good luck

Mark
