Re: After reboot nothing worked


 



On 17/12/13 13:36, Umar Draz wrote:
Hi Joao,

Thanks for this valuable information. OK, another problem: I want to
remove the mon host from the cluster. Here is my mon dump output:

root@vms2:~# ceph mon dump
dumped monmap epoch 1
epoch 1
fsid 6ce085b5-1747-46f6-9fda-a3f1e8c75beb
last_changed 0.000000
created 0.000000
0: 192.168.1.128:6789/0 mon.vms1
1: 192.168.1.129:6789/0 mon.vms2

I tried to remove mon.vms2 from the cluster following this document
http://ceph.com/docs/master/rados/operations/add-or-rm-mons/
but again it did not work.

root@vms2:~# service ceph -a stop mon.vms2
/etc/init.d/ceph: mon.vms2 not found (/etc/ceph/ceph.conf defines ,
/var/lib/ceph defines )

root@vms2:/etc/ceph# ceph mon remove mon.vms2
mon mon.vms2 does not exist or has already been removed


Once you stop that monitor, the last remaining monitor can no longer reach it, so the cluster loses quorum and you are therefore unable to talk to the cluster.

I can see how you bumped into that, though. The docs should make it clear that the presented order only works on a cluster with 3 or more monitors.

Try removing the monitor first and then stopping it. Since you have it managed by upstart, I would suggest first removing ceph-mon from upstart to make sure it is not restarted straight away, which I believe would otherwise lead to http://tracker.ceph.com/issues/6789

So, this would be the order I'd try (a rough command sketch follows the list):

1. remove ceph-mon from upstart or whatever to avoid it being restarted once it kills itself on the next step
2. 'ceph mon remove mon.vms2'
3. make sure ceph-mon with id mon.vms2 is not running
4. run 'ceph -s' to make sure it works and 'mon.vms2' is not in the quorum.
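
A rough sketch of those steps in commands (the mon data directory path,
the 'upstart' marker trick and the exact id forms are assumptions based
on a default ceph-deploy/upstart layout, so adjust them to whatever your
setup actually uses):

# 1. keep upstart from starting the monitor again on its own; on a
#    ceph-deploy style layout one way is to move the 'upstart' marker
#    out of the mon data directory
root@vms2:~# mv /var/lib/ceph/mon/ceph-vms2/upstart /var/lib/ceph/mon/ceph-vms2/upstart.disabled

# 2. remove the monitor from the monmap while both monitors still form a quorum
root@vms2:~# ceph mon remove mon.vms2
#    (if it keeps reporting the monitor does not exist, try the bare id:
#     ceph mon remove vms2)

# 3. make sure no ceph-mon with that id is still running; stop it if
#    upstart brought it back
root@vms2:~# status ceph-mon id=vms2
root@vms2:~# stop ceph-mon id=vms2

# 4. confirm the cluster still answers and vms2 is no longer in the quorum
root@vms2:~# ceph -s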

  -Joao


Br.

Umar



On Tue, Dec 17, 2013 at 6:18 PM, Karan Singh <ksingh@xxxxxx> wrote:

    Thanks Joao for information.

    Many Thanks
    Karan Singh


    ----- Original Message -----
    From: "Joao Eduardo Luis" <joao.luis@xxxxxxxxxxx
    <mailto:joao.luis@xxxxxxxxxxx>>
    To: ceph-users@xxxxxxxxxxxxxx <mailto:ceph-users@xxxxxxxxxxxxxx>
    Sent: Tuesday, 17 December, 2013 2:56:23 PM
    Subject: Re:  After reboot nothing worked

    On 12/17/2013 09:54 AM, Karan Singh wrote:
     > Umar
     >
     > *Ceph is stable for production*; there are a large number of ceph
     > clusters deployed and running smoothly in PRODUCTION and countless
     > in testing / pre-production.
     >
     > Since you are facing problems with your ceph testing, it does not
     > mean CEPH is unstable.
     >
     > I would suggest putting some time into troubleshooting your problem.
     >
     > What I see from your logs --
     >
     >   1) you have 2 mons, that's a problem (either have 1 or have 3 to
     > form a quorum). Add 1 more monitor node

    Just to clarify this point a bit, one doesn't need an odd number of
    monitors in a ceph cluster to reach quorum.  This is a common
    misconception.

    The requirement to reach quorum is simply to have a majority of monitors
    able to talk to each other.  If one has 2 monitors and both are able to
    talk to each other they'll be able to form a quorum.

    Odd numbers are advised, however, because one can tolerate as many
    failures with less infrastructure. E.g.,

    - for n = 1, failure of 1 monitor means loss of quorum
    - for n = 2, failure of 1 monitor means loss of quorum
    - for n = 3, failure of 1 monitor is okay; failure of 2 monitors means
    loss of quorum
    - for n = 4, failure of 1 monitor is okay; failure of 2 monitors means
    loss of quorum
    - for n = 5, failure of 2 monitors is okay; failure of 3 monitors means
    loss of quorum
    - for n = 6, failure of 2 monitors is okay; failure of 3 monitors means
    loss of quorum

    etc.
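
    In other words, a quorum just needs a strict majority of the monitors
    up, i.e. floor(n/2) + 1 of them, so a cluster of n monitors tolerates
    floor((n-1)/2) failures. A one-liner to reproduce the pattern above,
    if you want to check it:

    $ for n in 1 2 3 4 5 6; do echo "n=$n: needs $((n/2 + 1)) up, tolerates $(( (n-1)/2 )) down"; done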

    So you can see how you don't get any benefit, from an availability
    perspective, from having 2, 4 or 6 monitors compared to having 1, 3
    or 5.  If your target, however, is replication, then 2 is better
    than 1.

        -Joao



    --
    Joao Eduardo Luis
    Software Engineer | http://inktank.com | http://ceph.com




--
Umar Draz
Network Architect


--
Joao Eduardo Luis
Software Engineer | http://inktank.com | http://ceph.com
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



