Re: help, adding a mon failed and led to cluster failure

Hi,

I experienced this from time to time with older releases of Ceph, but haven't stumbled upon it for some time.
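
When "ceph -s" hangs because quorum is lost, each monitor can still be queried directly through its admin socket. A minimal check, assuming the default socket path and using the mon id "storage1" from the log below as an example:

# ask the monitor for its own state; this works without quorum
ceph --admin-daemon /var/run/ceph/ceph-mon.storage1.asok mon_status

A monitor stuck in "probing", as in the log below, cannot find enough peers to form a quorum.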

Often I had to revert to the previous state by using: http://ceph.com/docs/master/rados/operations/add-or-rm-mons/#removing-monitors-from-an-unhealthy-cluster

That is: dump the monmap, find the original monitors, remove the newest addition from the map, inject the modified map, and restart; then the cluster should come back online.
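
For reference, a minimal sketch of that procedure; the mon id "storage1", the path /tmp/monmap, and the sysvinit service commands are examples, and <new-mon-id> stands for whichever monitor was added last:

# stop the monitor daemons on every mon host first (sysvinit shown; adjust to your init system)
service ceph stop mon

# on a surviving monitor host, extract the current monmap
ceph-mon -i storage1 --extract-monmap /tmp/monmap

# inspect the map, then drop the newest addition
monmaptool --print /tmp/monmap
monmaptool /tmp/monmap --rm <new-mon-id>

# inject the trimmed map back and restart
ceph-mon -i storage1 --inject-monmap /tmp/monmap
service ceph start mon

Once the surviving monitors are back on the old map, "ceph -s" should respond again, and you can retry adding the new monitor after making sure it can actually reach the existing ones.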

Cheers,
Martin


On Wed, Mar 26, 2014 at 11:40 AM, <duan.xufeng@xxxxxxxxxx> wrote:

Hi,
        I just added a new mon to a healthy cluster by following the "Adding Monitors" steps in the website manual "http://ceph.com/docs/master/rados/operations/add-or-rm-mons/" step by step,

but when I executed step 6:
ceph mon add <mon-id> <ip>[:<port>]

the command did not return. I then ran "ceph -s" on a healthy mon node, and that command did not return either.

So I tried restarting the mon to recover the whole cluster, but it never seems to recover.

Could anyone please tell me how to deal with this?


=== mon.storage1 ===
Starting Ceph mon.storage1 on storage1...
Starting ceph-create-keys on storage1...

[root@storage1 ~]# ceph -s   # after restarting the mon, "ceph -s" still produces no output

[root@storage1 ceph]# tail ceph-mon.storage1.log
2014-03-26 18:20:33.338554 7f60dbb967a0  0 ceph version 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-mon, pid 24214
2014-03-26 18:20:33.460282 7f60dbb967a0  1 mon.storage1@-1(probing) e2 preinit fsid 3429fd17-4a92-4d3b-a7fa-04adedb0da82
2014-03-26 18:20:33.460694 7f60dbb967a0  1 mon.storage1@-1(probing).pg v0 on_upgrade discarding in-core PGMap
2014-03-26 18:20:33.487899 7f60dbb967a0  0 mon.storage1@-1(probing) e2  my rank is now 0 (was -1)
2014-03-26 18:20:33.488575 7f60d6854700  0 -- 193.168.1.100:6789/0 >> 193.168.1.133:6789/0 pipe(0x3f38280 sd=21 :0 s=1 pgs=0 cs=0 l=0 c=0x3f19600).fault
2014-03-26 18:21:33.487686 7f60d8657700  0 mon.storage1@0(probing).data_health(0) update_stats avail 86% total 51606140 used 4324004 avail 44660696
2014-03-26 18:22:33.488091 7f60d8657700  0 mon.storage1@0(probing).data_health(0) update_stats avail 86% total 51606140 used 4324004 avail 44660696
2014-03-26 18:23:33.488500 7f60d8657700  0 mon.storage1@0(probing).data_health(0) update_stats avail 86% total 51606140 used 4324004 avail 44660696




_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
