Re: Disaster recovery of monitor

On Tue, Nov 17, 2015 at 7:27 AM, Joao Eduardo Luis <joao@xxxxxxx> wrote:
On 11/17/2015 03:56 AM, Jose Tavares wrote:
> The problem is that I think I don't have any good monitor anymore.
> How do I know if the map I am trying is ok?
>
> I also saw in the logs that the primary mon was trying to contact a
> removed mon at IP .112 .. So, I added .112 again ... and it didn't help.
>
> Attached are the logs of what is going on and some monmaps that I
> captured minutes before the cluster became inaccessible ..
>
> Should I try injecting these monmaps into my primary mon to see if it
> can recover the cluster?
> Is it possible to check whether these monmaps match my content?

Without access to the actual store.db there's no way to ascertain if the
store has any problems, and even then figuring out a potential
corruption from just one monitor store.db would either be impossible or
impractical.

I posted my store.db in my previous answer ..

 

That said, from the log you attached it seems you only have issues with
authentication: you have pgmaps from epoch 91923 through to 92589, you
have an mds map (epoch 38), osdmaps at least through epoch 307, and 40
versions for the auth keys.

Somehow, though, your monitors are unable to authenticate each other. No
way to tell if that was corruption or user error.

You should be able to get your monitors back to speaking terms again
simply by disabling cephx temporarily. Then you can figure out whatever
you need to figure out in terms of monitor keys.

Just update your ceph.conf with 'auth supported = none' and restart the
monitors. See how it goes from there.
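
A minimal sketch of the ceph.conf change suggested above, written in Python only so it can be checked without a cluster. The function name `disable_cephx` and the sample configuration text are hypothetical; `auth supported = none` is the setting named in the message. The Ceph documentation on disabling cephx lists three separate options (`auth cluster required`, `auth service required`, `auth client required`), so treat the single-option form as this thread's suggestion rather than a verified procedure for 0.80.8.

```python
import configparser
import io


def disable_cephx(conf_text: str) -> str:
    """Return conf_text with 'auth supported = none' set in [global].

    Hypothetical helper for illustration. It works on a string, so it
    never touches /etc/ceph/ceph.conf by itself.
    """
    # strict=False tolerates duplicate keys; only '=' is treated as a
    # delimiter so values such as '10.0.0.1:6789' are left alone.
    parser = configparser.ConfigParser(
        strict=False, interpolation=None, delimiters=("=",)
    )
    parser.read_string(conf_text)
    if not parser.has_section("global"):
        parser.add_section("global")
    parser.set("global", "auth supported", "none")

    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


if __name__ == "__main__":
    # Made-up sample configuration, not taken from the thread.
    sample = "[global]\nfsid = abc\n\n[mon.a]\nhost = node1\n"
    print(disable_cephx(sample))
```

Note that `configparser` drops comments and lowercases keys when it rewrites the text, so on a real system a manual edit of a backed-up ceph.conf is probably safer. The monitors need a restart afterwards, as the message says.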

I tried your suggestion, but it did not change the results .. :(

Thanks a lot.
Jose Tavares 

 
HTH

  -Joao



>
> Thanks a lot.
> Jose Tavares
>
>
>
>
>
> On Mon, Nov 16, 2015 at 10:48 PM, Nathan Harper
> <nathan.harper@xxxxxxxxxxx <mailto:nathan.harper@xxxxxxxxxxx>> wrote:
>
>     I had to go through a similar process when we had a disaster which
>     destroyed one of our monitors.   I followed the process here:
>     REMOVING MONITORS FROM AN UNHEALTHY CLUSTER
>     <http://docs.ceph.com/docs/hammer/rados/operations/add-or-rm-mons/> to
>     remove all but one monitor, which let me bring the cluster back up.
>
>     As you are running an older version of Ceph than hammer, some of the
>     commands might differ (perhaps this might
>     help http://docs.ceph.com/docs/v0.80/rados/operations/add-or-rm-mons/)
>
>
>     --
>     *Nathan Harper*// IT Systems Architect
>
>     On 16 November 2015 at 16:50, Jose Tavares <jat@xxxxxxxxxxxx
>     <mailto:jat@xxxxxxxxxxxx>> wrote:
>
>         Hi guys ...
>         I need some help as my cluster seems to be corrupted.
>
>         I saw here ..
>         https://www.mail-archive.com/ceph-users@xxxxxxxxxxxxxx/msg01919.html
>         .. a msg from 2013 where Peter had a problem with his monitors.
>
>         I had the same problem today when trying to add a new monitor,
>         and then experimenting with the monmap because the monitors
>         were not entering the quorum. I'm using version 0.80.8.
>
>         Right now my cluster won't start because of a corrupted monitor.
>         Is it possible to remove all monitors and create just a new one
>         without losing data? I have ~260GB of data, representing two
>         weeks of work.
>
>         What should I do? Do you recommend any specific procedure?
>
>         Thanks a lot.
>         Jose Tavares
>
>         _______________________________________________
>         ceph-users mailing list
>         ceph-users@xxxxxxxxxxxxxx <mailto:ceph-users@xxxxxxxxxxxxxx>
>         http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
>
>

