Re: Fwd: lost power. monitors died. Cephx errors now

Hello Wido, 

Thanks for the advice. While the data center has A/B circuits, redundant power, etc., a ground fault apparently travels outside the redundant circuits and takes down power to the whole building.

The monitors are identical, each with:
2x E5 CPUs
64 GB of RAM
4x 300 GB 10k SAS drives in RAID 10 (write-through mode)
Ubuntu 14.04 with the latest updates applied before the power failure (2016/Aug/10, 3 a.m. CST)
Ceph Hammer LTS 0.94.7

(We are still working on our Jewel test cluster, so the upgrade is planned but not in place yet.)

The only thing that seems to be corrupt is the monitors' leveldb store. I see multiple issues from March 2016 on Google's leveldb GitHub tracker about fsync and power failure, so I assume this is a leveldb issue.
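
For context on why fsync matters after a power cut: a store is only crash-safe if data is flushed to stable storage before it is treated as committed. The Python sketch below shows the general POSIX write, fsync, rename, fsync-directory pattern. It is not leveldb's code and does not claim to describe the specific leveldb bugs; `durable_write` is a hypothetical helper name, and it assumes the filesystem and storage stack honor flush requests.

```python
import os
import tempfile


def durable_write(path: str, data: bytes) -> None:
    """Replace `path` with `data` so that, after a crash, either the old or
    the new contents survive (assuming fsync is honored by the storage)."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temp file in the same directory so the rename is atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()              # move Python's buffer into the OS
            os.fsync(f.fileno())   # ask the OS to push file data to disk
        os.replace(tmp_path, path)  # atomically swap in the new file
    except BaseException:
        os.unlink(tmp_path)         # clean up the partial temp file
        raise
    # fsync the directory so the rename itself is durable (Linux/POSIX).
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Skipping either fsync leaves a window where a power loss can persist the rename without the data, or lose the rename entirely.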

I have backed up /var/lib/ceph/mon on all of my monitors before attempting any form of recovery.
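
A minimal Python sketch of that kind of per-monitor backup, assuming the default data path /var/lib/ceph/mon mentioned above and that ceph-mon is stopped so the leveldb files are not changing underneath the copy. `backup_mon_store` and the archive naming scheme are hypothetical, chosen only for illustration.

```python
import os
import socket
import tarfile
import time


def backup_mon_store(mon_dir: str = "/var/lib/ceph/mon", dest_dir: str = ".") -> str:
    """Write a gzipped tar archive of the monitor data directory and return
    its path. Intended to run while ceph-mon is stopped on this host."""
    if not os.path.isdir(mon_dir):
        raise FileNotFoundError(mon_dir)
    host = socket.gethostname()
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = os.path.join(dest_dir, f"mon-backup-{host}-{stamp}.tar.gz")
    with tarfile.open(archive, "w:gz") as tar:
        # Store paths relative to the mon directory's name (e.g. "mon/...")
        # so the archive can be unpacked anywhere for inspection.
        tar.add(mon_dir, arcname=os.path.basename(os.path.normpath(mon_dir)))
    return archive
```

Copying the resulting archive off the monitor host as well means a second failure of the same disks does not take the backup with it.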

Is there any way to reconstruct the leveldb store, or to replace the monitors and recover the data?

I found the following post in which Sage says it is tedious but possible (http://www.spinics.net/lists/ceph-devel/msg06662.html). Tedious is fine if I have any chance of doing it. I have the fsid and the mon key map, and all of the OSDs look fine, so all of the previous OSD maps are there.

I just don't understand which keys/values need to go into it.


On Aug 11, 2016 1:33 AM, "Wido den Hollander" <wido@xxxxxxxx> wrote:

> On 11 August 2016 at 0:10, Sean Sullivan <seapasulli@xxxxxxxxxxxx> wrote:
>
>
> I think it just got worse:
>
> All three monitors on my other cluster report that ceph-mon can't open
> /var/lib/ceph/mon/$(hostname). Is there any way to recover if you lose all
> 3 monitors? I saw a post by Sage saying that the data can be recovered, as
> all of the data is held on other servers. Is this possible? If so, has
> anyone had any experience doing so?

I have never done so, so I couldn't tell you.

However, it is odd that the store got corrupted on all three. What hardware are you using? Was it properly protected against power failure?

If your mon store is corrupted, I'm not sure what might happen.

However, make a backup of ALL monitors right now before doing anything.

Wido

> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com