Hello,

On Wed, 17 Aug 2016 16:54:41 -0500 Dan Jakubiec wrote:

> Hi Wido,
>
> Thank you for the response:
>
> > On Aug 17, 2016, at 16:25, Wido den Hollander <wido@xxxxxxxx> wrote:
> >
> >> Op 17 augustus 2016 om 17:44 schreef Dan Jakubiec <dan.jakubiec@xxxxxxxxx>:
> >>
> >> Hello, we have a Ceph cluster with 8 OSDs that recently lost power to
> >> all 8 machines. We've managed to recover the XFS filesystems on 7 of
> >> the machines, but the OSD service is only starting on 1 of them.
> >>
> >> The other 5 machines all have complaints similar to the following:
> >>
> >> 2016-08-17 09:32:15.549588 7fa2f4666800 -1 filestore(/var/lib/ceph/osd/ceph-1) Error initializing leveldb : Corruption: 6 missing files; e.g.: /var/lib/ceph/osd/ceph-1/current/omap/042421.ldb
> >>

That looks bad. And as Wido said, this shouldn't happen.
What are your XFS mount options for that FS? I tend to remember seeing
"nobarrier" in many OSD examples...

> >> How can we repair the leveldb to allow the OSDs to start up?

Hopefully somebody with a leveldb clue will pipe up, but I have grave
doubts.

> >
> > My first question would be: How did this happen?
> >
> > What hardware are you using underneath? Is there a RAID controller
> > which is not flushing properly? This should not happen during a
> > power failure.
> >
> Each OSD drive is connected to an onboard hardware RAID controller and
> configured in RAID 0 mode as individual virtual disks. The RAID
> controller is an LSI 3108.

What are the configuration options? If there is no BBU and the controller
is forcibly set to writeback caching, this would explain it, too.

> I agree -- I am finding it bizarre that 7 of our 8 OSDs (one per
> machine) did not survive the power outage.

My philosophy on this is that if any of the DCs we're in should suffer a
total and abrupt power loss I won't care, as I'll be buried below tons of
concrete (this being Tokyo). In a place where power outages are more
likely, I'd put a local APU in front of things and issue a remote
shutdown from it when it starts to run out of juice. Having a HW/SW combo
that can survive a sudden power loss is nice; having something in place
that softly shuts things down before that is a lot better.

> We did have some problems with the stock Ubuntu xfs_repair (3.1.9) seg
> faulting, which we eventually overcame by building a newer version of
> xfs_repair (4.7.0). But it did finally repair clean.

That also doesn't instill me with confidence, both Ubuntu- and XFS-wise.

> We actually have some different errors on other OSDs. A few of them are
> failing with "Missing map in load_pgs" errors. But generally speaking
> it appears to be missing files of various types causing different kinds
> of failures.
>
> I'm really nervous now about the OSDs' inability to start with any
> inconsistencies, and the lack of repair utilities (that I can find).
> Any advice on how to recover?

What I've seen in the past assumes that you have at least a running
cluster of sorts, just trashed PGs. This is far worse.

> > I don't know the answer to your question, but lost files are not
> > good.
> >
> > You might find them in a lost+found directory if XFS repair worked?
> >
> Sadly this directory is empty.
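For what it's worth, spotting "nobarrier" across your OSD mounts is quick
to script. A minimal sketch (Python, assuming the stock /var/lib/ceph/osd
mount layout; adjust the path if yours differs):

    # Scan /proc/mounts and flag any OSD filesystem mounted with nobarrier.
    with open("/proc/mounts") as mounts:
        for line in mounts:
            device, mountpoint, fstype, options = line.split()[:4]
            if "/var/lib/ceph/osd" in mountpoint and "nobarrier" in options.split(","):
                print("WARNING: %s on %s is mounted nobarrier" % (mountpoint, device))

With nobarrier set, the kernel never issues cache flushes to the device,
so a controller caching in writeback mode without a BBU can lose the most
recently written data (like those leveldb files) on power loss.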
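As for the leveldb itself: leveldb does ship a RepairDB routine that
salvages whatever tables it can still parse. No idea how it copes with
six missing .ldb files, and it can discard anything it can't make sense
of, so only ever run it against a copy of the omap directory. A rough
sketch using the plyvel Python bindings (an assumption on my part, any
leveldb binding exposing repair should do; whether its leveldb version
can read Ceph's omap data is something you'd have to verify):

    # Repair a copy of the corrupt omap leveldb, then see what survived.
    import shutil
    import plyvel  # pip install plyvel

    OMAP = "/var/lib/ceph/osd/ceph-1/current/omap"
    WORKDIR = OMAP + ".repair"

    shutil.copytree(OMAP, WORKDIR)   # never touch the original
    plyvel.repair_db(WORKDIR)        # salvages readable tables, drops the rest

    db = plyvel.DB(WORKDIR)
    keys = sum(1 for _ in db.iterator(include_value=False))
    print("repaired DB opens, %d keys recovered" % keys)
    db.close()

If that opens cleanly you could try swapping the repaired directory in
(with the OSD stopped, and the original saved away), but expect whatever
omap entries lived in those missing files to surface as inconsistencies
later.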
Christian

>
> -- Dan
>
> > Wido
> >
> >> Thanks,
> >>
> >> -- Dan J

-- 
Christian Balzer        Network/Systems Engineer
chibi@xxxxxxx           Global OnLine Japan/Rakuten Communications
http://www.gol.com/
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com