Re: Ceph monitors 100% full filesystem, refusing start

Joao Eduardo Luis <joao@xxxxxxx> · Wed, 20 Jan 2016 15:25:11 +0000

On 01/20/2016 03:15 PM, Wido den Hollander wrote:
> Hello,
> 
> I have an issue with a (not in production!) Ceph cluster which I'm
> trying to resolve.
> 
> On Friday the network links between the racks failed and this caused all
> monitors to lose connection.
> 
> Their leveldb stores kept growing and they are currently 100% full. They
> all have a few hundred MB left.

I'm incredibly curious to know what was written to leveldb to make it
grow unbounded. Did the monitors hold quorum? I'm guessing that would
be a 'no', given the network failure you mentioned, hence my morbid
curiosity in figuring out what happened there.

If you don't mind, running a 'ceph-kvstore-tool /path/to/store.db
leveldb list > /tmp/store.dump' could, maybe, shed some light on this
issue (at least it will dump all the keys, and maybe something will be
obvious, don't know). I'd certainly be interested in taking a look at
those stores if you don't mind ;)
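
As a rough sketch of how such a dump could be skimmed once it exists:
the helper below counts keys per prefix so that an unusually large
prefix stands out. The function name 'summarize_prefixes' is made up
for illustration, and the line format is an assumption (prefix and key
separated by a tab, or by the first ':' if there is no tab); check a
few lines of the real dump before trusting the numbers.

```shell
# Count keys per prefix in a key dump and print the largest first.
# Assumption: each line looks like "<prefix><TAB><key>" or "<prefix>:<key>".
summarize_prefixes() {
    awk '
    NF == 0 { next }                      # skip blank lines
    {
        n = index($0, "\t")               # prefer a tab separator
        if (n == 0) n = index($0, ":")    # otherwise fall back to first colon
        p = (n > 0) ? substr($0, 1, n - 1) : $0
        count[p]++
    }
    END { for (p in count) printf "%d\t%s\n", count[p], p }
    ' "$1" | sort -rn
}
```

Something like 'summarize_prefixes /tmp/store.dump | head' would then
show the prefixes holding the most keys.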

> Starting the 'compact on start' doesn't work since the FS is 100%
> full.
> 
> error: monitor data filesystem reached concerning levels of
> available storage space (available: 0% 238 MB)
> you may adjust 'mon data avail crit' to a lower value to make this go
> away (default: 0%)
> 
> One of the 5 monitors is now running but that's not enough.
> 
> Any ideas how to compact this leveldb? I can't free up any more space
> right now on these systems. Getting bigger disks in is also going to
> take a lot of time.

Running 'ceph-kvstore-tool' may also force leveldb to compact on open,
so you may have a shot there at compaction. If that doesn't work,
'ceph-monstore-tool' has a 'compact' command -- that should help you
sort it out.
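
A minimal sketch of what an offline compaction attempt might look like,
with the monitor stopped first. The argument order used here
('<store path> compact') is an assumption and may differ between
releases, so check the tool's usage output. The 'compact_mon_store'
function and the MONSTORE_TOOL variable are hypothetical names used
only for this example. Keep in mind that compaction writes new files
before dropping the old ones, so it needs some free space of its own.

```shell
# Compact a monitor store and report its size before and after.
# Assumptions: the monitor using this store is stopped, and the tool
# accepts "<store path> compact" (verify against its usage output).
compact_mon_store() {
    store="$1"
    tool="${MONSTORE_TOOL:-ceph-monstore-tool}"   # hypothetical override hook

    before=$(du -sk "$store" | cut -f1) || return 1
    "$tool" "$store" compact || {
        echo "compaction failed for $store" >&2
        return 1
    }
    after=$(du -sk "$store" | cut -f1) || return 1

    echo "store size: ${before} KB -> ${after} KB"
}
```

It would be called with the store path, e.g.
'compact_mon_store /path/to/store.db', using the same placeholder path
as above.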

  -Joao