Re: Ceph monitors 100% full filesystem, refusing start


 



Wouldn’t actually blowing away the other monitors then recreating them from scratch solve the issue?

Never done this, just thinking out loud. It would grab the osdmap and everything from the other monitor and form a quorum, wouldn’t it?
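
Roughly, I'd imagine something along these lines, with default paths, 'a' as the surviving mon and 'b' as one of the mons being rebuilt (all placeholder names); since there is no quorum, the monmap would have to be pulled straight out of the surviving mon's store rather than via 'ceph mon getmap':

# on the surviving monitor's host, briefly stop mon.a and dump the current monmap
ceph-mon -i a --extract-monmap /tmp/monmap

# on the host of a monitor being rebuilt, move the full store out of the way
mv /var/lib/ceph/mon/ceph-b /var/lib/ceph/mon/ceph-b.old

# recreate an empty store from the monmap and the existing mon keyring, then
# start it; on startup it should sync its store from the peer(s) it can reach
ceph-mon -i b --mkfs --monmap /tmp/monmap --keyring /var/lib/ceph/mon/ceph-b.old/keyring
ceph-mon -i b    # or start it through your init system

# repeat for the remaining monitors until 3 of the 5 are up and a quorum forms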

On 20 Jan 2016, at 16:26, Wido den Hollander <wido@xxxxxxxx> wrote:

On 01/20/2016 04:22 PM, Zoltan Arnold Nagy wrote:
Hi Wido,

So one out of the 5 monitors is running fine then? Did that one have more space for its leveldb?


Yes. That one was at 99% full, and by cleaning some stuff in /var/cache and
/var/log I was able to start it.

It compacted the leveldb database and is now at 1% disk usage.
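
For what it's worth, and assuming these are available in your release, compaction can also be requested explicitly, either on a running mon or at every startup (mon id 'a' is a placeholder):

# ask a running monitor to compact its leveldb store
ceph tell mon.a compact

# or, in ceph.conf, have monitors compact their store every time they start
[mon]
    mon compact on start = true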

Looking at the ceph_mon.cc code:

if (stats.avail_percent <= g_conf->mon_data_avail_crit) {

Setting mon_data_avail_crit to 0 does not work, since 100% full is equal
to 0% free.
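
For completeness, that override looks something like this (mon id 'b' is a placeholder):

# ceph.conf on the monitor hosts
[mon]
    mon data avail crit = 0

# or passed directly on the daemon's command line
ceph-mon -i b --mon-data-avail-crit 0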

There is ~300 MB free on the other 4 monitors. I just can't start the mon
and tell it to compact.

Lesson learned here though: always make sure you have some additional
space you can clear when you need it.

On 20 Jan 2016, at 16:15, Wido den Hollander <wido@xxxxxxxx> wrote:

Hello,

I have an issue with a (not in production!) Ceph cluster which I'm
trying to resolve.

On Friday the network links between the racks failed, which caused all
monitors to lose connectivity.

Their leveldb stores kept growing and the filesystems are now 100% full;
they all have only a few hundred MB left.

Starting the mon with 'compact on start' doesn't work since the FS is 100% full:

error: monitor data filesystem reached concerning levels of
available storage space (available: 0% 238 MB)
you may adjust 'mon data avail crit' to a lower value to make this go
away (default: 0%)

One of the 5 monitors is now running, but that's not enough for a quorum.

Any ideas on how to compact this leveldb? I can't free up any more space
on these systems right now, and getting bigger disks in is also going to
take a lot of time.

Any tools outside the monitors to use here?

Keep in mind, this is a pre-production cluster. We would like to keep
the cluster and fix this as a good exercise in the kind of thing that can
go wrong. Dangerous tools are allowed!

-- 
Wido den Hollander
42on B.V.
Ceph trainer and consultant

Phone: +31 (0)20 700 9902
Skype: contact42on




-- 
Wido den Hollander
42on B.V.
Ceph trainer and consultant

Phone: +31 (0)20 700 9902
Skype: contact42on

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
