> -----Original Message-----
> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of
> Wido den Hollander
> Sent: 20 January 2016 15:27
> To: Zoltan Arnold Nagy <zoltan@xxxxxxxxxxxxxxxxxx>
> Cc: ceph-users@xxxxxxxxxxxxxx
> Subject: Re: Ceph monitors 100% full filesystem, refusing start
>
> On 01/20/2016 04:22 PM, Zoltan Arnold Nagy wrote:
> > Hi Wido,
> >
> > So one out of the 5 monitors is running fine then? Did that one have
> > more space for its leveldb?
> >
>
> Yes. That one was at 99% full, and by cleaning some stuff in /var/cache
> and /var/log I was able to start it.
>
> It compacted the leveldb database and is now at 1% disk usage.
>
> Looking at the ceph_mon.cc code:
>
>   if (stats.avail_percent <= g_conf->mon_data_avail_crit) {
>
> Setting mon_data_avail_crit to 0 does not work, since 100% full is equal
> to 0% free and the <= check above still triggers.
>
> There is ~300 MB free on the other 4 monitors. I just can't start the mon
> and tell it to compact.
>
> Lessons learned here though: always make sure you have some additional
> space you can clear when you need it.

Slightly unrelated, but before the arrival of virtualisation, when I used to
manage MS Exchange servers we would always copy a DVD ISO onto the DB/logs
disk, so that in the event of a disk-full scenario we could instantly free
up 4 GB of space. Maybe something along those lines (dd /dev/zero to a file)
would be good practice; a rough sketch is at the end of this message.

>
> >> On 20 Jan 2016, at 16:15, Wido den Hollander <wido@xxxxxxxx> wrote:
> >>
> >> Hello,
> >>
> >> I have an issue with a (not in production!) Ceph cluster which I'm
> >> trying to resolve.
> >>
> >> On Friday the network links between the racks failed and this caused
> >> all monitors to lose connection.
> >>
> >> Their leveldb stores kept growing and they are currently 100% full.
> >> They all have only a few hundred MB left.
> >>
> >> Starting with 'compact on start' doesn't work since the FS is 100%
> >> full:
> >>
> >>   error: monitor data filesystem reached concerning levels of
> >>   available storage space (available: 0% 238 MB) you may adjust 'mon
> >>   data avail crit' to a lower value to make this go away (default: 0%)
> >>
> >> One of the 5 monitors is now running, but that's not enough.
> >>
> >> Any ideas how to compact this leveldb? I can't free up any more space
> >> right now on these systems. Getting bigger disks in is also going to
> >> take a lot of time.
> >>
> >> Any tools outside the monitors to use here?
> >>
> >> Keep in mind, this is a pre-production cluster. We would like to keep
> >> the cluster and fix this as a good exercise in stuff that could go
> >> wrong. Dangerous tools are allowed!
> >>
> >> --
> >> Wido den Hollander
> >> 42on B.V.
> >> Ceph trainer and consultant
> >>
> >> Phone: +31 (0)20 700 9902
> >> Skype: contact42on
> >> _______________________________________________
> >> ceph-users mailing list
> >> ceph-users@xxxxxxxxxxxxxx
> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >>
> >
>
>
> --
> Wido den Hollander
> 42on B.V.
> Ceph trainer and consultant
>
> Phone: +31 (0)20 700 9902
> Skype: contact42on
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
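
[Editor's sketch of the ballast-file idea mentioned above. The path
/var/lib/ceph/mon/ballast.img and the 4 GB size are assumptions chosen for
illustration, not values taken from the thread; pick whatever fits your
monitor hosts.]

    # Pre-allocate ~4 GB of throwaway space on the monitor's data
    # filesystem while the disk is still healthy.
    dd if=/dev/zero of=/var/lib/ceph/mon/ballast.img bs=1M count=4096

    # In a disk-full emergency, delete the file to instantly reclaim the
    # space, so the monitor can start and compact its leveldb store.
    rm /var/lib/ceph/mon/ballast.img

On filesystems that support it, "fallocate -l 4G <file>" reserves the same
space much faster than dd, but the dd form works everywhere and matches the
suggestion in the message.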
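
[Editor's note: a rough sketch of the compaction knobs discussed in the
quoted messages, assuming option and command names as they existed around
this release era; the monitor id "mon01" is a placeholder, not from the
thread.]

    # Ask a *running* monitor to compact its leveldb store:
    MON_ID=mon01   # example id, use your own monitor's id
    ceph tell mon.$MON_ID compact

    # Or enable compaction at startup via ceph.conf before starting it:
    #   [mon]
    #       mon compact on start = true
    #
    # Note that lowering 'mon data avail crit' does not help on a 100% full
    # filesystem: 0% available is still <= the lowest value (0) you can set,
    # which is exactly the problem described above.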