On Wed, Jan 20, 2016 at 8:01 PM, Zoltan Arnold Nagy
<zoltan@xxxxxxxxxxxxxxxxxx> wrote:
> Wouldn’t actually blowing away the other monitors then recreating them
> from scratch solve the issue?
>
> Never done this, just thinking out loud. It would grab the osdmap and
> everything from the other monitor and form a quorum, wouldn’t it?

Recreating monitors works as long as the others can form a quorum. I've
done this many, many times. In Wido's case he might have been able to
solve this by rm'ing the broken mons from the cluster until the one
remaining formed a quorum with itself, then slowly adding the other mons
back.

-- dan

> On 20 Jan 2016, at 16:26, Wido den Hollander <wido@xxxxxxxx> wrote:
>> On 01/20/2016 04:22 PM, Zoltan Arnold Nagy wrote:
>>> Hi Wido,
>>>
>>> So one out of the 5 monitors is running fine then? Did that have
>>> more space for its leveldb?
>>
>> Yes. That was at 99% full and by cleaning some stuff in /var/cache and
>> /var/log I was able to start it.
>>
>> It compacted the levelDB database and is now on 1% disk usage.
>>
>> Looking at the ceph_mon.cc code:
>>
>> if (stats.avail_percent <= g_conf->mon_data_avail_crit) {
>>
>> Setting mon_data_avail_crit to 0 does not work since 100% full is equal
>> to 0% free.
>>
>> There is ~300M free on the other 4 monitors. I just can't start the mon
>> and tell it to compact.
>>
>> Lessons learned here though: always make sure you have some additional
>> space you can clear when you need it.
>>
>>> On 20 Jan 2016, at 16:15, Wido den Hollander <wido@xxxxxxxx> wrote:
>>>> Hello,
>>>>
>>>> I have an issue with a (not in production!) Ceph cluster which I'm
>>>> trying to resolve.
>>>>
>>>> On Friday the network links between the racks failed and this caused
>>>> all monitors to lose connection.
>>>>
>>>> Their leveldb stores kept growing and they are currently 100% full.
>>>> They all have a few hundred MB left.
>>>>
>>>> Starting with 'compact on start' doesn't work since the FS is 100%
>>>> full.
>>>>
>>>> error: monitor data filesystem reached concerning levels of
>>>> available storage space (available: 0% 238 MB)
>>>> you may adjust 'mon data avail crit' to a lower value to make this go
>>>> away (default: 0%)
>>>>
>>>> One of the 5 monitors is now running but that's not enough.
>>>>
>>>> Any ideas how to compact this leveldb? I can't free up any more space
>>>> right now on these systems. Getting bigger disks in is also going to
>>>> take a lot of time.
>>>>
>>>> Any tools outside the monitors to use here?
>>>>
>>>> Keep in mind, this is a pre-production cluster. We would like to keep
>>>> the cluster and fix this as a good exercise of stuff which could go
>>>> wrong. Dangerous tools are allowed!
>>>>
>>>> --
>>>> Wido den Hollander
>>>> 42on B.V.
>>>> Ceph trainer and consultant
>>>>
>>>> Phone: +31 (0)20 700 9902
>>>> Skype: contact42on

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
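
For readers following the thread, here is a rough sketch of why setting
mon_data_avail_crit to 0 still trips the check quoted from ceph_mon.cc.
It is not the Ceph code. It assumes the free-space percentage is held as
a whole number truncated downward, which the "available: 0% 238 MB" in
the error message suggests. The function names and the 50 GB filesystem
size are hypothetical; only the 238 MB figure and the `<=` comparison
come from the thread.

```python
MB = 1024 * 1024


def avail_percent(avail_bytes: int, total_bytes: int) -> int:
    """Free space as a whole-number percentage, truncated toward zero (assumed)."""
    return int(avail_bytes / total_bytes * 100)


def below_crit(avail_pct: int, crit: int) -> bool:
    """Same comparison as the quoted line: avail_percent <= mon_data_avail_crit."""
    return avail_pct <= crit


avail = 238 * MB          # free space reported in the error message
total = 50 * 1024 * MB    # hypothetical filesystem size, not from the thread

pct = avail_percent(avail, total)
# Under half a percent free truncates to 0, and 0 <= 0 is true,
# so a threshold of 0 still makes the check fire.
print(pct, below_crit(pct, 0))
```

In this sketch any filesystem with less than 1% free reports 0, so the
threshold cannot be lowered far enough to pass while the comparison is
inclusive.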