Wouldn't this be the same operation as growing the number of monitors
from, let's say, 3 to 5 in an already running production cluster, which
AFAIK is supported? Just in this case it's not 3->5 but 1->X :)
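If so, the manual add-a-monitor steps are roughly: fetch the current
monmap and the mon. keyring, run ceph-mon --mkfs for the new id, and
start the daemon so it synchronizes and joins the quorum. A rough sketch
as a small wrapper script (the mon id "mon5", the address and the /tmp
paths below are placeholders, not taken from this cluster):

#!/usr/bin/env python
# Sketch of the manual "add a monitor" steps, shelling out to the
# standard ceph/ceph-mon tools. Adjust the id, address and paths.
import subprocess

MON_ID = "mon5"               # placeholder id for the new monitor
MON_ADDR = "10.0.0.5:6789"    # placeholder public address
MON_DIR = "/var/lib/ceph/mon/ceph-%s" % MON_ID

def run(cmd):
    print("+ " + " ".join(cmd))
    subprocess.check_call(cmd)

run(["mkdir", "-p", MON_DIR])
# Grab the current monmap and the mon. keyring from the live cluster.
run(["ceph", "mon", "getmap", "-o", "/tmp/monmap"])
run(["ceph", "auth", "get", "mon.", "-o", "/tmp/mon.keyring"])
# Build the new monitor's store from that monmap and keyring.
run(["ceph-mon", "-i", MON_ID, "--mkfs",
     "--monmap", "/tmp/monmap", "--keyring", "/tmp/mon.keyring"])
# Start it; it should synchronize and join the quorum.
run(["ceph-mon", "-i", MON_ID, "--public-addr", MON_ADDR])

That said, a freshly created mon still has to synchronize its store from
an existing monitor, so as the thread below points out, it doesn't
sidestep the full-disk problem on its own.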
> On 20 Jan 2016, at 22:04, Wido den Hollander <wido@xxxxxxxx> wrote:
>
> On 01/20/2016 08:01 PM, Zoltan Arnold Nagy wrote:
>> Wouldn't actually blowing away the other monitors and then recreating
>> them from scratch solve the issue?
>>
>> Never done this, just thinking out loud. It would grab the osdmap and
>> everything from the other monitor and form a quorum, wouldn't it?
>>
>
> Nope, those monitors will not have any of the historical OSDMaps which
> are required by OSDs that need to catch up with the cluster.
>
> It might be technically possible by hacking a lot of stuff, but that
> won't be easy.
>
> I'm still busy with this btw. The monitors are in an electing state
> since 2 monitors are still synchronizing and one won't boot anymore :(
>
>>> On 20 Jan 2016, at 16:26, Wido den Hollander <wido@xxxxxxxx> wrote:
>>>
>>> On 01/20/2016 04:22 PM, Zoltan Arnold Nagy wrote:
>>>> Hi Wido,
>>>>
>>>> So one out of the 5 monitors is running fine then? Did that one
>>>> have more space for its leveldb?
>>>>
>>>
>>> Yes. That one was at 99% full, and by cleaning some stuff in
>>> /var/cache and /var/log I was able to start it.
>>>
>>> It compacted the LevelDB database and is now at 1% disk usage.
>>>
>>> Looking at the ceph_mon.cc code:
>>>
>>> if (stats.avail_percent <= g_conf->mon_data_avail_crit) {
>>>
>>> Setting mon_data_avail_crit to 0 does not work, since 100% full
>>> equals 0% free and the check is <=, so it still trips.
>>>
>>> There is ~300M free on the other 4 monitors. I just can't start the
>>> mon and tell it to compact.
>>>
>>> Lesson learned here though: always make sure you have some
>>> additional space you can clear when you need it.
>>>
>>>>> On 20 Jan 2016, at 16:15, Wido den Hollander <wido@xxxxxxxx> wrote:
>>>>>
>>>>> Hello,
>>>>>
>>>>> I have an issue with a (not in production!) Ceph cluster which I'm
>>>>> trying to resolve.
>>>>>
>>>>> On Friday the network links between the racks failed and this
>>>>> caused all monitors to lose their connections.
>>>>>
>>>>> Their leveldb stores kept growing and the filesystems are now 100%
>>>>> full. They all have only a few hundred MB left.
>>>>>
>>>>> Starting the mons with 'compact on start' doesn't work since the
>>>>> FS is 100% full:
>>>>>
>>>>> error: monitor data filesystem reached concerning levels of
>>>>> available storage space (available: 0% 238 MB)
>>>>> you may adjust 'mon data avail crit' to a lower value to make this
>>>>> go away (default: 0%)
>>>>>
>>>>> One of the 5 monitors is now running, but that's not enough.
>>>>>
>>>>> Any ideas how to compact this leveldb? I can't free up any more
>>>>> space right now on these systems, and getting bigger disks in is
>>>>> also going to take a lot of time.
>>>>>
>>>>> Any tools outside the monitors to use here?
>>>>>
>>>>> Keep in mind, this is a pre-production cluster. We would like to
>>>>> keep the cluster and fix this as a good exercise in stuff that can
>>>>> go wrong. Dangerous tools are allowed!
>>>>>
>>>>> --
>>>>> Wido den Hollander
>>>>> 42on B.V.
>>>>> Ceph trainer and consultant
>>>>>
>>>>> Phone: +31 (0)20 700 9902
>>>>> Skype: contact42on
>>>>> _______________________________________________
>>>>> ceph-users mailing list
>>>>> ceph-users@xxxxxxxxxxxxxx
>>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>>>
>>>>
>>>
>>>
>>> --
>>> Wido den Hollander
>>> 42on B.V.
>>> Ceph trainer and consultant
>>>
>>> Phone: +31 (0)20 700 9902
>>> Skype: contact42on
>>
>
>
> --
> Wido den Hollander
> 42on B.V.
> Ceph trainer and consultant
>
> Phone: +31 (0)20 700 9902
> Skype: contact42on
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
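On the "any tools outside the monitors" question further up: one
out-of-band option, untested and sketched here on the assumption that
the mon's store.db is a plain LevelDB database with the default
comparator, that the daemon is stopped, and that the plyvel bindings
are installed (the path below is an example), is to open the store
directly and force a full compaction:

#!/usr/bin/env python
# Sketch: compact a stopped monitor's LevelDB store out-of-band.
# Example path -- substitute the real cluster/mon id.
import plyvel

STORE = "/var/lib/ceph/mon/ceph-mon1/store.db"

db = plyvel.DB(STORE)
db.compact_range()   # compact the entire key range
db.close()

Keep in mind that LevelDB still needs scratch space on the same
filesystem while it rewrites its tables, so on a disk that is truly
100% full this will likely only help once a few hundred MB have been
freed elsewhere.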