On 01/20/2016 08:01 PM, Zoltan Arnold Nagy wrote:
> Wouldn’t actually blowing away the other monitors then recreating them
> from scratch solve the issue?
>
> Never done this, just thinking out loud. It would grab the osdmap and
> everything from the other monitor and form a quorum, wouldn’t it?
>

Nope, those monitors will not have any historical OSDMaps, which will be
required by OSDs that need to catch up with the cluster.

It might be technically possible by hacking a lot of stuff, but that
won't be easy.

I'm still busy with this, btw. The monitors are in an electing state since
2 monitors are still synchronizing and one won't boot anymore :(

>> On 20 Jan 2016, at 16:26, Wido den Hollander <wido@xxxxxxxx
>> <mailto:wido@xxxxxxxx>> wrote:
>>
>> On 01/20/2016 04:22 PM, Zoltan Arnold Nagy wrote:
>>> Hi Wido,
>>>
>>> So one out of the 5 monitors is running fine then? Did that have
>>> more space for its leveldb?
>>>
>>
>> Yes. That was at 99% full and by cleaning some stuff in /var/cache and
>> /var/log I was able to start it.
>>
>> It compacted the LevelDB database and is now at 1% disk usage.
>>
>> Looking at the ceph_mon.cc code:
>>
>> if (stats.avail_percent <= g_conf->mon_data_avail_crit) {
>>
>> Setting mon_data_avail_crit to 0 does not work, since 100% full is equal
>> to 0% free and the check uses <=.
>>
>> There is ~300M free on the other 4 monitors. I just can't start the mon
>> and tell it to compact.
>>
>> Lesson learned here though: always make sure you have some additional
>> space you can clear when you need it.
>>
>>>> On 20 Jan 2016, at 16:15, Wido den Hollander <wido@xxxxxxxx
>>>> <mailto:wido@xxxxxxxx>> wrote:
>>>>
>>>> Hello,
>>>>
>>>> I have an issue with a (not in production!) Ceph cluster which I'm
>>>> trying to resolve.
>>>>
>>>> On Friday the network links between the racks failed and this caused all
>>>> monitors to lose connection.
>>>>
>>>> Their leveldb stores kept growing and they are currently 100% full. They
>>>> all have a few hundred MB left.
>>>>
>>>> Starting them with 'compact on start' doesn't work since the FS is 100%
>>>> full.
>>>>
>>>> error: monitor data filesystem reached concerning levels of
>>>> available storage space (available: 0% 238 MB)
>>>> you may adjust 'mon data avail crit' to a lower value to make this go
>>>> away (default: 0%)
>>>>
>>>> One of the 5 monitors is now running, but that's not enough.
>>>>
>>>> Any ideas how to compact this leveldb? I can't free up any more space
>>>> right now on these systems. Getting bigger disks in is also going to
>>>> take a lot of time.
>>>>
>>>> Any tools outside the monitors to use here?
>>>>
>>>> Keep in mind, this is a pre-production cluster. We would like to keep
>>>> the cluster and fix this as a good exercise in stuff which could go
>>>> wrong. Dangerous tools are allowed!
>>>>
>>>> --
>>>> Wido den Hollander
>>>> 42on B.V.
>>>> Ceph trainer and consultant
>>>>
>>>> Phone: +31 (0)20 700 9902
>>>> Skype: contact42on
>>>> _______________________________________________
>>>> ceph-users mailing list
>>>> ceph-users@xxxxxxxxxxxxxx <mailto:ceph-users@xxxxxxxxxxxxxx>
>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>>
>>>
>>
>

--
Wido den Hollander
42on B.V.
Ceph trainer and consultant

Phone: +31 (0)20 700 9902
Skype: contact42on
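
P.S. For anyone following along, here is a minimal sketch of why setting
mon_data_avail_crit to 0 doesn't get a monitor past the check. This is not
the actual Ceph code: FsStats, avail_percent() and refuses_to_start() are
made-up names, and the truncation to a whole-number percentage is an
assumption on my part. Only the shape of the <= comparison is taken from
the ceph_mon.cc line quoted in this thread.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the filesystem statistics the monitor inspects.
struct FsStats {
    std::uint64_t total_bytes;
    std::uint64_t avail_bytes;
};

// Free space as a whole-number percentage, truncated toward zero.
// (Assumption: the real value is rounded down in a similar way.)
int avail_percent(const FsStats& s) {
    assert(s.total_bytes > 0);
    return static_cast<int>((s.avail_bytes * 100) / s.total_bytes);
}

// Same shape as the quoted comparison:
//   stats.avail_percent <= g_conf->mon_data_avail_crit
bool refuses_to_start(const FsStats& s, int mon_data_avail_crit) {
    return avail_percent(s) <= mon_data_avail_crit;
}
```

With 238 MB available on a (hypothetical) 50 GB filesystem, avail_percent()
comes out as 0, so refuses_to_start(stats, 0) is still true. In this sketch
a filesystem with 1% free would pass a threshold of 0, which matches the one
monitor that started after I cleaned up some space.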