Wouldn't this be the same operation as growing the number of monitors
from, let's say, 3 to 5 in an already running production cluster, which
AFAIK is supported? Just in this case it's not 3->5 but 1->X :)
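If so, the manual add-a-monitor steps are roughly: fetch the current
monmap and the mon. keyring, run ceph-mon --mkfs for the new id, and
start the daemon so it synchronizes and joins the quorum. A rough sketch
as a small wrapper script (the mon id "mon5", the address and the /tmp
paths below are placeholders, not taken from this cluster):

#!/usr/bin/env python
# Sketch of the manual "add a monitor" steps, shelling out to the
# standard ceph/ceph-mon tools. Adjust the id, address and paths.
import subprocess

MON_ID = "mon5"               # placeholder id for the new monitor
MON_ADDR = "10.0.0.5:6789"    # placeholder public address
MON_DIR = "/var/lib/ceph/mon/ceph-%s" % MON_ID

def run(cmd):
    print("+ " + " ".join(cmd))
    subprocess.check_call(cmd)

run(["mkdir", "-p", MON_DIR])
# Grab the current monmap and the mon. keyring from the live cluster.
run(["ceph", "mon", "getmap", "-o", "/tmp/monmap"])
run(["ceph", "auth", "get", "mon.", "-o", "/tmp/mon.keyring"])
# Build the new monitor's store from that monmap and keyring.
run(["ceph-mon", "-i", MON_ID, "--mkfs",
     "--monmap", "/tmp/monmap", "--keyring", "/tmp/mon.keyring"])
# Start it; it should synchronize and join the quorum.
run(["ceph-mon", "-i", MON_ID, "--public-addr", MON_ADDR])

That said, a freshly created mon still has to synchronize its store from
an existing monitor, so as the thread below points out, it doesn't
sidestep the full-disk problem on its own.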
> On 20 Jan 2016, at 22:04, Wido den Hollander <wido@xxxxxxxx> wrote:
>
> On 01/20/2016 08:01 PM, Zoltan Arnold Nagy wrote:
>> Wouldn't actually blowing away the other monitors and then recreating
>> them from scratch solve the issue?
>>
>> Never done this, just thinking out loud. It would grab the osdmap and
>> everything from the other monitor and form a quorum, wouldn't it?
>>
>
> Nope, those monitors will not have any of the historical OSDMaps which
> are required by OSDs that need to catch up with the cluster.
>
> It might be technically possible by hacking a lot of stuff, but that
> won't be easy.
>
> I'm still busy with this btw. The monitors are in an electing state
> since 2 monitors are still synchronizing and one won't boot anymore :(
>
>>> On 20 Jan 2016, at 16:26, Wido den Hollander <wido@xxxxxxxx> wrote:
>>>
>>> On 01/20/2016 04:22 PM, Zoltan Arnold Nagy wrote:
>>>> Hi Wido,
>>>>
>>>> So one out of the 5 monitors is running fine then? Did that one
>>>> have more space for its leveldb?
>>>>
>>>
>>> Yes. That one was at 99% full, and by cleaning some stuff in
>>> /var/cache and /var/log I was able to start it.
>>>
>>> It compacted the LevelDB database and is now at 1% disk usage.
>>>
>>> Looking at the ceph_mon.cc code:
>>>
>>> if (stats.avail_percent <= g_conf->mon_data_avail_crit) {
>>>
>>> Setting mon_data_avail_crit to 0 does not work, since 100% full
>>> equals 0% free and the check is <=, so it still trips.
>>>
>>> There is ~300M free on the other 4 monitors. I just can't start the
>>> mon and tell it to compact.
>>>
>>> Lesson learned here though: always make sure you have some
>>> additional space you can clear when you need it.
>>>
>>>>> On 20 Jan 2016, at 16:15, Wido den Hollander <wido@xxxxxxxx> wrote:
>>>>>
>>>>> Hello,
>>>>>
>>>>> I have an issue with a (not in production!) Ceph cluster which I'm
>>>>> trying to resolve.
>>>>>
>>>>> On Friday the network links between the racks failed and this
>>>>> caused all monitors to lose their connections.
>>>>>
>>>>> Their leveldb stores kept growing and the filesystems are now 100%
>>>>> full. They all have only a few hundred MB left.
>>>>>
>>>>> Starting the mons with 'compact on start' doesn't work since the
>>>>> FS is 100% full:
>>>>>
>>>>> error: monitor data filesystem reached concerning levels of
>>>>> available storage space (available: 0% 238 MB)
>>>>> you may adjust 'mon data avail crit' to a lower value to make this
>>>>> go away (default: 0%)
>>>>>
>>>>> One of the 5 monitors is now running, but that's not enough.
>>>>>
>>>>> Any ideas how to compact this leveldb? I can't free up any more
>>>>> space right now on these systems, and getting bigger disks in is
>>>>> also going to take a lot of time.
>>>>>
>>>>> Any tools outside the monitors to use here?
>>>>>
>>>>> Keep in mind, this is a pre-production cluster. We would like to
>>>>> keep the cluster and fix this as a good exercise in stuff that can
>>>>> go wrong. Dangerous tools are allowed!
>>>>>
>>>>> --
>>>>> Wido den Hollander
>>>>> 42on B.V.
>>>>> Ceph trainer and consultant
>>>>>
>>>>> Phone: +31 (0)20 700 9902
>>>>> Skype: contact42on
>>>>> _______________________________________________
>>>>> ceph-users mailing list
>>>>> ceph-users@xxxxxxxxxxxxxx
>>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>>>
>>>>
>>>
>>>
>>> --
>>> Wido den Hollander
>>> 42on B.V.
>>> Ceph trainer and consultant
>>>
>>> Phone: +31 (0)20 700 9902
>>> Skype: contact42on
>>
>
>
> --
> Wido den Hollander
> 42on B.V.
> Ceph trainer and consultant
>
> Phone: +31 (0)20 700 9902
> Skype: contact42on
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
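On the "any tools outside the monitors" question further up: one
out-of-band option, untested and sketched here on the assumption that
the mon's store.db is a plain LevelDB database with the default
comparator, that the daemon is stopped, and that the plyvel bindings
are installed (the path below is an example), is to open the store
directly and force a full compaction:

#!/usr/bin/env python
# Sketch: compact a stopped monitor's LevelDB store out-of-band.
# Example path -- substitute the real cluster/mon id.
import plyvel

STORE = "/var/lib/ceph/mon/ceph-mon1/store.db"

db = plyvel.DB(STORE)
db.compact_range()   # compact the entire key range
db.close()

Keep in mind that LevelDB still needs scratch space on the same
filesystem while it rewrites its tables, so on a disk that is truly
100% full this will likely only help once a few hundred MB have been
freed elsewhere.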