Re: Ceph monitors 100% full filesystem, refusing start

On 01/20/2016 04:25 PM, Joao Eduardo Luis wrote:
> On 01/20/2016 03:15 PM, Wido den Hollander wrote:
>> Hello,
>>
>> I have an issue with a (not in production!) Ceph cluster which I'm
>> trying to resolve.
>>
>> On Friday the network links between the racks failed and this caused all
>> monitors to lose connection.
>>
>> Their leveldb stores kept growing and they are currently 100% full. They
>> all have a few hundred MB left.
> 
> I'm incredibly curious to know what was written to leveldb to bring it
> to grow unbounded. Did the monitors hold quorum? I'm guessing that would
> be a 'no', given the network failure you mentioned, hence my morbid
> curiosity in figuring out what happened there.
> 

Yes, quorum got lost. Monitors are in different racks and the core
switching failed. Since it was pre-production people didn't notice until
Tuesday.

> If you don't mind, running a 'ceph-kvstore-tool /path/to/store.db
> leveldb list > /tmp/store.dump' could, maybe, shed some light on this
> issue (at least it will dump all the keys, and maybe something will be
> obvious, don't know). I'd certainly be interested in taking a look at
> those stores if you don't mind ;)
> 

This is an 1800 OSD cluster and a ceph-kvstore-tool <path> list shows me
a lot, and I mean a lot, of osdmaps.

I think that stuff failed horribly due to the network flapping.

Running just the list already compacted leveldb btw. I have free space
again and the monitors are starting. Waiting for them to form a quorum
again.
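
For anyone who wants to check what is filling their own store, here is a
rough sketch of how the dump from the list command could be tallied per
prefix. It assumes each line of the dump starts with the key prefix,
followed by a tab or a colon and then the key; I have not verified the
exact separator for every release, so treat that as an assumption. The
function name and the sample keys are made up for illustration.

```python
from collections import Counter


def count_prefixes(lines):
    """Tally how many keys each prefix has in a key listing.

    Assumption: every non-empty line is '<prefix><sep><key>', where <sep>
    is a tab or a colon. A line without either separator is counted under
    its own text.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        for sep in ("\t", ":"):
            if sep in line:
                # Split on the first separator only; keys may contain more.
                prefix = line.split(sep, 1)[0]
                break
        else:
            prefix = line
        counts[prefix] += 1
    return counts


# Hypothetical sample; a real dump would be read from /tmp/store.dump.
sample = ["osdmap\tfull_100", "osdmap\t100", "paxos\t42"]
for prefix, n in count_prefixes(sample).most_common():
    print(prefix, n)
```

Passing an open file handle for the dump instead of the sample list works
the same way, since the function only iterates over lines.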

>> Starting with 'compact on start' doesn't work since the FS is 100%
>> full:
>>
>> error: monitor data filesystem reached concerning levels of
>> available storage space (available: 0% 238 MB)
>> you may adjust 'mon data avail crit' to a lower value to make this go
>> away (default: 0%)
>>
>> One of the 5 monitors is now running but that's not enough.
>>
>> Any ideas how to compact this leveldb? I can't free up any more space
>> right now on these systems. Getting bigger disks in is also going to
>> take a lot of time.
> 
> Running 'ceph-kvstore-tool' may also force leveldb to compact on open,
> so you may have a shot there at compaction. If that doesn't work,
> 'ceph-monstore-tool' has a 'compact' command -- that should help you
> sort it out.
> 
>   -Joao
> 
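
On the "available: 0% 238 MB" message quoted above: a small sketch of how
a filesystem can report 0% available while still having a few hundred MB
free. It assumes the percentage is truncated to a whole number and that
the check is "available percent <= threshold"; both are my assumptions
and not taken from the monitor code. The function names are made up.

```python
import shutil


def avail_percent(avail_bytes, total_bytes):
    """Available space as a whole-number percentage, truncated."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return avail_bytes * 100 // total_bytes


def below_threshold(path, crit_percent):
    """True if the filesystem holding 'path' is at or under crit_percent.

    Assumption: the comparison is '<='; the real monitor check may differ.
    """
    usage = shutil.disk_usage(path)
    return avail_percent(usage.free, usage.total) <= crit_percent


# 238 MB free on a hypothetical 50 GB filesystem is under 1%, so the
# truncated percentage is 0.
print(avail_percent(238 * 1024**2, 50 * 1024**3))
```

With truncation, anything under 1% free shows as 0%, which would explain
the message even though some space was still left.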


-- 
Wido den Hollander
42on B.V.
Ceph trainer and consultant

Phone: +31 (0)20 700 9902
Skype: contact42on
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


