On 3/10/20 10:48 AM, Hartwig Hauschild wrote:
> Hi,
>
> I've done a bit more testing ...
>
> On 05.03.2020, Hartwig Hauschild wrote:
>> Hi,
>>
>> I'm (still) testing an upgrade from Luminous to Nautilus and ran into
>> the following situation:
>>
>> The lab setup I'm testing in has three OSD hosts.
>> If one of those hosts dies, the store.db in /var/lib/ceph/mon/ on all
>> my mon nodes starts to grow rapidly until either the OSD host comes
>> back up or the disks are full.
>>
> This also happens when I take a single OSD offline: /var/lib/ceph/mon/
> grows from around 100 MB to ~2 GB in about 5 minutes, at which point I
> aborted the test. Since we once had an OSD host fail over a weekend, I
> know the growth won't stop until the disk is full; that usually takes
> around 20 minutes and ends up using 17 GB of disk space.
>
>> On another cluster that's still on Luminous I don't see any growth at
>> all.
>>
> I retested that cluster as well. Watching the on-disk size of
> /var/lib/ceph/mon/ suggests there are writes and deletes/compactions
> going on, as it kept floating within +-5% of the original size.
>
>> Is that a difference in behaviour between Luminous and Nautilus, or is
>> it caused by the lab setup only having three hosts, so that one lost
>> host degrades all PGs at the same time?
>>
>
> I've read somewhere in the docs that I should provide ample space (tens
> of GB) for the store.db, and found on the ML and in the bug tracker that
> ~100 GB might not be a bad idea and that large clusters may require an
> order of magnitude more.
> Is there some sort of formula I can use to approximate the space
> required?

I don't know of a formula, but make sure you have enough space. MONs run
on dedicated nodes in most production environments, so I usually install
a 400-1000 GB SSD just to make sure they don't run out of space.

>
> Also: is the db supposed to grow this fast in Nautilus when it did not
> do that in Luminous? Is that behaviour configurable somewhere?
>

The MONs have to keep the full history of OSDMaps for as long as not all
PGs are active+clean, and that is why their database grows. You can
compact the RocksDB store in the meantime, but that won't last forever.
Just make sure the MONs have enough space.

Wido
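
P.S. In case it's useful, here is a minimal sketch of how you can watch
the store and compact it by hand. The commands are standard Ceph/shell;
the monitor ID "mon01" is just a placeholder for your own:

  # Check the current on-disk size of the monitor's store.db:
  du -sh /var/lib/ceph/mon/ceph-mon01/store.db

  # Ask a running monitor to compact its RocksDB store:
  ceph tell mon.mon01 compact

  # Or have the store compacted on every monitor restart
  # (ceph.conf, [mon] section):
  #   mon_compact_on_start = true

Nautilus will also raise a MON_DISK_BIG health warning once the store
grows beyond mon_data_size_warn (15 GiB by default), which gives you some
advance notice before the disk fills up.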