Hi,

I've done a bit more testing ...

On 05.03.2020, Hartwig Hauschild wrote:
> Hi,
>
> I'm (still) testing upgrading from Luminous to Nautilus and ran into the
> following situation:
>
> The lab setup I'm testing in has three OSD hosts.
> If one of those hosts dies, the store.db in /var/lib/ceph/mon/ on all my
> mon nodes starts to grow rapidly until either the OSD host comes back up
> or the disks are full.

This also happens when I take a single OSD offline: /var/lib/ceph/mon/
grows from around 100MB to ~2GB in about 5 minutes, at which point I
aborted the test. Since we've had an OSD host fail over a weekend, I know
the growth won't stop until the disk is full; that usually happens within
about 20 minutes and ends up taking 17GB of disk space.

> On another cluster that's still on Luminous I don't see any growth at all.

I retested that cluster as well. Watching the on-disk size of
/var/lib/ceph/mon/ suggests there are writes and deletes/compactions going
on, as it kept floating within +-5% of its original size.

> Is that a difference in behaviour between Luminous and Nautilus, or is it
> caused by the lab setup only having three hosts, so that one lost host
> causes all PGs to be degraded at the same time?

I've read somewhere in the docs that I should provide ample space (tens of
GB) for the store.db, and found on the ML and in the bug tracker that
~100GB might not be a bad idea and that large clusters may require space an
order of magnitude greater. Is there some sort of formula I can use to
approximate the space required?

Also: is the db supposed to grow this fast in Nautilus when it did not do
that in Luminous? Is that behaviour configurable somewhere?

--
Cheers,
Hardy
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
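
As a rough back-of-the-envelope check on the numbers above: growing from
~100MB to ~2GB in about 5 minutes works out to roughly 380MB per minute,
so a mon partition of a few tens of GB does not last long while the growth
continues. The small Python sketch below is one hypothetical way to watch
this while testing, not an official Ceph tool: it samples the on-disk size
of a given directory, prints the growth rate, and extrapolates how long
until the filesystem it lives on is full. The default path and the
one-minute sampling interval are placeholder assumptions to adjust for
your setup.

#!/usr/bin/env python3
# Hypothetical helper (not part of Ceph): watch the size of a mon store.db
# directory and estimate how long the underlying filesystem will last.
import os
import shutil
import sys
import time

def dir_size(path):
    """Total size in bytes of all regular files below path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                pass  # a file may disappear mid-walk, e.g. during compaction
    return total

def main():
    # Placeholder: point this at your mon's store.db directory.
    store = sys.argv[1] if len(sys.argv) > 1 else "/var/lib/ceph/mon"
    interval = 60  # seconds between samples (assumed, adjust as needed)

    prev = dir_size(store)
    while True:
        time.sleep(interval)
        cur = dir_size(store)
        rate = (cur - prev) / interval        # bytes per second
        free = shutil.disk_usage(store).free  # bytes left on the filesystem
        print(f"store.db: {cur / 2**20:.0f} MiB, "
              f"growth: {rate * 60 / 2**20:.1f} MiB/min", end="")
        if rate > 0:
            print(f", disk full in ~{free / rate / 60:.0f} min")
        else:
            print(", not growing (trim/compaction keeping up)")
        prev = cur

if __name__ == "__main__":
    main()

Run it as e.g. "python3 mon_store_watch.py /var/lib/ceph/mon/<your mon
dir>/store.db" while taking an OSD or host down; if the reported growth
rate stays near zero, trimming/compaction is keeping up, as on the
Luminous cluster described above.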