Re: mon.mon01 store is getting too big! 18119 MB >= 15360 MB -- 94% avail

On 01/31/2017 07:12 PM, Shinobu Kinjo wrote:
On Wed, Feb 1, 2017 at 1:51 AM, Joao Eduardo Luis <joao@xxxxxxx> wrote:
On 01/31/2017 03:35 PM, David Turner wrote:

If you have a large enough drive on all of your mons (and intend to
keep it that way) you can increase the mon store warning threshold in the
config file so that it no longer warns at 15360 MB.
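
The threshold is controlled by mon_data_size_warn, a value in bytes whose
default corresponds to those 15360 MB. A minimal ceph.conf sketch that
raises it to 30 GiB (the 30 GiB figure is just an illustration; pick
whatever fits your disks):

    [mon]
    # warn when a mon store exceeds 30 GiB instead of the default 15 GiB
    mon data size warn = 32212254720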


And if you so decide to go that route, please be aware that the monitors are
known to misbehave if their store grows too much.

Would you please elaborate on what *misbehave* means? Do you have any
pointers that describe this more specifically?

In particular, when using leveldb, stalls while reading or writing to the store - typically, leveldb is compacting when this happens. This leads to all sorts of timeouts being triggered, but the really annoying one is the lease timeout, which tends to result in a flapping quorum.

Also, being unable to sync monitors. Again, stalls on leveldb trigger timeouts and cause the sync to restart.

Once upon a time, this *may* have also translated into large memory consumption. A direct relation was never proven, though, and the behaviour went away as Ceph became smarter and distros updated their libraries.

  -Joao



Those warnings have been put in place to let the admin know that action may
be needed, hopefully in time to avoid aberrant behaviour.

  -Joao


From: ceph-users [ceph-users-bounces@xxxxxxxxxxxxxx] on behalf of Wido
den Hollander [wido@xxxxxxxx]
Sent: Tuesday, January 31, 2017 2:35 AM
To: Martin Palma; CEPH list
Subject: Re: mon.mon01 store is getting too big! 18119 MB >= 15360 MB -- 94% avail


On 31 January 2017 at 10:22, Martin Palma <martin@xxxxxxxx> wrote:


Hi all,

our cluster is currently performing a big expansion and is in recovery
mode (we doubled in size and OSD count, from 600 TB to 1.2 PB).


Yes, that is to be expected. When not all PGs are active+clean, the MONs
will not trim their datastore.
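
You can watch both sides of this at once; a quick sketch (the mon id and
data path below are the defaults and may differ on your setup):

    # recovery progress: once every PG is active+clean, the mons can trim again
    ceph pg stat

    # on-disk size of the mon store (default path, cluster "ceph", mon id "mon01")
    du -sh /var/lib/ceph/mon/ceph-mon01/store.db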

Now we get the following message from our monitor nodes:

mon.mon01 store is getting too big! 18119 MB >= 15360 MB -- 94% avail

Reading [0], it says that this is normal during an active data
rebalance and that the store will be compacted after it finishes.

Should we wait until the recovery is finished or should we perform
"ceph tell mon.{id} compact" now during recovery?


Mainly wait and make sure there is enough disk space. You can try a
compact, but that can take the mon offline temporarily.

Just make sure you have enough disk space :)
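
If you do try it, going one monitor at a time keeps quorum intact; a
sketch (the mon ids here are illustrative):

    # compact one monitor, wait for it to rejoin quorum, then do the next
    ceph tell mon.mon01 compact
    ceph -s    # check all mons are back in quorum before moving on
    ceph tell mon.mon02 compact

There is also the "mon compact on start = true" option in ceph.conf if
you would rather compact whenever a mon daemon (re)starts.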

Wido

Best,
Martin

[0] https://access.redhat.com/solutions/1982273

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


