Re: mon db growing. over 500Gb

<ricardo.re.azevedo@xxxxxxxxx> · Wed, 10 Mar 2021 16:59:32 -0800

Thanks for the input Lincoln.

I think I am in a similar boat. I don't have the insights module activated. I
checked one of my troublesome monitors with the command you gave, and indeed
it is full of logm messages. I am not sure what would have caused it, though.
My OSDs have been behaving relatively OK.

I tried rebooting all my OSDs with mds and mgr disabled and am in the same
spot. Starting the mon manually with mon_compact_on_start gives:

ceph@a /m/c/m/c/store.db> /usr/bin/ceph-mon -f --cluster ceph --id a --setuser ceph --setgroup ceph
ignoring --setuser ceph since I am not root
ignoring --setgroup ceph since I am not root
2021-03-10T16:47:10.017-0800 7f076f143540 -1 compacting monitor store ...

It hangs on compaction while the store.db keeps expanding. Something seems
to be wrong with compaction itself, since I don't think the mon is connected
yet at this point and I have every other ceph service disabled.

I had to increase ulimit at this point.

Any thoughts on how to proceed? Is there a way I can clear the db of these
messages?

Thanks everyone

From: Lincoln Bryant <lincolnb@xxxxxxxxxxxx> 
Sent: Wednesday, March 10, 2021 4:06 PM
To: ricardo.re.azevedo@xxxxxxxxx; ceph-users@xxxxxxx
Subject: Re:  mon db growing. over 500Gb

Hi Ricardo,

I just had a similar issue recently. 

I did a dump of the monitor store (i.e., something like "ceph-monstore-tool
/var/lib/ceph/mon/mon-a/ dump-keys") and most messages were of type 'logm'.
For me I think it was a lot of log messages coming from an oddly behaving
OSD.
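
A minimal sketch of how that dump can be summarized by key type. It assumes each line of the dump-keys output starts with the key prefix (such as logm) followed by whitespace; the function name count_key_prefixes is only illustrative, and the exact output format may differ between Ceph releases.

```shell
# Tally monitor-store keys by prefix, most frequent first.
# Assumption: each input line starts with the key prefix (e.g. "logm")
# followed by whitespace, as in "logm / full_123".
count_key_prefixes() {
    awk 'NF { count[$1]++ } END { for (p in count) print count[p], p }' | sort -rn
}

# Example, using the store path from the message above
# (run it while the monitor daemon is stopped):
#   ceph-monstore-tool /var/lib/ceph/mon/mon-a/ dump-keys | count_key_prefixes
```

If logm dominates the counts, the store is mostly cluster log entries rather than maps.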

I've seen folks advise disabling the Ceph mgr insights module if you have it
running and there are degraded PGs, to see if that helps.

What finally solved it for me was doing a rolling restart of my nodes, but I
started from all PGs active+clean. 

--Lincoln

  _____  

From: ricardo.re.azevedo@xxxxxxxxx
Sent: Wednesday, March 10, 2021 5:59 PM
To: ceph-users@xxxxxxx
Subject:  mon db growing. over 500Gb 

Hi all,

I have a fairly pressing issue. I had a monitor fall out of quorum because
it ran out of disk space during the rebalancing triggered by switching to
upmap. I noticed that the store.db on all my monitors had started taking up
nearly all the disk space, so I set noout, nobackfill and norecover and shut
down all the monitor daemons. Each store.db was at:

mon.a 89GB (the one that first dropped out)

mon.b 400GB

mon.c 400GB
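
For reference, the three flags mentioned above are set with `ceph osd set <flag>`. A minimal sketch follows; the CEPH variable and the function names are illustrative helpers, not part of Ceph, and they only exist so the commands can be dry-run with CEPH=echo.

```shell
# CEPH can be overridden (e.g. CEPH=echo) for a dry run; this variable is
# an illustrative convention, not something Ceph itself defines.
CEPH="${CEPH:-ceph}"

# Set the flags named in the message: noout, nobackfill, norecover.
set_recovery_flags() {
    for flag in noout nobackfill norecover; do
        $CEPH osd set "$flag" || return 1
    done
}

# Clear the same flags again once the monitors are healthy.
clear_recovery_flags() {
    for flag in norecover nobackfill noout; do
        $CEPH osd unset "$flag" || return 1
    done
}
```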

I tried setting mon_compact_on_start. This brought mon.a down to 1GB. Cool.
However, when I tried it on the other monitors, the db grew by roughly 1GB
every 10 seconds, so I shut them down again.
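
A growth rate like this can be measured rather than eyeballed. The sketch below samples a directory's size twice and prints the difference per second; the function names and the example path are hypothetical, and it relies only on `du -sk`.

```shell
# Print the size of a directory in kilobytes.
dir_size_kb() {
    du -sk "$1" | awk '{ print $1 }'
}

# Sample the directory twice, `interval` seconds apart (default 10, must be
# at least 1), and print the growth in KB per second (integer division).
growth_kb_per_s() {
    dir="$1"
    interval="${2:-10}"
    before="$(dir_size_kb "$dir")"
    sleep "$interval"
    after="$(dir_size_kb "$dir")"
    echo $(( (after - before) / interval ))
}

# Example (hypothetical path):
#   growth_kb_per_s /path/to/mon/store.db 10
```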

Any idea what is going on? Or how can I shrink the db back down?

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
