Re: mon db growing. over 500Gb

Lincoln Bryant <lincolnb@xxxxxxxxxxxx> · Thu, 11 Mar 2021 03:33:27 +0000

You can try compacting with ceph-monstore-tool instead of using mon_compact_on_start. I am not sure whether it makes any difference.
________________________________
From: ricardo.re.azevedo@xxxxxxxxx <ricardo.re.azevedo@xxxxxxxxx>
Sent: Wednesday, March 10, 2021 6:59 PM
To: Lincoln Bryant <lincolnb@xxxxxxxxxxxx>; ceph-users@xxxxxxx <ceph-users@xxxxxxx>
Subject: RE:  mon db growing. over 500Gb

Thanks for the input Lincoln.

I think I am in a similar boat. I don’t have the insights module activated. I checked one of my troublesome monitors with the command you gave, and indeed it is full of logm messages. I am not sure what would have caused it, though. My OSDs have been behaving relatively OK.

I tried rebooting all my OSDs with mds and mgr disabled and am in the same spot. Starting the mon manually with mon_compact_on_start gives:

ceph@a /m/c/m/c/store.db> /usr/bin/ceph-mon -f --cluster ceph --id a --setuser ceph --setgroup ceph
ignoring --setuser ceph since I am not root
ignoring --setgroup ceph since I am not root
2021-03-10T16:47:10.017-0800 7f076f143540 -1 compacting monitor store ...

It hangs on compaction while the store.db keeps expanding. There seems to be something wrong with compaction itself, since I don’t think the mon is connected yet at this point and I have every other Ceph service disabled.

I had to increase ulimit at this point.

Any thoughts on how to proceed? Is there a way I can clear the db of these messages?

Thanks everyone

From: Lincoln Bryant <lincolnb@xxxxxxxxxxxx>
Sent: Wednesday, March 10, 2021 4:06 PM
To: ricardo.re.azevedo@xxxxxxxxx; ceph-users@xxxxxxx
Subject: Re:  mon db growing. over 500Gb

Hi Ricardo,

I just had a similar issue recently.

I did a dump of the monitor store (i.e., something like "ceph-monstore-tool /var/lib/ceph/mon/mon-a/ dump-keys") and most messages were of type 'logm'. For me I think it was a lot of log messages coming from an oddly behaving OSD.
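The check described above (finding which key type dominates the dump) can be sketched in Python. This is only an illustration: it assumes each line of the dump-keys output starts with the key prefix as its first whitespace-separated field, and the function name and sample lines are made up for the example rather than taken from a real monitor.

```python
from collections import Counter


def count_key_prefixes(lines):
    """Count keys per prefix, taking the first whitespace-separated field
    of each line as the prefix (an assumption about the dump format)."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if fields:  # skip blank lines
            counts[fields[0]] += 1
    return counts


# Hypothetical sample lines, for illustration only (not real dump output).
sample = [
    "logm / full_1",
    "logm / full_2",
    "logm / 3",
    "osdmap / 10",
    "paxos / 5",
]

# Print prefixes from most to least frequent.
for prefix, n in count_key_prefixes(sample).most_common():
    print(f"{n:>8}  {prefix}")
```

To use it on a real store, save the dump-keys output to a file and pass the open file object to count_key_prefixes in place of the sample list. A prefix with a far larger count than the rest (logm in my case) is the one filling the store.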

I've seen folks advise disabling the Ceph mgr insights module if you have it running and there are degraded PGs, to see if that helps.

What finally solved it for me was doing a rolling restart of my nodes, but I started from all PGs active+clean.

--Lincoln

________________________________

From: ricardo.re.azevedo@xxxxxxxxx <ricardo.re.azevedo@xxxxxxxxx>
Sent: Wednesday, March 10, 2021 5:59 PM
To: ceph-users@xxxxxxx <ceph-users@xxxxxxx>
Subject:  mon db growing. over 500Gb

Hi all,

I have a fairly pressing issue. I had a monitor fall out of quorum because
it ran out of disk space during the rebalancing caused by switching to
upmap. I noticed that the store.db on every monitor had started taking up
nearly all the disk space, so I set noout, nobackfill and norecover and shut
down all the monitor daemons. Each store.db was at:

mon.a 89GB (the one that first dropped out)

mon.a 400GB

mon.c 400GB

I tried setting mon_compact_on_start. This brought mon.a down to 1GB. Cool.
However, when I tried it on the other monitors, the db grew by about
1GB every 10s, so I shut them down again.
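For anyone wanting to put a number on that growth, here is a minimal Python sketch that sums the file sizes under a directory and samples the total twice. The function names and the 10-second default interval are my own choices for illustration; point it at whatever path your mon's store.db lives in.

```python
import os
import time


def dir_size_bytes(path):
    """Total size in bytes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                # A file may vanish between listing and stat,
                # e.g. if the store is being rewritten while we walk it.
                pass
    return total


def growth_rate(path, interval_s=10.0):
    """Average growth in bytes per second over one sampling interval."""
    before = dir_size_bytes(path)
    time.sleep(interval_s)
    after = dir_size_bytes(path)
    return (after - before) / interval_s
```

Calling growth_rate on the store.db path and dividing the result by 1e9 gives GB per second. A negative value means the directory shrank during the interval.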

Any idea what is going on? Or how I can shrink the db back down?

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx