Re: Monitor leveldb growing without bound v14.2.16

Slow mon sync can be caused by a mon_sync_max_payload_size that is too large. The default is usually far too high. I had sync problems until I set

mon_sync_max_payload_size = 4096

Since then mon sync is not an issue any more.
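For reference, here is one way to apply that setting, sketched against the standard Ceph CLI (the mon.a daemon name below is just an example; substitute your own mon IDs):

```shell
# Lower the sync payload size cluster-wide at runtime
# (Nautilus and later, via the centralized config database):
ceph config set mon mon_sync_max_payload_size 4096

# Or make it permanent in ceph.conf:
#   [mon]
#   mon_sync_max_payload_size = 4096

# Verify what a running mon actually uses (run on the mon host):
ceph daemon mon.a config get mon_sync_max_payload_size
```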

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Peter Woodman <peter@xxxxxxxxxxxx>
Sent: 03 March 2021 06:26:47
To: Lincoln Bryant
Cc: ceph-users
Subject:  Re: Monitor leveldb growing without bound v14.2.16

Is the ceph insights plugin enabled? This caused huge bloat of the mon
stores for me. Before I figured that out, I turned on leveldb compression
options on the mon store and also got pretty significant savings.
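In case it helps anyone else, a sketch of the two changes above, assuming the mgr insights module and the old leveldb mon backend (the prune-health command and the mon_leveldb_compression option are from memory, so double-check them against your release):

```shell
# Ask the insights module to drop its accumulated health history
# (argument is hours of history to keep; 0 discards everything):
ceph insights prune-health 0

# Then disable the module so it stops writing to the mon store:
ceph mgr module disable insights

# Enable leveldb compression for the mon store in ceph.conf
# (takes effect after a mon restart):
#   [mon]
#   mon_leveldb_compression = true
```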

On Tue, Mar 2, 2021 at 6:56 PM Lincoln Bryant <lincolnb@xxxxxxxxxxxx> wrote:

> Hi list,
>
> We recently had a cluster outage over the weekend where several OSDs were
> inaccessible over night for several hours. When I found the cluster in the
> morning, the monitors' root disks (which contained both the monitor's
> leveldb and the Ceph logs) had completely filled.
>
> After restarting OSDs, cleaning out the monitors' logs, moving
> /var/lib/ceph to dedicated disks on the mons, and starting recovery (in
> which there was 1 unfound object that I marked lost, if that has any
> relevancy), the leveldb continued/continues to grow without bound. The
> cluster has all PGs in active+clean at this point, yet I'm accumulating
> what seems like roughly 1 GB/hr of new leveldb data.
>
> Two of the monitors (a, c) are in quorum, while the third (b) has been
> synchronizing for the last several hours, but doesn't seem to be able to
> catch up. Mon 'b' has been running for 4 hours now in the 'synchronizing'
> state. The mon's log has many messages about compacting and deleting files,
> yet we never exit the synchronization state.
>
> The ceph.log is also rapidly accumulating complaints that the mons are
> slow (not surprising, I suppose, since the levelDBs are ~100GB at this
> point).
>
> I've found that using the monstore tool to do compaction on mons 'a' and 'c'
> helps, but it is only a temporary fix. Soon the database inflates again and
> I'm back to where I started.
>
> Thoughts on how to proceed here? Some ideas I had:
>    - Would it help to add some new monitors that use RocksDB?
>    - Stop a monitor and dump the keys via monstoretool, just to get an
> idea of what's going on?
>    - Increase mon_sync_max_payload_size to try to move data in larger
> chunks?
>    - Drop down to a single monitor, and see if normal compaction triggers
> and stops growing unbounded?
>    - Stop both 'a' and 'c', compact them, start them, and immediately
> start 'b' ?
>
> Appreciate any advice.
>
> Regards,
> Lincoln
>
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>



