Is the Ceph insights plugin enabled? It caused huge bloat of the mon stores for me. Before I figured that out, I turned on leveldb compression options on the mon store and got pretty significant savings as well.

On Tue, Mar 2, 2021 at 6:56 PM Lincoln Bryant <lincolnb@xxxxxxxxxxxx> wrote:
> Hi list,
>
> We recently had a cluster outage over the weekend where several OSDs were
> inaccessible overnight for several hours. When I found the cluster in the
> morning, the monitors' root disks (which contained both the monitors'
> leveldb and the Ceph logs) had completely filled.
>
> After restarting OSDs, cleaning out the monitors' logs, moving
> /var/lib/ceph to dedicated disks on the mons, and starting recovery (in
> which there was 1 unfound object that I marked lost, if that has any
> relevance), the leveldb continued (and continues) to grow without bound.
> The cluster has all PGs in active+clean at this point, yet I'm accumulating
> what seems like roughly 1 GB/hr of new leveldb data.
>
> Two of the monitors (a, c) are in quorum, while the third (b) has been
> synchronizing for the last several hours but doesn't seem to be able to
> catch up. Mon 'b' has been running for 4 hours now in the 'synchronizing'
> state. The mon's log has many messages about compacting and deleting files,
> yet we never exit the synchronizing state.
>
> The ceph.log is also rapidly accumulating complaints that the mons are
> slow (not surprising, I suppose, since the leveldbs are ~100 GB at this
> point).
>
> I've found that using the monstore tool to compact mons 'a' and 'c'
> helps, but it is only a temporary fix. Soon the database inflates again
> and I'm back to where I started.
>
> Thoughts on how to proceed here? Some ideas I had:
> - Would it help to add some new monitors that use RocksDB?
> - Stop a monitor and dump the keys via the monstore tool, just to get an
>   idea of what's going on?
> - Increase mon_sync_max_payload_size to try to move data in larger
>   chunks?
> - Drop down to a single monitor, and see if normal compaction triggers
>   and the store stops growing without bound?
> - Stop both 'a' and 'c', compact them, start them, and immediately
>   start 'b'?
>
> Appreciate any advice.
>
> Regards,
> Lincoln
>
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
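The first thing the reply suggests checking is whether the insights module is enabled. A minimal Python sketch of that check follows. It assumes the `ceph` CLI and an admin keyring are available on the host, and that `ceph mgr module ls -f json` prints a JSON object whose `enabled_modules` key is a list of module-name strings (verify this against your release). `insights_enabled` and `check_cluster` are hypothetical helper names, not part of any Ceph API.

```python
import json
import subprocess


def insights_enabled(module_ls_json: str) -> bool:
    """Return True if "insights" is listed among the enabled mgr modules.

    Assumption: the JSON printed by `ceph mgr module ls -f json` is an
    object whose "enabled_modules" key holds a list of module-name strings.
    """
    data = json.loads(module_ls_json)
    return "insights" in data.get("enabled_modules", [])


def check_cluster() -> bool:
    """Query the live cluster; needs the `ceph` CLI and an admin keyring."""
    result = subprocess.run(
        ["ceph", "mgr", "module", "ls", "-f", "json"],
        check=True,          # raise if the ceph command fails
        capture_output=True,
        text=True,
    )
    return insights_enabled(result.stdout)
```

If `check_cluster()` returns True, `ceph mgr module disable insights` turns the module off. The insights module documentation also describes `ceph insights prune-health <hours>` for clearing the health history it has stored. Parsing is kept separate from the subprocess call so the JSON handling can be checked without a cluster.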
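To measure growth like the roughly 1 GB/hr described in the original post (for example, before and after a compaction or disabling a module), you could sample a mon's store directory twice and convert the difference to GB/hour. The following is a minimal Python sketch. It assumes the default mon data layout `/var/lib/ceph/mon/<cluster>-<id>/store.db`, and `dir_size_bytes`, `growth_rate_gb_per_hour` and `sample_growth` are hypothetical helpers.

```python
import os
import time


def dir_size_bytes(path: str) -> int:
    """Sum the sizes of regular files under `path`, skipping symlinks.

    Files can vanish mid-walk (compaction deletes old table files), so a
    file that disappears between listing and stat is ignored.
    """
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            fp = os.path.join(root, name)
            try:
                if not os.path.islink(fp):
                    total += os.path.getsize(fp)
            except FileNotFoundError:
                continue  # removed between os.walk listing and getsize
    return total


def growth_rate_gb_per_hour(size_before: int, size_after: int, seconds: float) -> float:
    """Convert two byte counts taken `seconds` apart into GB/hour (1 GB = 10**9 bytes)."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return (size_after - size_before) / 1e9 * 3600 / seconds


def sample_growth(path: str, interval_s: float = 600.0) -> float:
    """Measure `path` twice, `interval_s` seconds apart, and return GB/hour."""
    before = dir_size_bytes(path)
    time.sleep(interval_s)
    after = dir_size_bytes(path)
    return growth_rate_gb_per_hour(before, after, interval_s)
```

For example, `sample_growth("/var/lib/ceph/mon/ceph-a/store.db")` samples over ten minutes. A negative result just means a compaction shrank the store during the interval, so a longer interval gives a steadier figure.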