Hi Alex,

We found a huge number of keys in the "logm" and "osdmap" tables while inspecting the monitor store with ceph-monstore-tool, and I think that is the root cause (rough commands are at the end of this mail). Some pages also say that disabling the 'insights' module can resolve this issue, but I checked our cluster and we never enabled that module; see this page <https://tracker.ceph.com/issues/39955>.

Anyway, our cluster is still unhealthy; it just needs time to keep recovering the data :)

Thanks

Alex Gracie <alexandergracie17@xxxxxxxxx> wrote on Thu, Oct 29, 2020 at 10:57 PM:
> We hit this issue over the weekend on our HDD-backed EC Nautilus cluster
> while removing a single OSD. We also did not have any luck using
> compaction. The mon logs filled up the entire root disk on the mon servers,
> and we were running on a single monitor for hours while we tried to finish
> recovery and reclaim space. The past couple of weeks we also noticed "pg not
> scrubbed in time" errors but are unsure if they are related. I'm still not sure of
> the exact cause of this (other than the general misplaced/degraded objects), or
> what kind of growth is acceptable for these store.db files.
>
> In order to get our downed mons restarted, we ended up backing up and
> copying the /var/lib/ceph/mon/* contents to a remote host, setting up an
> sshfs mount to that new host with large NVMe and SSD drives, ensuring the mount
> paths were owned by ceph, and then clearing up enough space on the monitor host
> to start the service. This allowed our store.db directory to grow freely
> until the misplaced/degraded objects could recover, and the monitors all
> rejoined eventually.
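
P.S. For anyone wanting to check their own mon store, below is roughly what we ran to count keys per prefix. Treat it as a sketch rather than an exact recipe: I'm assuming the default mon data path, the "dump-keys" subcommand of ceph-monstore-tool (availability may differ between releases), and that the monitor is stopped first (or that you work on a copy made with "store-copy") so the store is not in use.

    # Stop the local monitor so the store is not open elsewhere.
    systemctl stop ceph-mon@$(hostname -s)

    # Adjust to your mon data directory if it is not the default.
    MON_STORE=/var/lib/ceph/mon/ceph-$(hostname -s)

    # dump-keys prints one "<prefix> <key>" pair per line; count keys per prefix.
    # In our case "logm" and "osdmap" accounted for nearly all of the keys.
    ceph-monstore-tool "$MON_STORE" dump-keys | awk '{print $1}' | sort | uniq -c | sort -rn

    systemctl start ceph-mon@$(hostname -s)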