Re: monitor sst files continue growing

On 29/10/2020 19:29, Zhenshi Zhou wrote:
Hi Alex,

We found that there were a huge number of keys in the "logm" and "osdmap"
tables while using ceph-monstore-tool. I think that could be the root cause.
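
For reference, the per-prefix key counts can be pulled from a stopped mon
with something like this (the path below is the default layout and only an
assumption, adjust it for your deployment):

    # count keys per prefix (logm, osdmap, paxos, ...) in the mon store;
    # run this against a stopped mon
    ceph-monstore-tool /var/lib/ceph/mon/ceph-$(hostname -s) dump-keys | awk '{print $1}' | sort | uniq -c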


But that is exactly how Ceph works. The monitors might need those very old OSDMaps to get all the PGs clean again: an OSD which has been gone for a very long time needs them to catch up and make its PGs clean.

If not all PGs are active+clean, you can expect the MON databases to grow rapidly.
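
You can get a rough idea of how far behind the trimming is by comparing the
first and last committed osdmap epochs the mons are holding; something like
this works here (field names assumed from Nautilus-era 'ceph report' output,
jq is only used for readability):

    # a large gap between these two epochs means the mons are retaining many old OSDMaps
    ceph report 2>/dev/null | jq '{osdmap_first_committed, osdmap_last_committed}'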

Therefore I always deploy 1TB SSDs in all Monitors. They are not expensive anymore and they give breathing room.

I always deploy physical and dedicated machines for Monitors just to prevent these cases.

Wido

Well, some pages also say that disabling the 'insight' module can resolve this
issue, but I checked our cluster and we didn't enable this module. See this page:
<https://tracker.ceph.com/issues/39955>.
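
For anyone else checking, the enabled manager modules can be listed like this
(the module is called 'insights' in the releases we run; adjust the name if
yours differs):

    # list enabled mgr modules; 'insights' should not appear if it was never enabled
    ceph mgr module ls | jq -r '.enabled_modules[]'
    # if it were enabled, it could be turned off with:
    #   ceph mgr module disable insights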

Anyway, our cluster is unhealthy, though; it just needs time to keep recovering
data :)

Thanks

Alex Gracie <alexandergracie17@xxxxxxxxx> wrote on Thursday, 29 October 2020 at 22:57:

We hit this issue over the weekend on our HDD-backed EC Nautilus cluster
while removing a single OSD. We also did not have any luck using
compaction. The mon logs filled up our entire root disk on the mon servers
and we were running on a single monitor for hours while we tried to finish
recovery and reclaim space. Over the past couple of weeks we also noticed "pg not
scrubbed in time" errors but are unsure if they are related. I'm still unsure of the
exact cause of this (other than the general misplaced/degraded objects) and
of what kind of growth is acceptable for these store.db files.
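
For completeness, the compaction we tried was roughly the following (the mon
ID is a placeholder); it runs fine but seems to reclaim little while the
cluster is still recovering:

    # online compaction of a single monitor's store
    ceph tell mon.<id> compact
    # or compact at every daemon start via ceph.conf, [mon] section:
    #   mon_compact_on_start = true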

In order to get our downed mons restarted, we ended up backing up and
copying the /var/lib/ceph/mon/* contents to a remote host, setting up an
sshfs mount to that new host with large NVMe and SSD drives, ensuring the mount
paths were owned by ceph, and then clearing up enough space on the monitor host
to start the service. This allowed our store.db directory to grow freely
until the misplaced/degraded objects could recover, and eventually all the
monitors rejoined.
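
Roughly, the steps looked like the sketch below; hostnames, paths and the mon
ID are placeholders, and you want a verified backup of the mon directory
before touching anything:

    systemctl stop ceph-mon@mon01                        # stop the affected mon
    rsync -a /var/lib/ceph/mon/ bighost:/srv/mon-store/  # copy the store to a host with space
    mv /var/lib/ceph/mon /var/lib/ceph/mon.local         # keep the local copy as a backup
    mkdir /var/lib/ceph/mon
    sshfs bighost:/srv/mon-store /var/lib/ceph/mon -o allow_other,reconnect
    # make sure the mounted files are owned by (or mapped to) the ceph user,
    # e.g. via matching uids on the remote host or sshfs idmap options
    systemctl start ceph-mon@mon01                       # let the store grow on the remote disk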
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



