During normal operation the mon store size is under 1G. After the network
ordeal it was 65G. I gave the last mon all the disk space I could find under
/var/lib/ceph and started the mon again. It is now reaching 90G and
still growing.
Does anyone have an idea how much free disk space would be needed to get the
job done?
Any other strategies to get the cluster going again?
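
One thing I am considering, in case I simply cannot free up enough space in
place, is to stop the mon and bind-mount its data directory from a larger
volume so the sync can finish there. Roughly like this (paths and mon id are
only an example, adjust for your setup):

    systemctl stop ceph-mon@mon3
    mkdir -p /mnt/bigdisk/mon-data
    # copy the existing mon data to the larger volume
    rsync -a /var/lib/ceph/mon/ceph-mon3/ /mnt/bigdisk/mon-data/
    # make the larger volume appear at the usual mon data path
    mount --bind /mnt/bigdisk/mon-data /var/lib/ceph/mon/ceph-mon3
    systemctl start ceph-mon@mon3   # let it resume syncing with more headroom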
Marc wrote on 2021-08-31 20:02:
Could someone also explain the logic behind the decision to dump so
much data to disk? Especially in container environments with
resource limits this is not really nice.
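
As far as I understand it (so treat this as a guess, not an authoritative
answer), the mons keep every osdmap epoch and only trim the history once PGs
are back to active+clean, so a long network outage can make the store balloon
like this. Once there is quorum again, the untrimmed range should be visible
with something like:

    # a large gap between osdmap_first_committed and osdmap_last_committed
    # means a lot of untrimmed osdmaps are sitting in the mon store
    ceph report 2>/dev/null | grep committed

There is also a mon_compact_on_start option that at least compacts the store
every time the mon starts.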
-----Original Message-----
Sent: Tuesday, 31 August 2021 19:16
To: ceph-users@xxxxxxx
Subject: nautilus cluster down by loss of 2 mons
Hi
We have a Nautilus cluster that was plagued by a network failure. One of
the monitors fell out of quorum.
Once the network settled down and all OSDs were back online again, we got
that mon synchronizing.
However, the filesystem suddenly exploded in a minute or so from 63G
usage to 93G, resulting in 100% usage.
At that point we decided to remove that mon from the cluster, hoping to
compact the database on the remaining mons so that we could add a new mon
while there was less synchronizing to do because of the smaller database
size.
Unfortunately the tell osd compact command made the database on mon nr 2
grow very fast, resulting in another full filesystem and hence a dead
cluster.
Can anyone advise on the fastest recovery in this situation?
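
One thing I am wondering about (just an idea, not something I have verified):
since online compaction apparently needs extra headroom while it rewrites the
store, would an offline compaction with the mon stopped be safer on an almost
full filesystem? Something like this (mon id is only an example):

    systemctl stop ceph-mon@mon2
    # compact the mon's rocksdb store in place; this still needs some free
    # space for the rewritten sst files, so check df first
    ceph-kvstore-tool rocksdb /var/lib/ceph/mon/ceph-mon2/store.db compact
    systemctl start ceph-mon@mon2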
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx