Could someone also explain the logic behind the decision to dump so much data to the disk? Especially in container environments with resource limits, this is not really nice.

> -----Original Message-----
> Sent: Tuesday, 31 August 2021 19:16
> To: ceph-users@xxxxxxx
> Subject: nautilus cluster down by loss of 2 mons
>
> Hi
>
> We have a Nautilus cluster that was hit by a network failure. One of the
> monitors fell out of quorum.
> Once the network settled down and all OSDs were back online, that mon
> started synchronizing.
>
> However, its filesystem usage suddenly exploded, within a minute or so,
> from 63G to 93G, resulting in 100% usage.
> At that point we decided to remove that mon from the cluster and hoped to
> compact the database on the remaining mons, so that we could add a new
> mon while there was less synchronizing to do because of the smaller
> database size.
>
> Unfortunately, the tell osd compact command made the database on mon
> nr 2 grow very fast, resulting in another full filesystem and hence a
> dead cluster.
>
> Can anyone advise on the fastest recovery in this situation?
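
For context, a minimal sketch of the compaction step discussed above, assuming a standard deployment; the mon id "mon2" and the data path /var/lib/ceph/mon/ceph-mon2 are placeholders, not taken from the thread:

    # Check free space on the mon data partition first; RocksDB compaction
    # temporarily needs extra headroom while it rewrites its SST files.
    df -h /var/lib/ceph/mon/ceph-mon2

    # Trigger an online compaction of a single monitor's store.
    ceph tell mon.mon2 compact

    # Alternatively, compact the store on every monitor start
    # (set in ceph.conf under the [mon] section).
    # mon_compact_on_start = true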