Re: nautilus cluster down by loss of 2 mons

Hi Mac,

When I started with ceph, there was a page with hardware recommendations (https://docs.ceph.com/en/mimic/start/hardware-recommendations/) and some reference configurations from big vendors to look at. Firstly, this page seems to have disappeared from the "latest" docs and, secondly, I have to agree with you that the stated minimum requirement for the MON store (10 GB of disk space per daemon) is a joke. On the ceph-users list you will find anecdotal evidence of the store temporarily growing to as much as 1 TB in certain crisis situations.

I think for a production storage cluster one should not state unrealistic minimum requirements, but safeguards that can handle worst-case situations. When I bought the hardware for our first cluster, we got help from a ceph consultant, and he strongly recommended a RAID10 array of six 600 GB 15K SAS HDDs for the MON store, for both capacity and minimum performance.

There were also some vendors who published reference configurations with benchmarks that were very helpful for dimensioning the cluster. It would really be great if a hardware requirements page was added again with realistic recommendations, possibly even linking to actual ceph-users cases where a strange-sounding spec (would have) saved the day. In fact, a tool like cephadm should check for reasonable minimum hardware and issue warnings if something seems inappropriate; see the sketch below. The famous --yes-i-really-really-know-what-i-am-doing flag could then allow one to ignore these warnings.
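As a rough illustration of what such a preflight check might look like (this is purely hypothetical, not an existing cephadm feature; the 100 GB threshold is my own guess, not an official figure):

    #!/usr/bin/env bash
    # Hypothetical preflight check: warn if the filesystem holding the MON
    # store has less free space than an assumed safety margin.
    MON_STORE="/var/lib/ceph/mon"   # default MON data location
    MIN_FREE_GB=100                 # assumed safety margin, not an official number

    free_kb=$(df -Pk "$MON_STORE" | awk 'NR==2 {print $4}')
    free_gb=$(( free_kb / 1024 / 1024 ))

    if [ "$free_gb" -lt "$MIN_FREE_GB" ]; then
        echo "WARNING: only ${free_gb} GB free under ${MON_STORE};" \
             "MON stores have been seen to grow far beyond the documented 10 GB." >&2
    fi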

Sleep is more important than cheap.

I think you need to add a separate decent drive (an enterprise boot SSD will do), bind-mount it into the location of the MON store and bring a MON up on it; a rough sketch is below. In the worst case, you might need to take the cluster down and rebuild valid MON stores from the OSDs (https://docs.ceph.com/en/latest/rados/troubleshooting/troubleshooting-mon/#recovery-using-osds -- check your version!). I hope you can do without taking the cluster down.
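A minimal sketch of the bind-mount approach, assuming the MON id "mon1", the device /dev/sdX and the mount point /mnt/monstore as placeholders; adjust to your cluster and make sure the daemon is stopped while you move the store:

    # Stop the MON whose store ran out of space ("mon1" is a placeholder id).
    systemctl stop ceph-mon@mon1

    # Prepare the new drive and copy the existing store onto it.
    mkfs.xfs /dev/sdX                      # /dev/sdX = the new enterprise SSD
    mkdir -p /mnt/monstore
    mount /dev/sdX /mnt/monstore
    cp -a /var/lib/ceph/mon/ceph-mon1/. /mnt/monstore/
    chown -R ceph:ceph /mnt/monstore

    # Bind-mount the new drive over the expected MON store location.
    mount --bind /mnt/monstore /var/lib/ceph/mon/ceph-mon1

    # Bring the MON back up on the roomier store.
    systemctl start ceph-mon@mon1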

Good luck and best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Marc <Marc@xxxxxxxxxxxxxxxxx>
Sent: 31 August 2021 20:02:01
To: ceph-users@xxxxxxx
Subject:  Re: nautilus cluster down by loss of 2 mons

Could someone also explain the logic behind the decision to dump so much data to disk? Especially in container environments with resource limits this is not really nice.

> -----Original Message-----
> Sent: Tuesday, 31 August 2021 19:16
> To: ceph-users@xxxxxxx
> Subject:  nautilus cluster down by loss of 2 mons
>
> Hi
>
> We have a nautilus cluster that was plagued by a network failure. One of
> the monitors fell out of quorum.
> Once the network settled down and all OSDs were back online again, we got
> that mon synchronizing.
>
> However, the filesystem suddenly exploded in a minute or so from 63G
> usage to 93G, resulting in 100% usage.
> At that point we decided to remove that mon from the cluster and hoped to
> compact the database on the remaining mons, so that
> we could add a new mon while there was less synchronizing to do because
> of the smaller database size.
>
> Unfortunately, the tell osd compact command made the database on mon nr 2
> grow very fast, resulting in another full filesystem and
> hence a dead cluster.
>
> Can anyone advise towards the fastest recovery in this situation?
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx