> - three servers as recommended by Proxmox (with 10 Gb Ethernet and so on)
> - size=3 and min_size=2 recommended by Ceph

You forgot the Ceph recommendation* to provide sufficient fail-over capacity in case a failure domain or a disk fails. The recommendation would be to have 4 hosts with 25% capacity left free for fail-over and another 10% for handling imbalance. With very few disks I would increase the buffer for imbalance.

* It's actually not a recommendation, it's a requirement for non-experimental clusters.
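To put rough numbers on that, here is a back-of-the-envelope sketch (my own illustration, not an official Ceph formula; the function name and the 10% default buffer are made up for the example) of how "25% free for fail-over plus 10% for imbalance" translates into a target fill level for a replicated pool with a host failure domain and equally sized hosts:

    # Rough sketch: how full can the cluster be on average and still
    # re-replicate everything after losing one host, while keeping a
    # buffer for uneven data distribution across the OSDs?
    def max_safe_utilisation(num_hosts: int, imbalance_buffer: float = 0.10) -> float:
        if num_hosts < 2:
            raise ValueError("need at least two hosts in the failure domain")
        # After one host fails, its data must be re-created on the remaining
        # hosts, so the cluster must keep at least one host's worth
        # (1/num_hosts) of its raw capacity free before the failure.
        failover_limit = (num_hosts - 1) / num_hosts
        return failover_limit - imbalance_buffer

    for n in (3, 4, 5):
        print(f"{n} hosts: keep average utilisation below ~{max_safe_utilisation(n):.0%}")
    # 3 hosts: ~57%, 4 hosts: ~65%, 5 hosts: ~70%

With 4 hosts that gives roughly 65% usable, which is where the 25% + 10% figures come from. With only 3 hosts and 2 OSDs per host at size=3, a failed disk's data can only be rebuilt onto the other OSD in the same host, which is why Eneko's ~0.45 nearfull figure below is so low.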
Everything else has been answered already in great detail.

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Mario Giammarco <mgiammarco@xxxxxxxxx>
Sent: 05 February 2021 21:10:33
To: Eneko Lacunza
Cc: Ceph Users
Subject: Re: Worst thing that can happen if I have size= 2

On Thu, 4 Feb 2021 at 12:19, Eneko Lacunza <elacunza@xxxxxxxxx> wrote:

> Hi all,
>
> On 4/2/21 at 11:56, Frank Schilder wrote:
> >> - three servers
> >> - three monitors
> >> - 6 osd (two per server)
> >> - size=3 and min_size=2
> > This is a set-up that I would not run at all. The first issue is that Ceph lives on the law of large numbers and 6 is a small number; hence your OSDs fill up due to uneven distribution.
> >
> > What comes to my mind is a hyper-converged server with 6+ disks in a RAID10 array, possibly with a good controller with battery-powered or other non-volatile cache. Ceph will never beat that performance. Put in some extra disks as hot-spares and you have close to self-healing storage.
> >
> > Such a small Ceph cluster will inherit all the baddies of Ceph (performance, maintenance) without giving any of the goodies (scale-out, self-healing, proper distributed raid protection). Ceph needs size to become well-performing and pay off the maintenance and architectural effort.
>
> It's funny that we have multiple clusters similar to this, and we and our customers couldn't be happier. Just use an HCI solution (like for example Proxmox VE, but there are others) to manage everything.
>
> Maybe the weakest thing in that configuration is having 2 OSDs per node; osd nearfull must be tuned accordingly so that no OSD goes beyond about 0.45, so that in case of failure of one disk, the other OSD in the node has enough space for healing replication.

I reply to both: in fact I am using Proxmox VE and I am following all the guidelines for an HA hyper-converged setup:

- three servers as recommended by Proxmox (with 10 Gb Ethernet and so on)
- size=3 and min_size=2 as recommended by Ceph

It is not that one morning I woke up and threw some random hardware together; I followed the guidelines. The result should be:

- if a disk (or more) breaks, work goes on;
- if a server breaks, the VMs on that server start on another server and work goes on.

The actual result is: one disk breaks, Ceph fills the other one in the same server, it reaches 90% and EVERYTHING stops, including all VMs, and the customer has lost unsaved data and cannot run the VMs it needs to keep working. Not very "HA" as hoped.

Size=3 means 3x HDD cost. Now I must double it again to 6x. The customer will not buy more disks. So I ask (again): apart from the known fact that with size=2 I risk a second disk breaking before Ceph has re-created the second copy of the data, are there other risks?

I repeat: I know perfectly well that size=3 is "better" and I followed the guidelines, but what can happen with size=2 and min_size=1?

The only thing I can imagine is that if I power down switches I get a split brain, but in that case monitor quorum is not reached, so Ceph should stop writing and I do not risk inconsistent data. Are there other things to consider?

Thanks,
Mario
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx