On Thu, 4 Feb 2021 at 12:19, Eneko Lacunza <elacunza@xxxxxxxxx> wrote:
> Hi all,
>
> On 4/2/21 at 11:56, Frank Schilder wrote:
> >> - three servers
> >> - three monitors
> >> - 6 osd (two per server)
> >> - size=3 and min_size=2
> >
> > This is a set-up that I would not run at all. The first issue is that
> > Ceph lives on the law of large numbers, and 6 is a small number; hence
> > your OSDs fill up due to uneven distribution.
> >
> > What comes to mind is a hyper-converged server with 6+ disks in a
> > RAID10 array, possibly with a good controller with battery-backed or
> > other non-volatile cache. Ceph will never beat that performance. Put in
> > some extra disks as hot spares and you have close to self-healing storage.
> >
> > Such a small Ceph cluster inherits all the downsides of Ceph
> > (performance, maintenance) without giving any of the upsides (scale-out,
> > self-healing, proper distributed RAID protection). Ceph needs size to
> > become well-performing and to pay off the maintenance and architectural
> > effort.
>
> It's funny, because we have multiple clusters similar to this, and we and
> our customers couldn't be happier. Just use an HCI solution (for example
> Proxmox VE, but there are others) to manage everything.
>
> Maybe the weakest point in that configuration is having 2 OSDs per node;
> osd nearfull must be tuned accordingly so that no OSD goes beyond about
> 0.45, so that if one disk fails, the other OSD in the node has enough
> space for the healing replication.

I reply to both: in fact I am using Proxmox VE and I followed all the guidelines for an HA hyper-converged setup:

- three servers, as recommended by Proxmox (with 10 Gb Ethernet and so on)
- size=3 and min_size=2, as recommended by Ceph

It is not that one morning I woke up and threw some random hardware together; I followed the guidelines.
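As an aside, Eneko's 0.45 figure can be sanity-checked with back-of-the-envelope arithmetic. This is only my sketch (not from the Ceph docs): 0.95 is Ceph's default `mon_osd_full_ratio`, and the assumption is that with failure domain = host and 2 OSDs per host, a dead OSD's data can only heal onto its neighbour in the same node.

```python
# Sanity check of the "no OSD beyond ~0.45" advice, assuming 2 OSDs per
# host, failure domain = host, and both OSDs roughly equally full.
FULL_RATIO = 0.95  # Ceph default mon_osd_full_ratio: writes stop here

def survivor_fill(fill_each: float) -> float:
    """Fill level of the surviving OSD after it absorbs all of its dead
    neighbour's data (both OSDs assumed equally full beforehand)."""
    return fill_each * 2

# At 45% per OSD the survivor lands at 90% -- just under the full ratio,
# so the cluster keeps writing.
print(survivor_fill(0.45))  # 0.9
# At 50% per OSD the survivor would reach 100% -- it blows past the full
# ratio and all I/O stops.
print(survivor_fill(0.50) > FULL_RATIO)  # True
```

This is exactly the 90% number Mario reports below: his OSDs were evidently near 45-50% full, so losing one disk pushed its neighbour to the full threshold.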
The result should have been:

- if a disk (or more) breaks, work goes on
- if a server breaks, its VMs start on another server and work goes on

The actual result: one disk broke, Ceph filled the other one in the same server, it reached 90%, and EVERYTHING stopped, including all VMs. The customer lost unsaved data and could not run the VMs needed to continue working. Not very "HA", as hoped.

size=3 means 3x the HDD cost. Now I must double it again to 6x, and the customer will not buy more disks. So I ask (again): apart from the known risk that with size=2 a second disk breaks before Ceph has rebuilt the second copy of the data, are there other risks? I repeat: I know perfectly well that size=3 is "better" and I followed the guidelines, but what can happen with size=2 and min_size=1?

The only thing I can imagine is that if I power down the switches I get a split brain, but in that case monitor quorum is not reached, so Ceph should stop writing and I do not risk inconsistent data. Are there other things to consider?

Thanks,
Mario

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
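[Editor's addendum] The specific risk Mario names for size=2/min_size=1 (a second disk dying before re-replication finishes) can be made concrete with a toy model. Everything below is invented for illustration — the class, OSD names, and object names are not Ceph code or Ceph internals — but it captures the window: with min_size=1, writes are acknowledged while only one replica exists, and those writes vanish if that one disk dies before backfill catches up.

```python
# Toy model of a replicated pool with size=2, min_size=1. A write is
# acknowledged as soon as min_size of the currently-up OSDs hold it.
# This is an illustrative sketch, not how Ceph is actually implemented.

class ToyPool:
    def __init__(self, size=2, min_size=1):
        self.size, self.min_size = size, min_size
        self.up = {"osd.0", "osd.1"}  # OSDs currently up
        self.copies = {}              # object -> set of OSDs holding it

    def write(self, obj):
        if len(self.up) < self.min_size:
            raise IOError("PG inactive: below min_size, write refused")
        self.copies[obj] = set(self.up)  # ack with however many are up

    def fail(self, osd):
        self.up.discard(osd)
        for holders in self.copies.values():
            holders.discard(osd)

    def recover(self, obj, new_osd):
        """Backfill one object onto a replacement OSD."""
        self.up.add(new_osd)
        self.copies[obj].add(new_osd)

    def lost(self):
        return sorted(o for o, h in self.copies.items() if not h)

pool = ToyPool()
pool.write("obj_a")             # healthy: 2 copies
pool.fail("osd.0")              # first disk dies
pool.write("obj_b")             # min_size=1: still acked, 1 copy only
pool.recover("obj_a", "osd.2")  # backfill reaches obj_a first...
pool.fail("osd.1")              # ...second disk dies mid-recovery
print(pool.lost())              # ['obj_b']
```

With min_size=2 the write of `obj_b` would have been refused (I/O pauses, data survives); with min_size=1 it was acknowledged and then lost. That trade — availability during the degraded window versus the durability of writes made during it — is the core of the size=2/min_size=1 question.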