You can structure your CRUSH map so that you get multiple EC chunks per host, in a way that lets you survive a host outage even though you have fewer hosts than k+1. For example, if you run an EC=4+2 profile on 3 hosts, you can structure your CRUSH map so that you have 2 chunks per host. This means that even if one host is down, you are still guaranteed to have 4 chunks available. If you then set min_size = 4, you can still operate your cluster in that situation - albeit a risky one, since any additional failure during that time will lead to data loss. However, in a highly constrained setup it might be a trade-off that's worth it for you. There have been examples of this on this mailing list in the past. (A rough sketch of such a CRUSH rule follows below the quoted message.)

On Wed, 6 Dec 2023 at 12:11, Rich Freeman <r-ceph@xxxxxxxxx> wrote:
> On Tue, Dec 5, 2023 at 6:35 AM Patrick Begou
> <Patrick.Begou@xxxxxxxxxxxxxxxxxxxxxx> wrote:
> >
> > Ok, so I've misunderstood the meaning of failure domain. If there is no
> > way to request using 2 OSDs/node with node as the failure domain, then
> > with 5 nodes k=3+m=1 is not secure enough and I will have to use
> > k=2+m=2, so it's like a RAID1 setup - a little better than replication
> > from the point of view of global storage capacity.
> >
>
> I'm not sure what you mean by requesting 2 OSDs/node. If the failure
> domain is set to the host, then by default k/m refer to hosts, and the
> PGs will be spread across all OSDs on all hosts, but with any
> particular PG only being present on one OSD on each host. You can get
> fancy with device classes and CRUSH rules and such and be more
> specific with how they're allocated, but that would be the typical
> behavior.
>
> Since k/m refer to hosts, k+m must be less than or equal to the
> number of hosts, or you'll have a degraded pool because there won't be
> enough hosts to allocate them all. It won't ever stack them across
> multiple OSDs on the same host with that configuration.
>
> k=2,m=2 with min_size=3 would require at least 4 hosts (k+m), and would
> allow you to operate degraded with a single host down, and the PGs
> would become inactive but would still be recoverable with two hosts
> down. While strictly speaking only 4 hosts are required, you'd do
> better to have more than that, since then the cluster can immediately
> recover from a loss, assuming you have sufficient space. As you say,
> it is no more space-efficient than RAID1 or size=2, and it suffers
> write amplification for modifications, but it does allow recovery
> after the loss of up to two hosts, and you can operate degraded with
> one host down, which allows for a degree of high availability.
>
> --
> Rich
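For reference, a CRUSH rule for the 4+2-on-3-hosts layout I described above would look roughly like the sketch below. This is an illustration rather than a tested rule: the rule name and id are placeholders, and it assumes your root bucket is called "default". The idea is to pick 3 hosts and then 2 OSDs on each host, so every host ends up holding 2 of the 6 chunks:

    rule ec42_two_per_host {
            id 42                              # placeholder, must be unique in your map
            type erasure
            step set_chooseleaf_tries 5
            step set_choose_tries 100
            step take default                  # assumes the root bucket is named "default"
            step choose indep 3 type host      # pick 3 distinct hosts
            step chooseleaf indep 2 type osd   # then 2 OSDs on each of those hosts
            step emit
    }

The usual workflow is to dump the CRUSH map with "ceph osd getcrushmap", decompile it with crushtool, add a rule along these lines, recompile and inject it back with "ceph osd setcrushmap", and then point the EC pool at it with "ceph osd pool set <pool> crush_rule ec42_two_per_host".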
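And for the plain k=2,m=2 setup Rich describes, the commands would be roughly as below; the profile and pool names are just placeholders, and pg_num should be sized for the actual cluster. Note that min_size defaults to k+1 (3 here), so the last command only makes it explicit:

    ceph osd erasure-code-profile set ec22 k=2 m=2 crush-failure-domain=host
    ceph osd pool create ecpool 32 32 erasure ec22
    ceph osd pool set ecpool min_size 3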