Hi Patrick,

Yes, k and m are chunks, but the default CRUSH map places one chunk per host, which is probably the best way to do it, though I'm no expert. I'm not sure why you would want a CRUSH map with 2 chunks per host and min_size 4, as in my opinion it's just asking for trouble at some point. Anyway, if you are interested in doing 2 chunks per host, take a look at this post; it will give you an idea of the crushmap setup (a rough, untested sketch is also at the bottom of this mail):
https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/NB3M22GNAC7VNWW7YBVYTH6TBZOYLTWA/

Regards,
Curt

On Wed, Dec 6, 2023 at 6:26 PM Patrick Begou
<Patrick.Begou@xxxxxxxxxxxxxxxxxxxxxx> wrote:

> On 06/12/2023 at 00:11, Rich Freeman wrote:
> > On Tue, Dec 5, 2023 at 6:35 AM Patrick Begou
> > <Patrick.Begou@xxxxxxxxxxxxxxxxxxxxxx> wrote:
> >> Ok, so I've misunderstood the meaning of failure domain. If there is no
> >> way to request using 2 OSDs per node with node as the failure domain, then
> >> with 5 nodes k=3+m=1 is not secure enough and I will have to use k=2+m=2,
> >> so like a RAID1 setup. A little bit better than replication from the point
> >> of view of global storage capacity.
> >>
> > I'm not sure what you mean by requesting 2 OSDs per node. If the failure
> > domain is set to the host, then by default k/m refer to hosts, and the
> > PGs will be spread across all OSDs on all hosts, but with any
> > particular PG only being present on one OSD on each host. You can get
> > fancy with device classes and CRUSH rules and be more specific about
> > how they're allocated, but that is the typical behavior.
> >
> > Since k/m refer to hosts, k+m must be less than or equal to the
> > number of hosts, or you'll have a degraded pool because there won't be
> > enough hosts to allocate them all. It won't ever stack them across
> > multiple OSDs on the same host with that configuration.
> >
> > k=2, m=2 with min_size=3 would require at least 4 hosts (k+m), and would
> > allow you to operate degraded with a single host down; the PGs
> > would become inactive but would still be recoverable with two hosts
> > down. While strictly speaking only 4 hosts are required, you'd do
> > better to have more than that, since then the cluster can immediately
> > recover from a loss, assuming you have sufficient space. As you say,
> > it is no more space-efficient than RAID1 or size=2, and it suffers
> > write amplification for modifications, but it does allow recovery
> > after the loss of up to two hosts, and you can operate degraded with
> > one host down, which allows for somewhat high availability.
> >
> Hi Rich,
>
> My understanding was that k and m were EC chunks, not hosts. 🙁 Of
> course, if k and m are hosts, the best choice would be k=2 and m=2.
>
> When Christian wrote:
>
> "For example, if you run an EC=4+2 profile on 3 hosts you can structure
> your crushmap so that you have 2 chunks per host. This means even if one
> host is down you are still guaranteed to have 4 chunks available."
>
> this is what I had thought before (using 5 nodes instead of 3 as in
> Christian's example). But it does not match what you explain if k and m
> are nodes.
>
> I'm a little bit confused by the crushmap settings.
>
> Patrick
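
For completeness, the plain k=2, m=2 layout with host as the failure domain that Rich describes boils down to something like the following. The profile and pool names here are just examples, not anything from your cluster, so adjust to taste:

# Create an EC profile with 2 data + 2 coding chunks, one chunk per host
ceph osd erasure-code-profile set ec22-host k=2 m=2 crush-failure-domain=host

# Create a pool using that profile
ceph osd pool create ecpool erasure ec22-host

# Keep serving I/O with one host down; k+1 = 3 is also the default
# min_size for this profile, so this is mostly for clarity
ceph osd pool set ecpool min_size 3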
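And since you asked about 2 chunks per host: for Christian's EC=4+2-on-3-hosts example, the approach in that thread looks roughly like the sketch below. The rule name, id and host/OSD counts are placeholders and I haven't tested this, so decompile your own map and double-check before injecting anything:

# Dump and decompile the current CRUSH map
ceph osd getcrushmap -o crush.bin
crushtool -d crush.bin -o crush.txt

# Add a rule along these lines to crush.txt (pick an unused id):
#
# rule ec42_two_per_host {
#     id 9
#     type erasure
#     step set_chooseleaf_tries 5
#     step set_choose_tries 100
#     step take default
#     step choose indep 3 type host       # pick 3 hosts...
#     step chooseleaf indep 2 type osd    # ...then 2 OSDs (chunks) on each
#     step emit
# }

# Recompile, inject, and point the EC pool at the new rule
crushtool -c crush.txt -o crush.new
ceph osd setcrushmap -i crush.new
ceph osd pool set ecpool crush_rule ec42_two_per_host   # ecpool = your EC pool

Keep in mind that with one host down in that layout you are left with exactly k=4 chunks, which is why that thread ends up lowering min_size to 4, and why I'd be careful with it.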