You can structure your CRUSH map so that you get multiple EC chunks per host, in a way that lets you survive a host outage even though you have fewer hosts than k+1. For example, if you run an EC=4+2 profile on 3 hosts, you can structure your CRUSH map so that you have 2 chunks per host. This means that even if one host is down, you are still guaranteed to have 4 chunks available. If you then set min_size = 4, you can still operate your cluster in that situation - albeit a risky one, since any additional failure during that time will lead to data loss. However, in a highly constrained setup it might be a trade-off that's worth it for you. There have been examples of this on this mailing list in the past. (A rough sketch of such a CRUSH rule follows below the quoted message.)

On Wed, 6 Dec 2023 at 12:11, Rich Freeman <r-ceph@xxxxxxxxx> wrote:
> On Tue, Dec 5, 2023 at 6:35 AM Patrick Begou
> <Patrick.Begou@xxxxxxxxxxxxxxxxxxxxxx> wrote:
> >
> > Ok, so I've misunderstood the meaning of failure domain. If there is no
> > way to request using 2 OSDs/node with node as the failure domain, then
> > with 5 nodes k=3+m=1 is not secure enough and I will have to use
> > k=2+m=2, so it's like a RAID1 setup - a little better than replication
> > from the point of view of global storage capacity.
> >
>
> I'm not sure what you mean by requesting 2 OSDs/node. If the failure
> domain is set to the host, then by default k/m refer to hosts, and the
> PGs will be spread across all OSDs on all hosts, but with any
> particular PG only being present on one OSD on each host. You can get
> fancy with device classes and CRUSH rules and such and be more
> specific with how they're allocated, but that would be the typical
> behavior.
>
> Since k/m refer to hosts, k+m must be less than or equal to the
> number of hosts, or you'll have a degraded pool because there won't be
> enough hosts to allocate them all. It won't ever stack them across
> multiple OSDs on the same host with that configuration.
>
> k=2,m=2 with min_size=3 would require at least 4 hosts (k+m), and would
> allow you to operate degraded with a single host down, and the PGs
> would become inactive but would still be recoverable with two hosts
> down. While strictly speaking only 4 hosts are required, you'd do
> better to have more than that, since then the cluster can immediately
> recover from a loss, assuming you have sufficient space. As you say,
> it is no more space-efficient than RAID1 or size=2, and it suffers
> write amplification for modifications, but it does allow recovery
> after the loss of up to two hosts, and you can operate degraded with
> one host down, which allows for a degree of high availability.
>
> --
> Rich
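For reference, a CRUSH rule for the 4+2-on-3-hosts layout I described above would look roughly like the sketch below. This is an illustration rather than a tested rule: the rule name and id are placeholders, and it assumes your root bucket is called "default". The idea is to pick 3 hosts and then 2 OSDs on each host, so every host ends up holding 2 of the 6 chunks:

    rule ec42_two_per_host {
            id 42                              # placeholder, must be unique in your map
            type erasure
            step set_chooseleaf_tries 5
            step set_choose_tries 100
            step take default                  # assumes the root bucket is named "default"
            step choose indep 3 type host      # pick 3 distinct hosts
            step chooseleaf indep 2 type osd   # then 2 OSDs on each of those hosts
            step emit
    }

The usual workflow is to dump the CRUSH map with "ceph osd getcrushmap", decompile it with crushtool, add a rule along these lines, recompile and inject it back with "ceph osd setcrushmap", and then point the EC pool at it with "ceph osd pool set <pool> crush_rule ec42_two_per_host".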
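And for the plain k=2,m=2 setup Rich describes, the commands would be roughly as below; the profile and pool names are just placeholders, and pg_num should be sized for the actual cluster. Note that min_size defaults to k+1 (3 here), so the last command only makes it explicit:

    ceph osd erasure-code-profile set ec22 k=2 m=2 crush-failure-domain=host
    ceph osd pool create ecpool 32 32 erasure ec22
    ceph osd pool set ecpool min_size 3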