On Tue, Dec 5, 2023 at 6:35 AM Patrick Begou
<Patrick.Begou@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>
> Ok, so I've misunderstood the meaning of failure domain. If there is no
> way to request using 2 osd/node and node as failure domain, with 5 nodes
> k=3+m=1 is not secure enough and I will have to use k=2+m=2, so like a
> raid1 setup. A little bit better than replication in the point of view
> of global storage capacity.
>

I'm not sure what you mean by requesting 2 OSDs/node.

If the failure domain is set to host, then by default k/m refer to
hosts: the PGs will be spread across all OSDs on all hosts, but any
particular PG will only be present on one OSD on each host. You can get
fancy with device classes and CRUSH rules and be more specific about how
the chunks are allocated, but that is the typical behavior.

Since k/m refer to hosts, k+m must be less than or equal to the number
of hosts, or you'll have a degraded pool because there won't be enough
hosts to allocate all the chunks. It won't ever stack chunks across
multiple OSDs on the same host with that configuration.

k=2,m=2 with min_size=3 would require at least 4 hosts (k+m). It would
let you operate degraded with a single host down; with two hosts down
the PGs would become inactive, but the data would still be recoverable.
While strictly speaking only 4 hosts are required, you'd do better to
have more than that, since then the cluster can immediately recover from
the loss of a host, assuming you have sufficient space. (There's a rough
sketch of the commands below my sig.)

As you say, it is no more space-efficient than RAID1 or size=2, and it
suffers write amplification on modifications, but it does allow recovery
after the loss of up to two hosts, and you can keep operating with one
host down, which gives you a reasonable degree of high availability.

--
Rich
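
P.S. A rough sketch of what that setup might look like (untested, and
the profile/pool names are just placeholders):

   # EC profile: k=2 data + m=2 coding chunks, one chunk per host
   ceph osd erasure-code-profile set ec22profile \
       k=2 m=2 crush-failure-domain=host

   # create an erasure-coded pool using that profile
   ceph osd pool create ecpool erasure ec22profile

   # min_size should already default to k+1 (=3 here), but you can set
   # it explicitly
   ceph osd pool set ecpool min_size 3

With min_size=3 the pool keeps serving I/O with one host down and goes
inactive (but remains recoverable) with two down, which is the behavior
described above.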