On 20/10/18 05:28, Cody wrote:
Hi folks, I have a rookie question. Must the number of buckets at the chosen failure-domain level be equal to or greater than the number of replicas (or k+m for erasure coding)? E.g., for an erasure code profile where k=4, m=2, failure domain=rack, does it only work when there are 6 or more racks in the CRUSH hierarchy? Or would it continue to iterate down the tree and eventually work as long as there are 6 or more OSDs?

Thank you very much.

Best regards,
Cody
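For reference, a profile and pool like the one you describe are typically created along these lines (the names 'ec42' and 'ecpool' and the PG count of 128 are just placeholders):

    ceph osd erasure-code-profile set ec42 k=4 m=2 crush-failure-domain=rack
    ceph osd pool create ecpool 128 128 erasure ec42

Creating an EC pool this way also generates a default CRUSH rule for it, which is the rule I describe below.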
The rule associated with the EC profile you mentioned will indeed try to select 6 rack buckets and then take one OSD leaf from each. If you only had 5 racks, for example, it would return only 5 OSDs per PG; the pool would still function, but in a degraded state (assuming the pool's min_size is 5 or lower). The rule will not return more than one OSD per rack; if it did, it would not be achieving the failure domain you specified. You can write a custom rule that selects 2 racks and then 3 hosts from each (a sketch follows below) and associate it with the k=4, m=2 pool. CRUSH will not mind: it will do whatever you tell it. But if one rack fails, your pool goes down, so you would not be achieving a rack-level failure domain unless you do have 6 or more racks.
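To illustrate, such a custom rule (pick 2 racks, then 3 hosts from each) would look roughly like this in the decompiled crushmap; the rule name and id are placeholders, and it assumes your root bucket is named 'default':

    rule ec42_two_racks {
            id 2                                # any unused rule id
            type erasure
            step set_chooseleaf_tries 5
            step set_choose_tries 100
            step take default                   # start at the CRUSH root
            step choose indep 2 type rack       # pick 2 distinct racks
            step chooseleaf indep 3 type host   # 3 OSDs on distinct hosts per rack
            step emit
    }

You would edit it in with crushtool (ceph osd getcrushmap / crushtool -d ... / crushtool -c ... / ceph osd setcrushmap) and then point the pool at it with 'ceph osd pool set <pool> crush_rule ec42_two_racks'. Again, this protects against host failures, not the loss of a whole rack.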
Maged