Re: Erasure coded pools and ceph failure domain setup

On 02/03/2019 01:02, Ravi Patel wrote:
Hello,

My question is about how CRUSH distributes chunks throughout the cluster with erasure coded pools. Currently, we have 4 OSD nodes with 36 drives (OSD daemons) per node. If we use crush_failure_domain=host, then we are necessarily limited to k=3,m=1 or k=2,m=2. We would like to explore k>3, m>2 coding schemes but are unsure how the CRUSH rule set will distribute the chunks if we set crush_failure_domain to OSD.

Ideally, we would like CRUSH to distribute the chunks hierarchically so as to spread them evenly across the nodes, rather than, for example, ending up with all chunks on a single node.

Are chunks evenly spread by default? If not, how might we go about configuring them?
You can write your own CRUSH rules to distribute chunks hierarchically. For example, you can have a k=6, m=2 code together with a rule that guarantees that each node gets two chunks. This means that if you lose a node you do not lose data (though, depending on your min_size setting, your pool might be unavailable at that point until you replace the node or add a new one and the chunks can be recovered). You would accomplish this with a rule that looks like this:

rule ec8 {
        id <some free id>
        type erasure
        min_size 7
        max_size 8
        step set_chooseleaf_tries 5
        step set_choose_tries 100
        step take default
        # first pick 4 distinct hosts...
        step choose indep 4 type host
        # ...then 2 OSDs within each chosen host, for 8 OSDs in total
        step chooseleaf indep 2 type osd
        step emit
}

This means the rule will first pick 4 hosts, then pick 2 OSDs per host, resulting in a total of 8 OSDs. This is appropriate for k=6 m=2 codes as well as k=5 m=2 codes (the latter will just leave one of the chosen OSDs unused), hence min_size 7 and max_size 8.
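For completeness, here is a rough sketch of how you could wire that up from the command line (the profile name "ec62", pool name "ecpool", PG counts and file names below are just placeholders; k and m come from the profile, while placement is governed entirely by the custom rule):

# create a k=6, m=2 erasure code profile
ceph osd erasure-code-profile set ec62 k=6 m=2
# export, decompile, edit and re-inject the CRUSH map to add the ec8 rule above
ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt
# ... add the "ec8" rule to crushmap.txt ...
crushtool -c crushmap.txt -o crushmap.new
ceph osd setcrushmap -i crushmap.new
# create the pool from the profile, using the custom rule for placement
ceph osd pool create ecpool 256 256 erasure ec62 ec8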

If you just set crush_failure_domain to OSD, then the rule will pick random OSDs without regard for the hosts; you will be able to use effectively any EC widths you want, but there will be no guarantees of data durability if you lose a whole host.
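For comparison, the plain OSD failure domain approach is just a profile setting; the auto-generated rule then picks OSDs with no regard for hosts (the profile/pool names and the k=8, m=3 width here are arbitrary examples):

ceph osd erasure-code-profile set ecwide k=8 m=3 crush-failure-domain=osd
ceph osd pool create ecwidepool 256 256 erasure ecwide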

--
Hector Martin (hector@xxxxxxxxxxxxxx)
Public Key: https://mrcn.st/pub



