Re: Num values for 3 DC 4+2 crush rule

Hi Torkil,

> Num is 0 but it's not replicated so how does this translate to picking 3 of 3 datacenters?

It doesn't really make a difference whether the pool is replicated or not, num just defines how many crush buckets to choose, so it applies in the same way as for your replicated pools. For your 4+5 rule, num-rep is k+m = 9: the choose step with num 0 asks for 9 datacenters, but only 3 are available, so all 3 are chosen; the chooseleaf step then picks 3 hosts in each of them, which gives the 9 OSDs.

> I am thinking we should just change 3 to 2 for the chooseleaf line for the 4+2 rule since for 4+5 each DC needs 3 shards and for 4+2 each DC needs 2 shards. Comments?

Unless the 4+2 rule you pasted is incomplete, it doesn't currently contain a line specifying how many datacenters to choose, so don't forget that. ;-) But yeah, you could either have

        step choose indep 0 type datacenter
        step chooseleaf indep 2 type host

or

        step choose indep 3 type datacenter
        step chooseleaf indep 2 type host

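For example, with the second variant the full rule would look like this (just a sketch based on your existing rbd_ec_data rule, untested):

rule rbd_ec_data {
        id 0
        type erasure
        step set_chooseleaf_tries 5
        step set_choose_tries 100
        step take default class hdd
        step choose indep 3 type datacenter
        step chooseleaf indep 2 type host
        step emit
}
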
Either way, the result should be the same. But I recommend verifying with crushtool:

# get current crushmap
ceph osd getcrushmap -o crushmap.bin
# decompile crushmap
crushtool -d crushmap.bin -o crushmap.txt
# change your crush rule in crushmap.txt
# recompile the modified crushmap
crushtool -c crushmap.txt -o crushmap.test

# test it
crushtool -i crushmap.test --test --rule <rule_id> --show-mappings --num-rep 6
crushtool -i crushmap.test --test --rule <rule_id> --show-bad-mappings --num-rep 6

You'll see the OSD mappings, which will tell you whether the PGs would be distributed as required.
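
If you want to check the per-datacenter distribution automatically, something like this could work (a rough sketch: it assumes the usual "CRUSH rule <id> x <pg> [osd,osd,...]" output format, and osd_to_dc.txt is a hypothetical file you'd have to fill with "<osd_id> <datacenter>" pairs from your own crushmap):

crushtool -i crushmap.test --test --rule <rule_id> --show-mappings --num-rep 6 |
awk '
BEGIN {
    # hypothetical lookup file, one "<osd_id> <datacenter>" pair per line
    while ((getline line < "osd_to_dc.txt") > 0) {
        split(line, f, " ")
        dc[f[1]] = f[2]
    }
}
{
    # the last field is the mapping, e.g. [1,2,3,4,5,6]
    gsub(/\[|\]/, "", $NF)
    n = split($NF, osds, ",")
    for (d in cnt) delete cnt[d]
    for (i = 1; i <= n; i++) cnt[dc[osds[i]]]++
    # a 4+2 rule over 3 datacenters should give exactly 2 shards per DC
    for (d in cnt) if (cnt[d] != 2) print "x", $5, "has", cnt[d], "shards in", d
}'

No output would mean every PG gets exactly 2 shards in each of the 3 datacenters.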

Regards,
Eugen

Quoting Torkil Svensgaard <torkil@xxxxxxxx>:

I was just looking at our crush rules as we need to change them from failure domain host to failure domain datacenter. The replicated ones seem trivial but what about this one for EC 4+2?

rule rbd_ec_data {
        id 0
        type erasure
        step set_chooseleaf_tries 5
        step set_choose_tries 100
        step take default class hdd
        step chooseleaf indep 0 type host
        step emit
}

We already have this crush rule for EC 4+5:

"
rule cephfs.hdd.data {
        id 7
        type erasure
        step set_chooseleaf_tries 5
        step set_choose_tries 100
        step take default class hdd
        step choose indep 0 type datacenter
        step chooseleaf indep 3 type host
        step emit
}
"

I don't understand the "num" argument for the choose step. The documentation[1] says:

"
If {num} == 0, choose pool-num-replicas buckets (as many buckets as are available).

If pool-num-replicas > {num} > 0, choose that many buckets.

If {num} < 0, choose pool-num-replicas - {num} buckets.
"

Num is 0 but it's not replicated so how does this translate to picking 3 of 3 datacenters?

I am thinking we should just change 3 to 2 for the chooseleaf line for the 4+2 rule since for 4+5 each DC needs 3 shards and for 4+2 each DC needs 2 shards. Comments?

Best regards,

Torkil

[1] https://docs.ceph.com/en/reef/rados/operations/crush-map-edits/

--
Torkil Svensgaard
Systems Administrator
Danish Research Centre for Magnetic Resonance DRCMR, Section 714
Copenhagen University Hospital Amager and Hvidovre
Kettegaard Allé 30, 2650 Hvidovre, Denmark