Re: CRUSH rule for EC 6+2 on 6-node cluster

I was able to figure out the solution with this rule:

        step take default
        step choose indep 0 type host
        step chooseleaf indep 1 type osd
        step emit
        step take default
        step choose indep 0 type host
        step chooseleaf indep 1 type osd
        step emit
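
In case it saves someone a trip to the docs: those steps go inside the erasure rule that the pool uses, in the decompiled CRUSH map. Getting them in is the usual getcrushmap/crushtool/setcrushmap round trip (the file names below are just placeholders):

# ceph osd getcrushmap -o crushmap.bin
# crushtool -d crushmap.bin -o crushmap.txt
  (edit crushmap.txt and swap the step section of the pool's EC rule for the steps above)
# crushtool -c crushmap.txt -o crushmap.new
# ceph osd setcrushmap -i crushmap.new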

Each take/choose/emit pass picks at most one OSD per host, so running the pass twice caps it at two chunks per host. Now the data is spread how I want it to be:

# for pg in $(ceph pg ls-by-pool cephfs_data_ec62 -f json | jq -r '.pg_stats[].pgid'); do
>   echo $pg
>   for osd in $(ceph pg map $pg -f json | jq -r '.up[]'); do
>     ceph osd find $osd | jq -r '.host'
>   done | sort | uniq -c | sort -n -k1
> done
8.0
      1 excalibur
      1 harrahs
      1 mandalaybay
      1 mirage
      2 aladdin
      2 paris
8.1
      1 aladdin
      1 excalibur
      1 harrahs
      1 mirage
      2 mandalaybay
      2 paris
8.2
      1 aladdin
      1 excalibur
      1 harrahs
      1 mirage
      2 mandalaybay
      2 paris
...
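
A candidate map can also be sanity-checked offline before injecting it; crushtool will simulate the placements for a rule (the rule id and --num-rep below are just examples; use your EC rule's id and k+m):

# crushtool -i crushmap.new --test --rule 1 --num-rep 8 --show-mappings

Using --show-bad-mappings instead only prints the inputs where CRUSH came up short, which is a quick way to spot rules that can't place all the chunks.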

Hopefully someone else will find this useful.

Bryan

> On May 12, 2021, at 9:58 AM, Bryan Stillwell <bstillwell@xxxxxxxxxxx> wrote:
> 
> I'm trying to figure out a CRUSH rule that will spread data out across my cluster as much as possible, but not more than 2 chunks per host.
> 
> If I use the default rule with an osd failure domain like this:
> 
> step take default
> step choose indep 0 type osd
> step emit
> 
> I get clustering of 3-4 chunks on some of the hosts:
> 
> # for pg in $(ceph pg ls-by-pool cephfs_data_ec62 -f json | jq -r '.pg_stats[].pgid'); do
>>  echo $pg
>>  for osd in $(ceph pg map $pg -f json | jq -r '.up[]'); do
>>    ceph osd find $osd | jq -r '.host'
>>  done | sort | uniq -c | sort -n -k1
>> done
> 8.0
>      1 harrahs
>      3 paris
>      4 aladdin
> 8.1
>      1 aladdin
>      1 excalibur
>      2 mandalaybay
>      4 paris
> 8.2
>      1 harrahs
>      2 aladdin
>      2 mirage
>      3 paris
> ...
> 
> However, if I change the rule to use:
> 
> step take default
> step choose indep 0 type host
> step chooseleaf indep 2 type osd
> step emit
> 
> I get the data spread across 4 hosts with 2 chunks per host:
> 
> # for pg in $(ceph pg ls-by-pool cephfs_data_ec62 -f json | jq -r '.pg_stats[].pgid'); do
>>  echo $pg
>>  for osd in $(ceph pg map $pg -f json | jq -r '.up[]'); do
>>    ceph osd find $osd | jq -r '.host'
>>  done | sort | uniq -c | sort -n -k1
>> done
> 8.0
>      2 aladdin
>      2 harrahs
>      2 mandalaybay
>      2 paris
> 8.1
>      2 aladdin
>      2 harrahs
>      2 mandalaybay
>      2 paris
> 8.2
>      2 harrahs
>      2 mandalaybay
>      2 mirage
>      2 paris
> ...
> 
> Is it possible to get the data to spread out over more hosts?  I plan on expanding the cluster in the near future and would like to see more hosts get 1 chunk instead of 2.
> 
> Also, before you recommend adding two more hosts and switching to a host-based failure domain: the cluster is a mix of hardware with 2-6 drives per host and drive sizes ranging from 4TB to 12TB (it's part of my home lab).
> 
> Thanks,
> Bryan
> 
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


