Re: CRUSH rule for EC 6+2 on 6-node cluster

Actually, neither of our solutions works very well.  The same OSD is frequently chosen for more than one chunk of a PG:


8.72     9751         0          0        0  40895512576            0           0  1302                   active+clean     2h  224790'12801   225410:49810    [13,1,14,11,18,2,19,13]p13    [13,1,14,11,18,2,19,13]p13  2021-05-11T22:41:11.332885+0000  2021-05-11T22:41:11.332885+0000
8.7f     9695         0          0        0  40661680128            0           0  2184                   active+clean     5h  224790'12850   225409:57529        [8,17,4,1,14,0,19,8]p8        [8,17,4,1,14,0,19,8]p8  2021-05-11T22:41:11.332885+0000  2021-05-11T22:41:11.332885+0000
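
One way to spot affected PGs is to look for repeated ids in the up set. The following is only a minimal sketch: `dup_osds` is a hypothetical helper name, and it assumes the bracketed `[...]pN` form shown in the listing above.

```shell
# Print every OSD id that occurs more than once in an up/acting set.
# Input is the bracketed list as printed in the PG listing,
# e.g. "[13,1,14]p13". Helper name and input format are assumptions.
dup_osds() {
  printf '%s\n' "$1" \
    | sed -e 's/].*//' -e 's/^\[//' \
    | tr ',' '\n' \
    | sort -n \
    | uniq -d
}

dup_osds '[13,1,14,11,18,2,19,13]p13'   # prints 13
dup_osds '[8,17,4,1,14,0,19,8]p8'       # prints 8
```

An empty result means every chunk of that PG landed on a distinct OSD.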

I'm now considering using device classes and assigning the OSDs to either hdd1 or hdd2...  Unless someone has another idea?
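
Roughly, the device-class idea could be prepared like this. It is a dry-run sketch only: `split_classes` is a hypothetical helper, the alternating hdd1/hdd2 split within a host is an assumption, and the function prints the `ceph osd crush` commands rather than running them.

```shell
# Dry run: print (do not execute) the commands that would split one
# host's OSDs between two custom device classes, alternating hdd1/hdd2.
# The alternating split is an assumption made for illustration.
split_classes() {
  i=0
  for osd in "$@"; do
    if [ $((i % 2)) -eq 0 ]; then class=hdd1; else class=hdd2; fi
    # An OSD's existing class has to be removed before a new one is set.
    echo "ceph osd crush rm-device-class osd.$osd"
    echo "ceph osd crush set-device-class $class osd.$osd"
    i=$((i + 1))
  done
}

# Example: the OSD ids of one host, passed as arguments.
split_classes 0 1 2
```

A rule could then take some hosts from `default class hdd1` and the rest from `default class hdd2`. Since the two classes share no OSDs, the same OSD could not be picked by both halves of the rule.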

Thanks,
Bryan

> On May 14, 2021, at 12:35 PM, Bryan Stillwell <bstillwell@xxxxxxxxxxx> wrote:
> 
> This works better than my solution.  It allows the cluster to put more PGs on the systems with more space on them:
> 
> for pg in $(ceph pg ls-by-pool cephfs_data_ec62 -f json | jq -r '.pg_stats[].pgid'); do
>   echo $pg
>   for osd in $(ceph pg map $pg -f json | jq -r '.up[]'); do
>     ceph osd find $osd | jq -r '.host'
>   done | sort | uniq -c | sort -n -k1
> done
> 8.0
>      1 excalibur
>      1 mandalaybay
>      2 aladdin
>      2 harrahs
>      2 paris
> 8.1
>      1 aladdin
>      1 excalibur
>      1 harrahs
>      1 mirage
>      2 mandalaybay
>      2 paris
> 8.2
>      1 aladdin
>      1 mandalaybay
>      2 harrahs
>      2 mirage
>      2 paris
> ...
> 
> Thanks!
> Bryan
> 
>> On May 13, 2021, at 2:58 AM, Ján Senko <janos@xxxxxxxxxxxxx> wrote:
>> 
>> Would something like this work?
>> 
>> step take default
>> step choose indep 4 type host
>> step chooseleaf indep 1 type osd
>> step emit
>> step take default
>> step choose indep 0 type host
>> step chooseleaf indep 1 type osd
>> step emit
>> 
>> J.
>> 
>> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
>> 
>> On Wednesday, May 12th, 2021 at 17:58, Bryan Stillwell <bstillwell@xxxxxxxxxxx> wrote:
>> 
>>> I'm trying to figure out a CRUSH rule that will spread data out across my cluster as much as possible, but not more than 2 chunks per host.
>>> 
>>> If I use the default rule with an osd failure domain like this:
>>> 
>>> step take default
>>> step choose indep 0 type osd
>>> step emit
>>> 
>>> I get clustering of 3-4 chunks on some of the hosts:
>>> 
>>> for pg in $(ceph pg ls-by-pool cephfs_data_ec62 -f json | jq -r '.pg_stats[].pgid'); do
>>>   echo $pg
>>>   for osd in $(ceph pg map $pg -f json | jq -r '.up[]'); do
>>>     ceph osd find $osd | jq -r '.host'
>>>   done | sort | uniq -c | sort -n -k1
>>> done
>>> 8.0
>>>      1 harrahs
>>>      3 paris
>>>      4 aladdin
>>> 8.1
>>>      1 aladdin
>>>      1 excalibur
>>>      2 mandalaybay
>>>      4 paris
>>> 8.2
>>>      1 harrahs
>>>      2 aladdin
>>>      2 mirage
>>>      3 paris
>>> ...
>>> 
>>> However, if I change the rule to use:
>>> 
>>> step take default
>>> step choose indep 0 type host
>>> step chooseleaf indep 2 type osd
>>> step emit
>>> 
>>> I get the data spread across 4 hosts with 2 chunks per host:
>>> 
>>> for pg in $(ceph pg ls-by-pool cephfs_data_ec62 -f json | jq -r '.pg_stats[].pgid'); do
>>>   echo $pg
>>>   for osd in $(ceph pg map $pg -f json | jq -r '.up[]'); do
>>>     ceph osd find $osd | jq -r '.host'
>>>   done | sort | uniq -c | sort -n -k1
>>> done
>>> 8.0
>>>      2 aladdin
>>>      2 harrahs
>>>      2 mandalaybay
>>>      2 paris
>>> 8.1
>>>      2 aladdin
>>>      2 harrahs
>>>      2 mandalaybay
>>>      2 paris
>>> 8.2
>>>      2 harrahs
>>>      2 mandalaybay
>>>      2 mirage
>>>      2 paris
>>> ...
>>> 
>>> Is it possible to get the data to spread out over more hosts? I plan on expanding the cluster in the near future and would like to see more hosts get 1 chunk instead of 2.
>>> 
>>> Also, before you recommend adding two more hosts and switching to a host-based failure domain: the cluster runs on a variety of hardware, with 2-6 drives per host and drives ranging from 4TB to 12TB in size (it's part of my home lab).
>>> 
>>> Thanks,
>>> Bryan
>>> 
>>> ceph-users mailing list -- ceph-users@xxxxxxx
>>> 
>>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
> 




