Actually both our solutions don't work very well. Frequently the same OSD was chosen for multiple chunks:

8.72 9751 0 0 0 40895512576 0 0 1302 active+clean 2h 224790'12801 225410:49810 [13,1,14,11,18,2,19,13]p13 [13,1,14,11,18,2,19,13]p13 2021-05-11T22:41:11.332885+0000 2021-05-11T22:41:11.332885+0000
8.7f 9695 0 0 0 40661680128 0 0 2184 active+clean 5h 224790'12850 225409:57529 [8,17,4,1,14,0,19,8]p8 [8,17,4,1,14,0,19,8]p8 2021-05-11T22:41:11.332885+0000 2021-05-11T22:41:11.332885+0000

I'm now considering using device classes and assigning the OSDs to either hdd1 or hdd2... Unless someone has another idea?

Thanks,
Bryan

> On May 14, 2021, at 12:35 PM, Bryan Stillwell <bstillwell@xxxxxxxxxxx> wrote:
>
> This works better than my solution. It allows the cluster to put more PGs on the systems with more space on them:
>
> # for pg in $(ceph pg ls-by-pool cephfs_data_ec62 -f json | jq -r '.pg_stats[].pgid'); do
> >   echo $pg
> >   for osd in $(ceph pg map $pg -f json | jq -r '.up[]'); do
> >     ceph osd find $osd | jq -r '.host'
> >   done | sort | uniq -c | sort -n -k1
> > done
> 8.0
>    1 excalibur
>    1 mandalaybay
>    2 aladdin
>    2 harrahs
>    2 paris
> 8.1
>    1 aladdin
>    1 excalibur
>    1 harrahs
>    1 mirage
>    2 mandalaybay
>    2 paris
> 8.2
>    1 aladdin
>    1 mandalaybay
>    2 harrahs
>    2 mirage
>    2 paris
> ...
>
> Thanks!
> Bryan
>
>> On May 13, 2021, at 2:58 AM, Ján Senko <janos@xxxxxxxxxxxxx> wrote:
>>
>> Would something like this work?
>>
>> step take default
>> step choose indep 4 type host
>> step chooseleaf indep 1 type osd
>> step emit
>> step take default
>> step choose indep 0 type host
>> step chooseleaf indep 1 type osd
>> step emit
>>
>> J.
>>
>> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
>>
>> On Wednesday, May 12th, 2021 at 17:58, Bryan Stillwell <bstillwell@xxxxxxxxxxx> wrote:
>>
>>> I'm trying to figure out a CRUSH rule that will spread data out across my cluster as much as possible, but not more than 2 chunks per host.
>>>
>>> If I use the default rule with an osd failure domain like this:
>>>
>>> step take default
>>> step choose indep 0 type osd
>>> step emit
>>>
>>> I get clustering of 3-4 chunks on some of the hosts:
>>>
>>> # for pg in $(ceph pg ls-by-pool cephfs_data_ec62 -f json | jq -r '.pg_stats[].pgid'); do
>>> >   echo $pg
>>> >   for osd in $(ceph pg map $pg -f json | jq -r '.up[]'); do
>>> >     ceph osd find $osd | jq -r '.host'
>>> >   done | sort | uniq -c | sort -n -k1
>>> > done
>>> 8.0
>>>    1 harrahs
>>>    3 paris
>>>    4 aladdin
>>> 8.1
>>>    1 aladdin
>>>    1 excalibur
>>>    2 mandalaybay
>>>    4 paris
>>> 8.2
>>>    1 harrahs
>>>    2 aladdin
>>>    2 mirage
>>>    3 paris
>>> ...
>>>
>>> However, if I change the rule to use:
>>>
>>> step take default
>>> step choose indep 0 type host
>>> step chooseleaf indep 2 type osd
>>> step emit
>>>
>>> I get the data spread across 4 hosts with 2 chunks per host:
>>>
>>> # for pg in $(ceph pg ls-by-pool cephfs_data_ec62 -f json | jq -r '.pg_stats[].pgid'); do
>>> >   echo $pg
>>> >   for osd in $(ceph pg map $pg -f json | jq -r '.up[]'); do
>>> >     ceph osd find $osd | jq -r '.host'
>>> >   done | sort | uniq -c | sort -n -k1
>>> > done
>>> 8.0
>>>    2 aladdin
>>>    2 harrahs
>>>    2 mandalaybay
>>>    2 paris
>>> 8.1
>>>    2 aladdin
>>>    2 harrahs
>>>    2 mandalaybay
>>>    2 paris
>>> 8.2
>>>    2 harrahs
>>>    2 mandalaybay
>>>    2 mirage
>>>    2 paris
>>> ...
>>>
>>> Is it possible to get the data to spread out over more hosts? I plan on expanding the cluster in the near future and would like to see more hosts get 1 chunk instead of 2.
>>>
>>> Also, before you recommend adding two more hosts and switching to a host-based failure domain, the cluster is on a variety of hardware with between 2-6 drives per host and drives that are 4TB-12TB in size (it's part of my home lab).
>>>
>>> Thanks,
>>>
>>> Bryan
>>>
>>> ceph-users mailing list -- ceph-users@xxxxxxx
>>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
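
For anyone who wants to check their own PGs for the repeated-OSD problem described at the top of this message, here is a minimal sketch. It assumes the up set is passed in as a bracketed, comma-separated string like the ones in the PG listing (without the trailing pNN primary marker). The function name dup_osds is made up for illustration and is not a Ceph command.

```shell
# dup_osds is a hypothetical helper name, not part of Ceph.
# Prints every OSD id that occurs more than once in an up set
# written as a bracketed list, e.g. "[13,1,14,11,18,2,19,13]".
dup_osds() {
    printf '%s\n' "$1" |
        tr -d '[]' |      # strip the surrounding brackets
        tr ',' '\n' |     # one OSD id per line
        sort -n |         # put equal ids next to each other
        uniq -d           # keep only ids seen at least twice
}
```

Running dup_osds '[13,1,14,11,18,2,19,13]' prints 13 for PG 8.72, and dup_osds '[8,17,4,1,14,0,19,8]' prints 8 for PG 8.7f. An empty result means every chunk landed on a distinct OSD.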