Re: Data not distributed according to weights

Hello everyone,

I think I figured out the reason why, in my setup,
the three small hosts are nearly full
while there is plenty of free space on the only big one.
(ceph osd tree output below)

It's simply hitting a limit: the rule selects 3 distinct hosts.
Even if one copy always landed on the big one (le10970),
the other 2 copies would each reside on one of the small ones.

Now, summing up the used disk space of the small ones
and dividing by 2 (since 2/3 of all copies go to the small ones, 1/3 to le10970)
almost matches the used disk space of the big one.
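
Checking this roughly against the df output quoted below:

    le09086: 1.5 + 1.7 + 1.7 = 4.9T used
    le09091: 1.6 + 1.8 + 1.7 = 5.1T used
    le08544: 1.6 + 1.7 + 1.7 = 5.0T used
    small hosts together:    15.0T; divided by 2: 7.5T

    le10970: 1.4 + 1.5 + 1.6 + 1.4 + 1.5 = 7.4T used

7.5T vs. 7.4T, so the numbers support the theory.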

Since I strongly believe that this is the reason,
I am considering changing the ruleset to
something like:

step take default
step choose firstn 2 type room
step chooseleaf firstn 1 type host      # one OSD per room?
step emit
step take default
step chooseleaf firstn 1 type host
step emit
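
Before injecting anything like that, I would test the compiled map
offline first, along these lines (file names are just placeholders):

    ceph osd getcrushmap -o crushmap.bin          # dump the current map
    crushtool -d crushmap.bin -o crushmap.txt     # decompile, then edit the rule
    crushtool -c crushmap.txt -o crushmap.new     # recompile
    crushtool -i crushmap.new --test --rule 0 --num-rep 3 --show-utilization

If I read the crushtool man page correctly, --show-utilization should
reveal how the mappings spread over the OSDs without touching the cluster.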

The new rule should now enable a PG to be mapped to
two OSDs located on the same host, right?
Hopefully, the affected host will be le10970
in most of the cases.
Is there a way to tell CRUSH to allow
the selection of two OSDs only on host le10970?
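
Maybe by taking that bucket explicitly? Something like this untested
sketch (rule name and ruleset number are made up, bucket names are
taken from the tree below):

rule two_on_le10970 {
        ruleset 1
        type replicated
        min_size 3
        max_size 3
        step take le10970
        step choose firstn 2 type osd           # two copies on the big host
        step emit
        step take 2.166
        step chooseleaf firstn 1 type host      # third copy in the other room
        step emit
}

That would pin two copies to le10970 and place the third one in room
2.166, at the price of losing two replicas at once whenever le10970
goes down.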

Thanks and all the best,

Frank

> In my rather heterogeneous setup ...
> 
> -1      54.36   root default
> -2      42.44           room 2.162
> -4      6.09                    host le09091
> 3       2.03                            osd.3   up      1
> 1       2.03                            osd.1   up      1
> 9       2.03                            osd.9   up      1
> -6      36.35                   host le10970
> 4       7.27                            osd.4   up      1
> 5       7.27                            osd.5   up      1
> 6       7.27                            osd.6   up      1
> 7       7.27                            osd.7   up      1
> 8       7.27                            osd.8   up      1
> -3      11.92           room 2.166
> -5      5.83                    host le09086
> 2       2.03                            osd.2   up      1
> 0       2.03                            osd.0   up      1
> 10      1.77                            osd.10  up      1
> -7      6.09                    host le08544
> 11      2.03                            osd.11  up      1
> 12      2.03                            osd.12  up      1
> 13      2.03                            osd.13  up      1
> 
> ... using size = 3 for all pools,
> the OSDs are not filled according to the weights
> (which correspond to the disk sizes in TB).
> 
> le09086 (osd)
> /dev/sdb1    1.8T  1.5T  323G  83% /var/lib/ceph/osd/ceph-10
> /dev/sdc1    2.1T  1.7T  408G  81% /var/lib/ceph/osd/ceph-0
> /dev/sdd1    2.1T  1.7T  344G  84% /var/lib/ceph/osd/ceph-2
> le09091 (osd)
> /dev/sda1    2.1T  1.6T  447G  79% /var/lib/ceph/osd/ceph-9
> /dev/sdc1    2.1T  1.8T  317G  85% /var/lib/ceph/osd/ceph-3
> /dev/sdb1    2.1T  1.7T  384G  82% /var/lib/ceph/osd/ceph-1
> le10970 (osd)
> /dev/sdd1    7.3T  1.4T  5.9T  19% /var/lib/ceph/osd/ceph-6
> /dev/sdf1     7.3T  1.5T  5.9T  21% /var/lib/ceph/osd/ceph-8
> /dev/sde1     7.3T  1.6T  5.7T  22% /var/lib/ceph/osd/ceph-7
> /dev/sdc1     7.3T  1.4T  6.0T  19% /var/lib/ceph/osd/ceph-5
> /dev/sdb1     7.3T  1.5T  5.8T  21% /var/lib/ceph/osd/ceph-4
> le08544 (osd)
> /dev/sdc1     2.1T  1.6T  443G  79% /var/lib/ceph/osd/ceph-13
> /dev/sdb1     2.1T  1.7T  339G  84% /var/lib/ceph/osd/ceph-12
> /dev/sda1     2.1T  1.7T  375G  82% /var/lib/ceph/osd/ceph-11
> 
> Clearly, I would like le10970 to be selected more often!
> 
> Increasing pg_num and pgp_num from 256 to 512 for all the pools
> didn't help.
> 
> Optimal tunables are used ...
> 
> tunable choose_local_tries 0
> tunable choose_local_fallback_tries 0
> tunable choose_total_tries 50
> tunable chooseleaf_descend_once 1
> tunable chooseleaf_vary_r 1
> 
> ... as well as the default algo (straw) and hash (0)
> for all buckets. The ruleset is pretty much standard,
> too ...
> 
> rule replicated_ruleset {
>         ruleset 0
>         type replicated
>         min_size 1
>         max_size 10
>         step take default
>         step chooseleaf firstn 0 type host
>         step emit
> }
> 
> ... so I would assume it should select 3 hosts
> (originating from root) according to the weights.
> 
> It is ceph version 0.87.
> 
> Is it the room bucket that prevents
> a distribution according to the weights?
> 
> Thanks and all the best,
> 
> Frank



