Re: handling different disk sizes

Christian Balzer <chibi@xxxxxxx> · Mon, 5 Jun 2017 21:48:24 +0900

Hello,

On Mon, 5 Jun 2017 13:54:02 +0200 Félix Barbeira wrote:

> Hi,
> 
> We have a small cluster for radosgw use only. It has three nodes, with 3
            ^^^^^                                      ^^^^^
> osds each. Each node has different disk sizes:
> 

There's your answer, staring you right in the face.

Your default replication size is 3, your default failure domain is host.

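Ceph cannot distribute data according to the weights here, since each of
the 3 replicas has to be on a different node (one replica per node) to
comply with the replica size. Every node therefore holds a full copy of the
data, regardless of its capacity.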
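
To put numbers on that, here is a rough Python sketch (nothing Ceph itself
runs; the names HOSTS and expected_osd_use are made up for illustration). It
assumes replica size 3 with host as the failure domain, an even spread across
the OSDs inside each host, and it ignores the tiny pools and any on-disk
overhead. The sizes are taken from the "ceph osd df" and "ceph df" output
quoted below.

```python
# Per-host OSD sizes (GB) from the quoted `ceph osd df` output.
HOSTS = {
    "node01": {"osd_size_gb": 7445, "osds": 3},
    "node02": {"osd_size_gb": 1856, "osds": 3},
    "node03": {"osd_size_gb": 2787, "osds": 3},
}
POOL_DATA_GB = 2879  # default.rgw.buckets.data, before replication
REPLICA_SIZE = 3     # assumption: pool size 3, failure domain "host"


def expected_osd_use(hosts, data_gb, size):
    """Return {host: (GB per OSD, %USE per OSD)} with one replica per host."""
    if size != len(hosts):
        raise ValueError("sketch only covers size == number of hosts")
    result = {}
    for name, host in hosts.items():
        # Each host stores one full copy, split across its own OSDs.
        per_osd_gb = data_gb / host["osds"]
        result[name] = (per_osd_gb, 100.0 * per_osd_gb / host["osd_size_gb"])
    return result


if __name__ == "__main__":
    for name, (gb, pct) in expected_osd_use(HOSTS, POOL_DATA_GB, REPLICA_SIZE).items():
        print(f"{name}: {gb:.0f} GB per OSD, {pct:.1f}% used")
```

Under those assumptions every OSD ends up with about 960 GB, which is roughly
12.9% of an 8TB disk, 51.7% of a 2TB disk and 34.4% of a 3TB disk. The
per-node averages of the %USE column below come out at about 12.9, 51.7 and
34.5.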

If your cluster had 4 or more nodes, you'd see what you expected.
And most likely wouldn't be happy about the performance with your 8TB HDDs
seeing 4 times more I/Os than the 2TB ones and thus becoming the
bottleneck of your cluster.
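
The "4 times" figure is simply the ratio of the CRUSH weights. A minimal
sketch of that arithmetic follows, assuming data (and with it I/O) lands on
each OSD in proportion to its weight, which is only an approximation even
with enough hosts. WEIGHTS and relative_load are illustrative names; the
weights are the ones shown in "ceph osd tree" below.

```python
# CRUSH weights per OSD, from the quoted `ceph osd tree` output.
WEIGHTS = {"8TB": 7.27060, "2TB": 1.81310, "3TB": 2.72229}


def relative_load(weights, baseline):
    """Return each OSD class's expected load relative to the baseline class."""
    base = weights[baseline]
    return {name: weight / base for name, weight in weights.items()}


if __name__ == "__main__":
    for name, ratio in relative_load(WEIGHTS, "2TB").items():
        print(f"{name} OSD: {ratio:.2f}x the load of a 2TB OSD")
```

That gives about 4.01x for an 8TB OSD and 1.50x for a 3TB one, on spindles
that are not meaningfully faster than the 2TB ones.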

Christian

> node01 : 3x8TB
> node02 : 3x2TB
> node03 : 3x3TB
> 
> I thought that the weight handles the amount of data that every osd receives.
> In this case for example the node with the 8TB disks should receive more
> than the rest, right? All of them receive the same amount of data and the
> smaller disk (2TB) reaches 100% before the bigger ones. Am I doing
> something wrong?
> 
> The cluster is jewel LTS 10.2.7.
> 
> # ceph osd df
> ID WEIGHT  REWEIGHT SIZE   USE   AVAIL  %USE  VAR  PGS
>  0 7.27060  1.00000  7445G 1012G  6432G 13.60 0.57 133
>  3 7.27060  1.00000  7445G 1081G  6363G 14.52 0.61 163
>  4 7.27060  1.00000  7445G  787G  6657G 10.58 0.44 120
>  1 1.81310  1.00000  1856G 1047G   809G 56.41 2.37 143
>  5 1.81310  1.00000  1856G  956G   899G 51.53 2.16 143
>  6 1.81310  1.00000  1856G  877G   979G 47.24 1.98 130
>  2 2.72229  1.00000  2787G 1010G  1776G 36.25 1.52 140
>  7 2.72229  1.00000  2787G  831G  1955G 29.83 1.25 130
>  8 2.72229  1.00000  2787G 1038G  1748G 37.27 1.56 146
>               TOTAL 36267G 8643G 27624G 23.83
> MIN/MAX VAR: 0.44/2.37  STDDEV: 18.60
> #
> 
> # ceph osd tree
> ID WEIGHT   TYPE NAME      UP/DOWN REWEIGHT PRIMARY-AFFINITY
> -1 35.41795 root default
> -2 21.81180     host node01
>  0  7.27060         osd.0       up  1.00000          1.00000
>  3  7.27060         osd.3       up  1.00000          1.00000
>  4  7.27060         osd.4       up  1.00000          1.00000
> -3  5.43929     host node02
>  1  1.81310         osd.1       up  1.00000          1.00000
>  5  1.81310         osd.5       up  1.00000          1.00000
>  6  1.81310         osd.6       up  1.00000          1.00000
> -4  8.16687     host node03
>  2  2.72229         osd.2       up  1.00000          1.00000
>  7  2.72229         osd.7       up  1.00000          1.00000
>  8  2.72229         osd.8       up  1.00000          1.00000
> #
> 
> # ceph -s
>     cluster 49ba9695-7199-4c21-9199-ac321e60065e
>      health HEALTH_OK
>      monmap e1: 3 mons at {ceph-mon01=[x:x:x:x:x:x:x:x]:6789/0,ceph-mon02=[x:x:x:x:x:x:x:x]:6789/0,ceph-mon03=[x:x:x:x:x:x:x:x]:6789/0}
>             election epoch 48, quorum 0,1,2 ceph-mon01,ceph-mon03,ceph-mon02
>      osdmap e265: 9 osds: 9 up, 9 in
>             flags sortbitwise,require_jewel_osds
>       pgmap v95701: 416 pgs, 11 pools, 2879 GB data, 729 kobjects
>             8643 GB used, 27624 GB / 36267 GB avail
>                  416 active+clean
> #
> 
> # ceph osd pool ls
> .rgw.root
> default.rgw.control
> default.rgw.data.root
> default.rgw.gc
> default.rgw.log
> default.rgw.users.uid
> default.rgw.users.keys
> default.rgw.buckets.index
> default.rgw.buckets.non-ec
> default.rgw.buckets.data
> default.rgw.users.email
> #
> 
> # ceph df
> GLOBAL:
>     SIZE       AVAIL      RAW USED     %RAW USED
>     36267G     27624G        8643G         23.83
> POOLS:
>     NAME                           ID     USED      %USED     MAX AVAIL     OBJECTS
>     .rgw.root                      1       1588         0         5269G           4
>     default.rgw.control            2          0         0         5269G           8
>     default.rgw.data.root          3       8761         0         5269G          28
>     default.rgw.gc                 4          0         0         5269G          32
>     default.rgw.log                5          0         0         5269G         127
>     default.rgw.users.uid          6       4887         0         5269G          28
>     default.rgw.users.keys         7        144         0         5269G          16
>     default.rgw.buckets.index      9          0         0         5269G          14
>     default.rgw.buckets.non-ec     10         0         0         5269G           3
>     default.rgw.buckets.data       11     2879G     35.34         5269G      746848
>     default.rgw.users.email        12        13         0         5269G           1
> #
> 

-- 
Christian Balzer        Network/Systems Engineer                
chibi@xxxxxxx   	Rakuten Communications