If you want to resolve your issue without purchasing another node, you should move one disk of each size into each server. This process will be quite painful, as you'll need to move the disks in the CRUSH map so that they sit under a different host, and then all of your data will move around. Afterwards, though, CRUSH will be able to use the weights and distribute the data between the 2TB, 3TB, and 8TB drives much more evenly.
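To make the arithmetic behind that suggestion concrete, here is a rough Python sketch. It only adds up disk sizes per host before and after the shuffle; the sizes are the nominal TB figures from the original post, and `redistribute` is a hypothetical helper written for this illustration, not a Ceph command or API.

```python
# Nominal disk sizes in TB per host, as listed in the original post.
current = {
    "node01": [8, 8, 8],
    "node02": [2, 2, 2],
    "node03": [3, 3, 3],
}


def host_totals(layout):
    """Sum the disk sizes under each host."""
    return {host: sum(disks) for host, disks in layout.items()}


def redistribute(layout):
    """Hypothetical helper: spread each host's disks round-robin over all hosts.

    With three hosts of three identical disks each, every host ends up
    with one disk of each size.
    """
    hosts = sorted(layout)
    new_layout = {host: [] for host in hosts}
    for src_idx, src in enumerate(hosts):
        for disk_idx, size in enumerate(layout[src]):
            dest = hosts[(src_idx + disk_idx) % len(hosts)]
            new_layout[dest].append(size)
    return new_layout


proposed = redistribute(current)
print("before:", host_totals(current))
print("after: ", host_totals(proposed))
```

With one 8TB, one 2TB and one 3TB disk under every host, the three host totals are equal, so the one-replica-per-host rule no longer caps the cluster at the size of the smallest node.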
On Mon, Jun 5, 2017 at 9:21 AM Loic Dachary <loic@xxxxxxxxxxx> wrote:
On 06/05/2017 02:48 PM, Christian Balzer wrote:
>
> Hello,
>
> On Mon, 5 Jun 2017 13:54:02 +0200 Félix Barbeira wrote:
>
>> Hi,
>>
>> We have a small cluster for radosgw use only. It has three nodes, with 3
> ^^^^^ ^^^^^
>> osds each. Each node has different disk sizes:
>>
>
> There's your answer, staring you right in the face.
>
> Your default replication size is 3, your default failure domain is host.
>
> Ceph can not distribute data according to the weight, since it needs to be
> on a different node (one replica per node) to comply with the replica size.
Another way to look at it is to imagine a situation where 10TB worth of data
is stored on node01, which has 3x8TB = 24TB. Since you asked for 3 replicas, this
data must also be replicated to node02, but ... there is only 3x2TB = 6TB available there.
So the maximum you can store is 6TB, and the remaining disk space on node01 and node03
will never be used.
python-crush analyze will display a message about that situation and show which buckets
are overweighted.
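The same reasoning as a small Python sketch, in case it helps. It assumes the nominal per-node sizes above (24TB, 6TB, 9TB), 3 replicas and host as the failure domain, and it only covers the case where the replica count equals the number of hosts. `usable_capacity` is a name made up for this example; it is not part of python-crush or Ceph.

```python
# Nominal raw capacity per host in TB, from the node layout in this thread.
host_capacity = {"node01": 24, "node02": 6, "node03": 9}
replicas = 3


def usable_capacity(capacity, size):
    """Upper bound on stored data when every host must hold one replica.

    Only valid when the replica count equals the number of hosts, so each
    object has exactly one copy on every host.
    """
    if size != len(capacity):
        raise ValueError("sketch only covers replica count == number of hosts")
    # Every host stores a full copy, so the smallest host is the limit.
    return min(capacity.values())


limit = usable_capacity(host_capacity, replicas)
raw = sum(host_capacity.values())
stranded = raw - replicas * limit

print(f"max data: {limit} TB, raw: {raw} TB, never usable: {stranded} TB")
```

This is an upper bound only: in practice the cluster stops accepting writes earlier, once the fullest OSD reaches its full ratio.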
Cheers
>
> If your cluster had 4 or more nodes, you'd see what you expected.
> And most likely wouldn't be happy about the performance with your 8TB HDDs
> seeing 4 times more I/Os than the 2TB ones and thus becoming the
> bottleneck of your cluster.
>
> Christian
>
>> node01 : 3x8TB
>> node02 : 3x2TB
>> node03 : 3x3TB
>>
>> I thought that the weight handles the amount of data that every osd receives.
>> In this case for example the node with the 8TB disks should receive more
>> than the rest, right? All of them receive the same amount of data and the
>> smaller disk (2TB) reaches 100% before the bigger ones. Am I doing
>> something wrong?
>>
>> The cluster is jewel LTS 10.2.7.
>>
>> # ceph osd df
>> ID WEIGHT REWEIGHT SIZE USE AVAIL %USE VAR PGS
>> 0 7.27060 1.00000 7445G 1012G 6432G 13.60 0.57 133
>> 3 7.27060 1.00000 7445G 1081G 6363G 14.52 0.61 163
>> 4 7.27060 1.00000 7445G 787G 6657G 10.58 0.44 120
>> 1 1.81310 1.00000 1856G 1047G 809G 56.41 2.37 143
>> 5 1.81310 1.00000 1856G 956G 899G 51.53 2.16 143
>> 6 1.81310 1.00000 1856G 877G 979G 47.24 1.98 130
>> 2 2.72229 1.00000 2787G 1010G 1776G 36.25 1.52 140
>> 7 2.72229 1.00000 2787G 831G 1955G 29.83 1.25 130
>> 8 2.72229 1.00000 2787G 1038G 1748G 37.27 1.56 146
>> TOTAL 36267G 8643G 27624G 23.83
>> MIN/MAX VAR: 0.44/2.37 STDDEV: 18.60
>> #
>>
>> # ceph osd tree
>> ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY
>> -1 35.41795 root default
>> -2 21.81180 host node01
>> 0 7.27060 osd.0 up 1.00000 1.00000
>> 3 7.27060 osd.3 up 1.00000 1.00000
>> 4 7.27060 osd.4 up 1.00000 1.00000
>> -3 5.43929 host node02
>> 1 1.81310 osd.1 up 1.00000 1.00000
>> 5 1.81310 osd.5 up 1.00000 1.00000
>> 6 1.81310 osd.6 up 1.00000 1.00000
>> -4 8.16687 host node03
>> 2 2.72229 osd.2 up 1.00000 1.00000
>> 7 2.72229 osd.7 up 1.00000 1.00000
>> 8 2.72229 osd.8 up 1.00000 1.00000
>> #
>>
>> # ceph -s
>> cluster 49ba9695-7199-4c21-9199-ac321e60065e
>> health HEALTH_OK
>> monmap e1: 3 mons at
>> {ceph-mon01=[x:x:x:x:x:x:x:x]:6789/0,ceph-mon02=[x:x:x:x:x:x:x:x]:6789/0,ceph-mon03=[x:x:x:x:x:x:x:x]:6789/0}
>> election epoch 48, quorum 0,1,2 ceph-mon01,ceph-mon03,ceph-mon02
>> osdmap e265: 9 osds: 9 up, 9 in
>> flags sortbitwise,require_jewel_osds
>> pgmap v95701: 416 pgs, 11 pools, 2879 GB data, 729 kobjects
>> 8643 GB used, 27624 GB / 36267 GB avail
>> 416 active+clean
>> #
>>
>> # ceph osd pool ls
>> .rgw.root
>> default.rgw.control
>> default.rgw.data.root
>> default.rgw.gc
>> default.rgw.log
>> default.rgw.users.uid
>> default.rgw.users.keys
>> default.rgw.buckets.index
>> default.rgw.buckets.non-ec
>> default.rgw.buckets.data
>> default.rgw.users.email
>> #
>>
>> # ceph df
>> GLOBAL:
>> SIZE AVAIL RAW USED %RAW USED
>> 36267G 27624G 8643G 23.83
>> POOLS:
>> NAME                       ID USED  %USED MAX AVAIL OBJECTS
>> .rgw.root                  1  1588  0     5269G     4
>> default.rgw.control        2  0     0     5269G     8
>> default.rgw.data.root      3  8761  0     5269G     28
>> default.rgw.gc             4  0     0     5269G     32
>> default.rgw.log            5  0     0     5269G     127
>> default.rgw.users.uid      6  4887  0     5269G     28
>> default.rgw.users.keys     7  144   0     5269G     16
>> default.rgw.buckets.index  9  0     0     5269G     14
>> default.rgw.buckets.non-ec 10 0     0     5269G     3
>> default.rgw.buckets.data   11 2879G 35.34 5269G     746848
>> default.rgw.users.email    12 13    0     5269G     1
>> #
>>
>
>
--
Loïc Dachary, Artisan Logiciel Libre
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com