Re: Data distribution question

You have a lot of useless PGs, yet they have the same "weight" as the
useful ones.

If those pools are unused, you can:
- drop them
- raise npr_archive's pg_num using the freed PGs (see the sketch below)

As npr_archive holds 97% of your data, it should get ~97% of your PGs
(which is ~8000).
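
For example, something like this (a rough sketch only: the pool name and the
target pg_num are just examples based on your listing, deleting a pool is
irreversible, and splitting PGs triggers backfill, so double-check and go in
steps):

# allow pool deletion on the mons, then drop an unused pool
ceph tell mon.\* injectargs '--mon_allow_pool_delete=true'
ceph osd pool delete .intent-log .intent-log --yes-i-really-really-mean-it

# then raise pg_num (and pgp_num to match) on the big pool
ceph osd pool set npr_archive pg_num 8192
ceph osd pool set npr_archive pgp_num 8192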

The balancer module is still quite useful.
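
For example (assuming your clients are all Luminous or newer, so the upmap
mode can be used; otherwise crush-compat is the fallback):

ceph mgr module enable balancer
ceph osd set-require-min-compat-client luminous
ceph balancer mode upmap
ceph balancer on
ceph balancer status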

On 04/30/2019 08:02 PM, Shain Miley wrote:
> Here is the per pool pg_num info:
> 
> 'data' pg_num 64
> 'metadata' pg_num 64
> 'rbd' pg_num 64
> 'npr_archive' pg_num 6775
> '.rgw.root' pg_num 64
> '.rgw.control' pg_num 64
> '.rgw' pg_num 64
> '.rgw.gc' pg_num 64
> '.users.uid' pg_num 64
> '.users.email' pg_num 64
> '.users' pg_num 64
> '.usage' pg_num 64
> '.rgw.buckets.index' pg_num 128
> '.intent-log' pg_num 8
> '.rgw.buckets' pg_num 64
> 'kube' pg_num 512
> '.log' pg_num 8
> 
> Here is the df output:
> 
> GLOBAL:
>     SIZE        AVAIL      RAW USED     %RAW USED
>     1.06PiB     306TiB       778TiB         71.75
> POOLS:
>     NAME                   ID     USED        %USED MAX AVAIL     OBJECTS
>     data                   0      11.7GiB      0.14 8.17TiB         3006
>     metadata               1           0B         0 8.17TiB            0
>     rbd                    2      43.2GiB      0.51 8.17TiB        11147
>     npr_archive            3       258TiB     97.93 5.45TiB     82619649
>     .rgw.root              4        1001B         0 8.17TiB            5
>     .rgw.control           5           0B         0 8.17TiB            8
>     .rgw                   6      6.16KiB         0 8.17TiB           35
>     .rgw.gc                7           0B         0 8.17TiB           32
>     .users.uid             8           0B         0 8.17TiB            0
>     .users.email           9           0B         0 8.17TiB            0
>     .users                 10          0B         0 8.17TiB            0
>     .usage                 11          0B         0 8.17TiB            1
>     .rgw.buckets.index     12          0B         0 8.17TiB           26
>     .intent-log            17          0B         0 5.45TiB            0
>     .rgw.buckets           18     24.2GiB      0.29 8.17TiB         6622
>     kube                   21     1.82GiB      0.03 5.45TiB          550
>     .log                   22          0B         0 5.45TiB          176
> 
> 
> The stuff in the data pool and the rgw pools is old data that we used
> for testing. If you guys think that removing everything outside of rbd
> and npr_archive would make a significant impact, I will give it a try.
> 
> Thanks,
> 
> Shain
> 
> 
> 
> On 4/30/19 1:15 PM, Jack wrote:
>> Hi,
>>
>> I see that you are using RGW.
>> RGW comes with many pools, but most of them are used for metadata and
>> configuration, so they do not store much data.
>> Such pools do not need more than a couple of PGs each (I use pg_num = 8).
>>
>> You need to allocate your PGs to the pools that actually store the data.
>>
>> Please do the following to let us know more.
>> Print the pg_num of each pool:
>> for i in $(rados lspools); do echo -n "$i: "; ceph osd pool get $i pg_num; done
>>
>> Print the usage per pool:
>> ceph df
>>
>> Also, instead of doing a "ceph osd reweight-by-utilization", check out
>> the balancer plugin:
>> http://docs.ceph.com/docs/mimic/mgr/balancer/
>>
>>
>> Finally, in Nautilus, the PG count can now scale up and down automatically.
>> See
>> https://ceph.com/rados/new-in-nautilus-pg-merging-and-autotuning/
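>>
>> If you upgrade, turning it on is roughly this (a sketch from memory, so
>> check the docs; npr_archive is just the pool from your listing):
>>
>> ceph mgr module enable pg_autoscaler
>> ceph osd pool set npr_archive pg_autoscale_mode on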
>>
>>
>>
>> On 04/30/2019 06:34 PM, Shain Miley wrote:
>>> Hi,
>>>
>>> We have a cluster with 235 OSDs running version 12.2.11 with a
>>> combination of 4 and 6 TB drives.  The data utilization across OSDs
>>> varies from 52% to 94%.
>>>
>>> I have been trying to figure out how to get this a bit more balanced as
>>> we are running into 'backfillfull' issues on a regular basis.
>>>
>>> I've tried adding more pgs...but this did not seem to do much to reduce
>>> the imbalance.
>>>
>>> Here is the end output from 'ceph osd df':
>>>
>>> MIN/MAX VAR: 0.73/1.31  STDDEV: 7.73
>>>
>>> We have 8199 pgs total with 6775 of them in the pool that has 97% of the
>>> data.
>>>
>>> The other pools are not really used (data, metadata, .rgw.root,
>>> .rgw.control, etc).  I have thought about deleting those unused pools so
>>> that most if not all the pgs are being used by the pool with the
>>> majority of the data.
>>>
>>> However...before I do that...is there anything else I can do or try to
>>> balance out the data more uniformly?
>>>
>>> Thanks in advance,
>>>
>>> Shain
>>>
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users@xxxxxxxxxxxxxx
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
> 

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


