Re: Balancing PGs across OSDs

Hello Paul,

thanks for your analysis.

I want to share more statistics of my cluster to follow up on your
response "You have way too few PGs in one of the roots".

Here are the pool details:
root@ld3955:~# ceph osd pool ls detail
pool 11 'hdb_backup' replicated size 3 min_size 2 crush_rule 1
object_hash rjenkins pg_num 8192 pgp_num 8192 autoscale_mode warn
last_change 294572 flags hashpspool,selfmanaged_snaps stripe_width 0
application rbd
        removed_snaps [1~3]
pool 59 'hdd' replicated size 2 min_size 2 crush_rule 3 object_hash
rjenkins pg_num 64 pgp_num 64 autoscale_mode warn last_change 267271
flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd
        removed_snaps [1~3]
pool 60 'ssd' replicated size 2 min_size 2 crush_rule 4 object_hash
rjenkins pg_num 128 pgp_num 128 autoscale_mode warn last_change 299719
lfor 299717/299717/299717 flags hashpspool,selfmanaged_snaps
stripe_width 0 application rbd
        removed_snaps [1~3]
pool 61 'nvme' replicated size 2 min_size 2 crush_rule 2 object_hash
rjenkins pg_num 8 pgp_num 8 autoscale_mode warn last_change 267125 flags
hashpspool stripe_width 0 application rbd
pool 62 'cephfs_data' replicated size 3 min_size 2 crush_rule 3
object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode warn
last_change 300312 lfor 300310/300310/300310 flags hashpspool
stripe_width 0 application cephfs
pool 63 'cephfs_metadata' replicated size 3 min_size 2 crush_rule 3
object_hash rjenkins pg_num 8 pgp_num 8 autoscale_mode warn last_change
267069 flags hashpspool stripe_width 0 pg_autoscale_bias 4 pg_num_min 16
recovery_priority 5 application cephfs

pg_num / pgp_num is monitored by Ceph for every pool, i.e. I get a
warning in the log / health status if a pool has too few PGs.
However, I have not enabled the PG autoscaler for any pool.
The PG count per pool was calculated with pgcalc
<https://ceph.io/pgcalc/>.
Here's a screenshot <https://ibb.co/VjR6X3x> of that calculation.
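
For reference, pgcalc essentially applies the usual rule of thumb
pg_num ~ (OSD count x target PGs per OSD x data share) / replica size,
rounded to a power of two. A minimal sketch with assumed inputs (the
100 PGs per OSD target and 100% data share are placeholders, not
necessarily the values from my screenshot):

# Rule-of-thumb PG calculation, for illustration only.
# 3x 48 + 4x 48 = 336 disks are dedicated to hdb_backup, replica size 3;
# target of 100 PGs per OSD and 100% data share are assumptions.
osd_count=336; target_per_osd=100; data_percent=100; size=3
echo $(( osd_count * target_per_osd * data_percent / 100 / size ))
# -> 11200 before rounding; pgcalc then rounds this to a power of two.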

My focus is on pool hdb_backup.
Based on these statistics
root@ld3955:~# ceph df detail
RAW STORAGE:
    CLASS     SIZE        AVAIL       USED        RAW USED     %RAW USED
    hdd       1.4 PiB     744 TiB     729 TiB      730 TiB         49.53
    nvme       23 TiB      23 TiB      43 GiB       51 GiB          0.22
    ssd        27 TiB      25 TiB     1.9 TiB      1.9 TiB          7.15
    TOTAL     1.5 PiB     792 TiB     731 TiB      732 TiB         48.02

POOLS:
    POOL                ID     STORED      OBJECTS     USED        %USED     MAX AVAIL     QUOTA OBJECTS     QUOTA BYTES     DIRTY       USED COMPR     UNDER COMPR
    hdb_backup          11     241 TiB      63.29M     241 TiB     57.03        61 TiB     N/A               N/A              63.29M     0 B            0 B
    hdd                 59     553 GiB     142.16k     553 GiB      0.50        54 TiB     N/A               N/A             142.16k     0 B            0 B
    ssd                 60     2.0 TiB     530.75k     2.0 TiB      8.72        10 TiB     N/A               N/A             530.75k     0 B            0 B
    nvme                61         0 B           0         0 B         0        11 TiB     N/A               N/A                   0     0 B            0 B
    cephfs_data         62     356 GiB     102.29k     356 GiB      0.32        36 TiB     N/A               N/A             102.29k     0 B            0 B
    cephfs_metadata     63     117 MiB          52     117 MiB         0        36 TiB     N/A               N/A                  52     0 B            0 B

the pool is only 57% used, yet effectively I cannot store much more
data because some OSDs are already more than 80% full.

It is true that the disks used exclusively for this pool differ in
size:
3x 48 disks of 7.2 TB each
4x 48 disks of 1.6 TB each
The utilisation of the 7.2 TB disks ranges from 41% to 54%, while the
1.6 TB disks range from 52% to 81%.
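
For completeness, this is roughly how I look at the per-OSD spread;
just standard commands, nothing specific to my setup:

# Per-OSD size, %USE, variance from the mean and PG count, grouped by CRUSH tree
ceph osd df tree

# ceph osd df prints MIN/MAX VAR and STDDEV at the bottom of its output,
# which summarises how uneven the distribution is
ceph osd df | tail -3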

If Ceph is not capable of rebalancing this automatically, how can I
proceed to rebalance the data manually?
In my opinion OSD reweight is not an option, because it starts filling
OSDs that are not the ones with the lowest usage.
Can I move PGs to specific OSDs?
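
To illustrate what I mean by moving PGs: would something like the
pg-upmap interface / the upmap balancer be the intended way? This is
only a sketch, assuming all clients are Luminous or newer; the PG and
OSD ids below are placeholders:

# Option 1: let the balancer module compute pg-upmap entries itself
ceph osd set-require-min-compat-client luminous
ceph balancer mode upmap
ceph balancer on
ceph balancer status

# Option 2: remap a single PG away from a full OSD by hand,
# e.g. PG 11.abc from osd.12 to osd.345 (placeholder ids)
ceph osd pg-upmap-items 11.abc 12 345

# remove the manual mapping again if needed
ceph osd rm-pg-upmap-items 11.abc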


THX




On 18.11.2019 at 20:18, Paul Emmerich wrote:
> You have way too few PGs in one of the roots. Many OSDs have so few
> PGs that you should see a lot of health warnings because of it.
> The other root has a factor 5 difference in disk size which isn't ideal either.
>
>
> Paul
>

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



