OSDs are not utilized evenly

Hi

I observed on my Ceph cluster, running the latest Pacific release, that same-size OSDs are utilized differently even though the balancer is running and reports the distribution as perfectly balanced:

{
    "active": true,
    "last_optimize_duration": "0:00:00.622467",
    "last_optimize_started": "Tue Nov  1 12:49:36 2022",
    "mode": "upmap",
    "optimize_result": "Unable to find further optimization, or pool(s) pg_num is decreasing, or distribution is already perfect",
    "plans": []
}
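
(The JSON above is the output of `ceph balancer status`. If it is useful for diagnosis, `ceph balancer eval` can also print the numeric distribution score for the whole cluster or for a single pool; the pool name below is only a placeholder:)

  ceph balancer status
  ceph balancer eval
  ceph balancer eval <pool-name>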

The balancer settings for upmap are:

  mgr           advanced mgr/balancer/mode                               upmap
  mgr           advanced mgr/balancer/upmap_max_deviation                1
  mgr           advanced mgr/balancer/upmap_max_optimizations            20
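
(For reference, these were applied with `ceph config set`; the equivalent invocations would be:)

  ceph config set mgr mgr/balancer/mode upmap
  ceph config set mgr mgr/balancer/upmap_max_deviation 1
  ceph config set mgr mgr/balancer/upmap_max_optimizations 20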

It's obvious from `ceph osd df` that utilization is not the same; the difference is about 1 TB. The following is partial output:

ID   CLASS  WEIGHT    REWEIGHT  SIZE    RAW USE  DATA    OMAP     META    AVAIL    %USE   VAR   PGS  STATUS
  0   hdd   18.00020   1.00000  16 TiB   13 TiB  13 TiB  3.0 MiB  37 GiB  3.6 TiB  78.09  1.05  196      up
124   hdd   18.00020   1.00000  16 TiB   12 TiB  12 TiB      0 B  32 GiB  4.7 TiB  71.20  0.96  195      up
157   hdd   18.00020   1.00000  16 TiB   13 TiB  13 TiB  5.3 MiB  35 GiB  3.7 TiB  77.67  1.05  195      up
  1   hdd   18.00020   1.00000  16 TiB   13 TiB  13 TiB  2.0 MiB  35 GiB  3.7 TiB  77.69  1.05  195      up
243   hdd   18.00020   1.00000  16 TiB   12 TiB  12 TiB      0 B  31 GiB  4.7 TiB  71.16  0.96  195      up
244   hdd   18.00020   1.00000  16 TiB   12 TiB  12 TiB      0 B  31 GiB  4.7 TiB  71.19  0.96  195      up
245   hdd   18.00020   1.00000  16 TiB   12 TiB  12 TiB      0 B  32 GiB  4.7 TiB  71.55  0.96  196      up
246   hdd   18.00020   1.00000  16 TiB   12 TiB  12 TiB      0 B  31 GiB  4.7 TiB  71.17  0.96  195      up
249   hdd   18.00020   1.00000  16 TiB   12 TiB  12 TiB      0 B  30 GiB  4.7 TiB  71.18  0.96  195      up
500   hdd   18.00020   1.00000  16 TiB   12 TiB  12 TiB      0 B  30 GiB  4.7 TiB  71.19  0.96  195      up
501   hdd   18.00020   1.00000  16 TiB   12 TiB  12 TiB      0 B  31 GiB  4.7 TiB  71.57  0.96  196      up
502   hdd   18.00020   1.00000  16 TiB   12 TiB  12 TiB      0 B  31 GiB  4.7 TiB  71.18  0.96  195      up
532   hdd   18.00020   1.00000  16 TiB   12 TiB  12 TiB      0 B  31 GiB  4.7 TiB  71.16  0.96  195      up
549   hdd   18.00020   1.00000  16 TiB   13 TiB  13 TiB  576 KiB  36 GiB  3.7 TiB  77.70  1.05  195      up
550   hdd   18.00020   1.00000  16 TiB   13 TiB  13 TiB  3.8 MiB  36 GiB  3.7 TiB  77.67  1.05  195      up
551   hdd   18.00020   1.00000  16 TiB   13 TiB  13 TiB  2.4 MiB  35 GiB  3.7 TiB  77.68  1.05  195      up
552   hdd   18.00020   1.00000  16 TiB   13 TiB  13 TiB  5.5 MiB  35 GiB  3.7 TiB  77.69  1.05  195      up
553   hdd   18.00020   1.00000  16 TiB   13 TiB  13 TiB  5.1 MiB  37 GiB  3.6 TiB  77.71  1.05  195      up
554   hdd   18.00020   1.00000  16 TiB   13 TiB  13 TiB  967 KiB  36 GiB  3.6 TiB  77.71  1.05  195      up
555   hdd   18.00020   1.00000  16 TiB   13 TiB  13 TiB  1.3 MiB  36 GiB  3.6 TiB  78.08  1.05  196      up
556   hdd   18.00020   1.00000  16 TiB   13 TiB  13 TiB  4.7 MiB  36 GiB  3.6 TiB  78.10  1.05  196      up
557   hdd   18.00020   1.00000  16 TiB   13 TiB  13 TiB  2.4 MiB  36 GiB  3.7 TiB  77.69  1.05  195      up
558   hdd   18.00020   1.00000  16 TiB   13 TiB  13 TiB  4.5 MiB  36 GiB  3.6 TiB  77.72  1.05  195      up
559   hdd   18.00020   1.00000  16 TiB   13 TiB  13 TiB  1.5 MiB  35 GiB  3.6 TiB  78.09  1.05  196      up
560   hdd   18.00020   1.00000  16 TiB   13 TiB  13 TiB  5.2 MiB  35 GiB  3.7 TiB  77.69  1.05  195      up
561   hdd   18.00020   1.00000  16 TiB   13 TiB  13 TiB  2.8 MiB  35 GiB  3.7 TiB  77.69  1.05  195      up
562   hdd   18.00020   1.00000  16 TiB   13 TiB  13 TiB  1.0 MiB  36 GiB  3.7 TiB  77.68  1.05  195      up
563   hdd   18.00020   1.00000  16 TiB   13 TiB  13 TiB  2.6 MiB  36 GiB  3.7 TiB  77.68  1.05  195      up
564   hdd   18.00020   1.00000  16 TiB   13 TiB  13 TiB  5.1 MiB  36 GiB  3.6 TiB  78.09  1.05  196      up
567   hdd   18.00020   1.00000  16 TiB   13 TiB  13 TiB  4.8 MiB  36 GiB  3.6 TiB  78.11  1.05  196      up
568   hdd   18.00020   1.00000  16 TiB   13 TiB  13 TiB  5.2 MiB  35 GiB  3.7 TiB  77.68  1.05  195      up

All OSDs are used by the same (EC) pool.

I have the same issue on another Ceph cluster with the same setup. There I was able to even out OSD utilization by lowering the reweight from 1.00000 on the OSDs with higher utilization, and I gained a lot of free space:

Before changing reweight (`ceph df`):

--- RAW STORAGE ---
CLASS     SIZE    AVAIL     USED  RAW USED  %RAW USED
hdd    3.1 PiB  510 TiB  2.6 PiB   2.6 PiB      83.77
ssd    2.6 TiB  2.6 TiB   46 GiB    46 GiB       1.70
TOTAL  3.1 PiB  513 TiB  2.6 PiB   2.6 PiB      83.70

--- POOLS ---
POOL                   ID   PGS   STORED  OBJECTS     USED  %USED  MAX AVAIL
cephfs_data             3  8192  2.1 PiB  555.63M  2.6 PiB  91.02    216 TiB
cephfs_metadata         4   128  7.5 GiB  140.22k   22 GiB   0.87    851 GiB
device_health_metrics   5     1  4.1 GiB    1.15k  8.3 GiB      0    130 TiB


After changing reweight (`ceph df`):

--- RAW STORAGE ---
CLASS     SIZE    AVAIL     USED  RAW USED  %RAW USED
hdd    3.1 PiB  522 TiB  2.6 PiB   2.6 PiB      83.38
ssd    2.6 TiB  2.6 TiB   63 GiB    63 GiB       2.36
TOTAL  3.1 PiB  525 TiB  2.6 PiB   2.6 PiB      83.31

--- POOLS ---
POOL                   ID   PGS   STORED  OBJECTS     USED  %USED  MAX AVAIL
cephfs_data             3  8192  2.1 PiB  555.63M  2.5 PiB  86.83    330 TiB
cephfs_metadata         4   128  7.4 GiB  140.22k   22 GiB   0.87    846 GiB
device_health_metrics   5     1  4.2 GiB    1.15k  8.4 GiB      0    198 TiB

The free space I gained is almost 5%, which is about 100 TB! (cephfs_data %USED dropped from 91.02 to 86.83, and its MAX AVAIL grew from 216 TiB to 330 TiB.)

This is just a workaround, though, and I'm not happy keeping the reweight at a non-default value permanently.
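
(For completeness, the per-OSD adjustment was done with `ceph osd reweight`; the OSD ID and weight below are only illustrative, not the exact values I used:)

  ceph osd reweight 0 0.95000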

Do you have any advice on which settings can or should be adjusted to keep OSD utilization even? Evidently neither the upmap nor the crush-compat balancer mode is working correctly, at least in my case.

Many thanks!






