OSDs not balanced

Hello,

I wonder why my autobalancer is not working here:

root@ceph01:~# ceph -s
  cluster:
    id:     5436dd5d-83d4-4dc8-a93b-60ab5db145df
    health: HEALTH_ERR
            1 backfillfull osd(s)
            1 full osd(s)
            1 nearfull osd(s)
            4 pool(s) full

=> osd.17 was too full (91.44 %, see the ceph osd df tree output below)
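
The thresholds behind those full/backfillfull/nearfull flags can be checked with something like this (standard Quincy CLI, output omitted):

  ceph osd dump | grep -i ratio     # full_ratio / backfillfull_ratio / nearfull_ratio
  ceph health detail                # lists which OSDs tripped which threshold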

root@ceph01:~# ceph osd df tree
ID   CLASS  WEIGHT     REWEIGHT  SIZE     ... %USE  ... PGS TYPE NAME
-25         209.50084         -  213 TiB  ... 69.56 ...   - datacenter xxx-dc-root
-19          84.59369         -   86 TiB  ... 56.97 ...   -     rack RZ1.Reihe4.R10
 -3          35.49313         -   37 TiB  ... 57.88 ...   -         host ceph02
  2    hdd    1.70000   1.00000  1.7 TiB  ... 58.77 ...  44             osd.2
  3    hdd    1.00000   1.00000  2.7 TiB  ... 22.14 ...  25             osd.3
  7    hdd    2.50000   1.00000  2.7 TiB  ... 58.84 ...  70             osd.7
  9    hdd    9.50000   1.00000  9.5 TiB  ... 63.07 ... 268             osd.9
 13    hdd    2.67029   1.00000  2.7 TiB  ... 53.59 ...  65             osd.13
 16    hdd    2.89999   1.00000  2.7 TiB  ... 59.35 ...  71             osd.16
 19    hdd    1.70000   1.00000  1.7 TiB  ... 48.98 ...  37             osd.19
 23    hdd    2.38419   1.00000  2.4 TiB  ... 59.33 ...  64             osd.23
 24    hdd    1.39999   1.00000  1.7 TiB  ... 51.23 ...  39             osd.24
 28    hdd    3.63869   1.00000  3.6 TiB  ... 64.17 ... 104             osd.28
 31    hdd    2.70000   1.00000  2.7 TiB  ... 64.73 ...  76             osd.31
 32    hdd    3.39999   1.00000  3.3 TiB  ... 67.28 ... 101             osd.32
 -9          22.88817         -   23 TiB  ... 56.96 ...   -         host ceph06
 35    hdd    7.15259   1.00000  7.2 TiB  ... 55.71 ... 182             osd.35
 36    hdd    5.24519   1.00000  5.2 TiB  ... 53.75 ... 128             osd.36
 45    hdd    5.24519   1.00000  5.2 TiB  ... 60.91 ... 144             osd.45
 48    hdd    5.24519   1.00000  5.2 TiB  ... 57.94 ... 139             osd.48
-17          26.21239         -   26 TiB  ... 55.67 ...   -         host ceph08
 37    hdd    6.67569   1.00000  6.7 TiB  ... 58.17 ... 174             osd.37
 40    hdd    9.53670   1.00000  9.5 TiB  ... 58.54 ... 250             osd.40
 46    hdd    5.00000   1.00000  5.0 TiB  ... 52.39 ... 116             osd.46
 47    hdd    5.00000   1.00000  5.0 TiB  ... 50.05 ... 112             osd.47
-20          59.11053         -   60 TiB  ... 82.47 ...   -     rack RZ1.Reihe4.R9
 -4          23.09996         -   24 TiB  ... 79.92 ...   -         host ceph03
  5    hdd    1.70000   0.75006  1.7 TiB  ... 87.24 ...  66             osd.5
  6    hdd    1.70000   0.44998  1.7 TiB  ... 47.30 ...  36             osd.6
 10    hdd    2.70000   0.85004  2.7 TiB  ... 83.23 ... 100             osd.10
 15    hdd    2.70000   0.75006  2.7 TiB  ... 74.26 ...  88             osd.15
 17    hdd    0.50000   0.85004  1.6 TiB  ... 91.44 ...  67             osd.17
 20    hdd    2.00000   0.85004  1.7 TiB  ... 88.41 ...  68             osd.20
 21    hdd    2.79999   0.75006  2.7 TiB  ... 77.25 ...  91             osd.21
 25    hdd    1.70000   0.90002  1.7 TiB  ... 78.31 ...  60             osd.25
 26    hdd    2.70000   1.00000  2.7 TiB  ... 82.75 ...  99             osd.26
 27    hdd    2.70000   0.90002  2.7 TiB  ... 84.26 ... 101             osd.27
 63    hdd    1.89999   0.90002  1.7 TiB  ... 84.15 ...  65             osd.63
-13          36.01057         -   36 TiB  ... 84.12 ...   -         host ceph05
 11    hdd    7.15259   0.90002  7.2 TiB  ... 85.45 ... 273             osd.11
 39    hdd    7.20000   0.85004  7.2 TiB  ... 80.90 ... 257             osd.39
 41    hdd    7.20000   0.75006  7.2 TiB  ... 74.95 ... 239             osd.41
 42    hdd    9.00000   1.00000  9.5 TiB  ... 92.00 ... 392             osd.42
 43    hdd    5.45799   1.00000  5.5 TiB  ... 84.84 ... 207             osd.43
-21          65.79662         -   66 TiB  ... 74.29 ...   -     rack RZ3.Reihe3.R10
 -2          28.49664         -   29 TiB  ... 74.79 ...   -         host ceph01
  0    hdd    2.70000   1.00000  2.7 TiB  ... 73.82 ...  88             osd.0
  1    hdd    3.63869   1.00000  3.6 TiB  ... 73.47 ... 121             osd.1
  4    hdd    2.70000   1.00000  2.7 TiB  ... 74.63 ...  89             osd.4
  8    hdd    2.70000   1.00000  2.7 TiB  ... 77.10 ...  92             osd.8
 12    hdd    2.70000   1.00000  2.7 TiB  ... 78.76 ...  94             osd.12
 14    hdd    5.45799   1.00000  5.5 TiB  ... 78.86 ... 193             osd.14
 18    hdd    1.89999   1.00000  2.7 TiB  ... 63.79 ...  76             osd.18
 22    hdd    1.70000   1.00000  1.7 TiB  ... 74.85 ...  57             osd.22
 30    hdd    1.70000   1.00000  1.7 TiB  ... 76.34 ...  59             osd.30
 64    hdd    3.29999   1.00000  3.3 TiB  ... 73.48 ... 110             osd.64
-11          12.39999         -   12 TiB  ... 73.40 ...   -         host ceph04
 34    hdd    5.20000   1.00000  5.2 TiB  ... 72.81 ... 171             osd.34
 44    hdd    7.20000   1.00000  7.2 TiB  ... 73.83 ... 236             osd.44
-15          24.89998         -   25 TiB  ... 74.15 ...   -         host ceph07
 66    hdd    7.20000   1.00000  7.2 TiB  ... 74.07 ... 236             osd.66
 67    hdd    7.20000   1.00000  7.2 TiB  ... 73.74 ... 236             osd.67
 68    hdd    3.29999   1.00000  3.3 TiB  ... 72.99 ... 110             osd.68
 69    hdd    7.20000   1.00000  7.2 TiB  ... 75.18 ... 241             osd.69
 -1                 0         -      0 B  ...     0 ...   - root default
                          TOTAL  213 TiB  ... 69.56
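
To compare crush weight, override reweight and utilization per OSD in a sortable form, something like this should work (a sketch; the jq paths assume the field names nodes[].name / .crush_weight / .reweight / .utilization from Quincy's ceph osd df -f json output):

  ceph osd df -f json | \
    jq -r '.nodes[] | [.name, .crush_weight, .reweight, .utilization] | @tsv' | \
    sort -t$'\t' -k4 -rn | head -15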

root@ceph01:~# ceph balancer status
{
    "active": true,
    "last_optimize_duration": "0:00:00.256761",
    "last_optimize_started": "Mon Mar  4 10:25:10 2024",
    "mode": "upmap",
    "no_optimization_needed": true,
    "optimize_result": "Unable to find further optimization, or
pool(s) pg_num is decreasing, or distribution is already perfect",
    "plans": []
}
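
For context, the balancer's tolerance and scoring can be inspected like this (Quincy balancer module commands; upmap_max_deviation defaults to 5 PGs per OSD as far as I can tell):

  ceph config get mgr mgr/balancer/upmap_max_deviation
  ceph balancer eval             # score of the current PG distribution (lower is better)
  ceph balancer optimize myplan && ceph balancer show myplan    # "myplan" is just an arbitrary plan name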


Whereas reweight-by-utilization would change quite a bit:

root@ceph01:~# ceph osd test-reweight-by-utilization 110 .5 10
moved 260 / 6627 (3.92334%)
avg 127.442
stddev 81.345 -> 78.169 (expected baseline 11.18)
min osd.17 with 24 -> 16 pgs (0.188321 -> 0.125547 * mean)
max osd.42 with 401 -> 320 pgs (3.14652 -> 2.51094 * mean)

oload 110
max_change 0.5
max_change_osds 10
average_utilization 0.6956
overload_utilization 0.7652
osd.42 weight 1.0000 -> 0.7561
osd.6 weight 0.4500 -> 0.6616
osd.17 weight 0.8500 -> 0.6466
osd.20 weight 0.8500 -> 0.6688
osd.5 weight 0.7501 -> 0.6004
osd.11 weight 0.9000 -> 0.7326
osd.43 weight 1.0000 -> 0.8199
osd.27 weight 0.9000 -> 0.7430
osd.63 weight 0.9000 -> 0.7440
osd.10 weight 0.8500 -> 0.7104
no change
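
(For reference: the test- prefix above makes this a dry run. The same arguments without it,

  ceph osd reweight-by-utilization 110 .5 10

would actually apply the override weights shown above.)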



root@ceph01:~# ceph versions
{
    "mon": {
        "ceph version 17.2.7
(b12291d110049b2f35e32e0de30d70e9a4c060d2) quincy (stable)": 3
    },
    "mgr": {
        "ceph version 17.2.7
(b12291d110049b2f35e32e0de30d70e9a4c060d2) quincy (stable)": 2
    },
    "osd": {
        "ceph version 17.2.7
(b12291d110049b2f35e32e0de30d70e9a4c060d2) quincy (stable)": 52
    },
    "mds": {
        "ceph version 17.2.7
(b12291d110049b2f35e32e0de30d70e9a4c060d2) quincy (stable)": 4
    },
    "overall": {
        "ceph version 17.2.7
(b12291d110049b2f35e32e0de30d70e9a4c060d2) quincy (stable)": 61
    }
}


Cheers,
Michael


