Turn off the autoscaler and increase pg_num to 512 or so (a power of 2).
The general recommendation is to have between 100 and 150 PGs per OSD
(including replicas), and then let the balancer handle the rest. What is
the current balancer status (ceph balancer status)?
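
As a rough sketch (replace <pool> with your actual pool name; upmap mode
is one option and needs the min-compat-client luminous setting you
already applied; expect data movement while pg_num increases):

    ceph osd pool set <pool> pg_autoscale_mode off
    ceph osd pool set <pool> pg_num 512
    ceph balancer mode upmap
    ceph balancer on
    ceph balancer status

Assuming a replicated pool with size 3 (not shown in your output), 512
PGs across your 21 OSDs is roughly 512 * 3 / 21 ≈ 73 PGs per OSD; 1024
would land in the 100-150 range, but 512 is a reasonable first step.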
Quoting Spiros Papageorgiou <papage@xxxxxxxxxxx>:
Hi all,
I have a Ceph cluster with 3 nodes running Ceph version 16.2.9. There
are 7 SSD OSDs on each server and one pool that resides on these OSDs.
My OSDs are terribly unbalanced:
ID  CLASS  WEIGHT    REWEIGHT  SIZE     RAW USE  DATA     OMAP     META     AVAIL    %USE   VAR   PGS  STATUS  TYPE NAME
-9         28.42200         -   28 TiB  9.3 TiB  9.2 TiB  161 MiB   26 GiB   19 TiB  32.56  1.09    -          root ssddisks
-2          9.47400         -  9.5 TiB  3.4 TiB  3.4 TiB   66 MiB  9.2 GiB  6.1 TiB  35.52  1.19    -          host px1-ssd
 0  ssd     1.74599   0.85004  1.7 TiB  810 GiB  807 GiB  3.2 MiB  2.3 GiB  978 GiB  45.28  1.51   26      up  osd.0
 5  ssd     0.82999   0.85004  850 GiB  581 GiB  580 GiB   22 MiB  912 MiB  269 GiB  68.38  2.29   19      up  osd.5
 6  ssd     0.82999   1.00000  850 GiB  8.2 GiB  7.8 GiB  9.5 MiB  435 MiB  842 GiB   0.97  0.03    4      up  osd.6
 7  ssd     0.82999   1.00000  850 GiB  294 GiB  293 GiB   26 MiB  591 MiB  556 GiB  34.60  1.16   11      up  osd.7
16  ssd     1.74599   0.85004  1.7 TiB  872 GiB  869 GiB  3.1 MiB  2.3 GiB  916 GiB  48.75  1.63   27      up  osd.16
23  ssd     1.74599   1.00000  1.7 TiB  438 GiB  436 GiB  1.5 MiB  1.7 GiB  1.3 TiB  24.48  0.82   14      up  osd.23
24  ssd     1.74599   1.00000  1.7 TiB  444 GiB  443 GiB  1.6 MiB  1.0 GiB  1.3 TiB  24.81  0.83   17      up  osd.24
-6          9.47400         -  9.5 TiB  2.9 TiB  2.9 TiB   46 MiB  8.1 GiB  6.6 TiB  30.39  1.02    -          host px2-ssd
12  ssd     0.82999   1.00000  850 GiB  154 GiB  154 GiB   21 MiB  368 MiB  696 GiB  18.16  0.61    9      up  osd.12
13  ssd     0.82999   1.00000  850 GiB  144 GiB  143 GiB  527 KiB  469 MiB  706 GiB  16.92  0.57    4      up  osd.13
14  ssd     0.82999   1.00000  850 GiB  149 GiB  149 GiB   16 MiB  299 MiB  700 GiB  17.58  0.59    7      up  osd.14
29  ssd     1.74599   1.00000  1.7 TiB  449 GiB  448 GiB  1.6 MiB  1.4 GiB  1.3 TiB  25.11  0.84   20      up  osd.29
30  ssd     1.74599   0.85004  1.7 TiB  885 GiB  882 GiB  3.1 MiB  2.3 GiB  903 GiB  49.48  1.65   31      up  osd.30
31  ssd     1.74599   1.00000  1.7 TiB  728 GiB  727 GiB  2.6 MiB  1.8 GiB  1.0 TiB  40.74  1.36   22      up  osd.31
32  ssd     1.74599   1.00000  1.7 TiB  438 GiB  437 GiB  1.6 MiB  1.4 GiB  1.3 TiB  24.51  0.82   15      up  osd.32
-4          9.47400         -  9.5 TiB  3.0 TiB  3.0 TiB   49 MiB  8.7 GiB  6.5 TiB  31.78  1.06    -          host px3-ssd
19  ssd     0.82999   1.00000  850 GiB  293 GiB  292 GiB   14 MiB  500 MiB  557 GiB  34.47  1.15    9      up  osd.19
20  ssd     0.82999   1.00000  850 GiB  290 GiB  290 GiB   10 MiB  482 MiB  560 GiB  34.15  1.14   10      up  osd.20
21  ssd     0.82999   1.00000  850 GiB  148 GiB  147 GiB   16 MiB  428 MiB  702 GiB  17.36  0.58    5      up  osd.21
25  ssd     1.74599   1.00000  1.7 TiB  446 GiB  445 GiB  1.8 MiB  1.6 GiB  1.3 TiB  24.96  0.83   19      up  osd.25
26  ssd     1.74599   1.00000  1.7 TiB  739 GiB  737 GiB  2.6 MiB  2.0 GiB  1.0 TiB  41.33  1.38   29      up  osd.26
27  ssd     1.74599   1.00000  1.7 TiB  725 GiB  723 GiB  2.6 MiB  2.1 GiB  1.0 TiB  40.55  1.36   21      up  osd.27
28  ssd     1.74599   1.00000  1.7 TiB  442 GiB  440 GiB  1.6 MiB  1.7 GiB  1.3 TiB  24.72  0.83   17      up  osd.28
I have done a "ceph osd reweight-by-utilization" and "ceph osd
set-require-min-compat-client luminous". The pool has 32 PGs which
were set by autoscale_mode, which is on.
Why are my OSDs, so unbalanced? I have osd.5 with 68.3% and osd.6
with 0.97%.... Also when the reweight-by-utilization, osd.5
utilization actually increased...
What am i missing here?
Sp
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx