Hi Joseph,

thank you for your answer. But if I'm reading the 'ceph osd df' output I
posted correctly, there are about 195 PGs per OSD.
There are 608 OSDs in the pool, which is the only data pool, and
according to the PG calculator that PG count is fine.
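(Rough sanity check, assuming an EC pool where each PG is stored on k+m OSDs:

    PGs per OSD ≈ pg_num * (k + m) / number_of_OSDs

so ~195 PGs per OSD across 608 OSDs corresponds to roughly 195 * 608 ≈ 118,500
PG copies in the pool overall.)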
On 11/1/22 14:03, Joseph Mundackal wrote:
If the GB per PG is high, the balancer module won't be able to help.
Your PG count per OSD also looks low (30s), so increasing the PG count
per pool would help with both problems.
You can use the PG calculator to determine which pools need what.
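(As a rough check using the pool stats quoted below: STORED divided by PGS
gives the average size per PG, e.g.

    ceph df detail   # cephfs_data: 2.1 PiB / 8192 PGs ≈ 270 GiB per PG

which limits how finely the balancer can move data, since upmap remaps whole
PGs.)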
On Tue, Nov 1, 2022, 08:46 Denis Polom <denispolom@xxxxxxxxx> wrote:
Hi,

I observed on my Ceph cluster running the latest Pacific release that
same-sized OSDs are utilized differently even though the balancer is
running and reports its status as perfectly balanced:
{
    "active": true,
    "last_optimize_duration": "0:00:00.622467",
    "last_optimize_started": "Tue Nov 1 12:49:36 2022",
    "mode": "upmap",
    "optimize_result": "Unable to find further optimization, or pool(s) pg_num is decreasing, or distribution is already perfect",
    "plans": []
}
The balancer settings for upmap are:

mgr  advanced  mgr/balancer/mode                     upmap
mgr  advanced  mgr/balancer/upmap_max_deviation      1
mgr  advanced  mgr/balancer/upmap_max_optimizations  20
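(For reference, these were applied with the usual commands, roughly:

    ceph balancer mode upmap
    ceph config set mgr mgr/balancer/upmap_max_deviation 1
    ceph config set mgr mgr/balancer/upmap_max_optimizations 20

and the status above is the output of `ceph balancer status`.)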
It's obvious from `ceph osd df` that the utilization is not the same
(the difference is about 1 TB). The following is just a partial output:
ID   CLASS  WEIGHT    REWEIGHT  SIZE    RAW USE  DATA    OMAP     META    AVAIL    %USE   VAR   PGS  STATUS
  0  hdd    18.00020   1.00000  16 TiB   13 TiB  13 TiB  3.0 MiB  37 GiB  3.6 TiB  78.09  1.05  196      up
124  hdd    18.00020   1.00000  16 TiB   12 TiB  12 TiB      0 B  32 GiB  4.7 TiB  71.20  0.96  195      up
157  hdd    18.00020   1.00000  16 TiB   13 TiB  13 TiB  5.3 MiB  35 GiB  3.7 TiB  77.67  1.05  195      up
  1  hdd    18.00020   1.00000  16 TiB   13 TiB  13 TiB  2.0 MiB  35 GiB  3.7 TiB  77.69  1.05  195      up
243  hdd    18.00020   1.00000  16 TiB   12 TiB  12 TiB      0 B  31 GiB  4.7 TiB  71.16  0.96  195      up
244  hdd    18.00020   1.00000  16 TiB   12 TiB  12 TiB      0 B  31 GiB  4.7 TiB  71.19  0.96  195      up
245  hdd    18.00020   1.00000  16 TiB   12 TiB  12 TiB      0 B  32 GiB  4.7 TiB  71.55  0.96  196      up
246  hdd    18.00020   1.00000  16 TiB   12 TiB  12 TiB      0 B  31 GiB  4.7 TiB  71.17  0.96  195      up
249  hdd    18.00020   1.00000  16 TiB   12 TiB  12 TiB      0 B  30 GiB  4.7 TiB  71.18  0.96  195      up
500  hdd    18.00020   1.00000  16 TiB   12 TiB  12 TiB      0 B  30 GiB  4.7 TiB  71.19  0.96  195      up
501  hdd    18.00020   1.00000  16 TiB   12 TiB  12 TiB      0 B  31 GiB  4.7 TiB  71.57  0.96  196      up
502  hdd    18.00020   1.00000  16 TiB   12 TiB  12 TiB      0 B  31 GiB  4.7 TiB  71.18  0.96  195      up
532  hdd    18.00020   1.00000  16 TiB   12 TiB  12 TiB      0 B  31 GiB  4.7 TiB  71.16  0.96  195      up
549  hdd    18.00020   1.00000  16 TiB   13 TiB  13 TiB  576 KiB  36 GiB  3.7 TiB  77.70  1.05  195      up
550  hdd    18.00020   1.00000  16 TiB   13 TiB  13 TiB  3.8 MiB  36 GiB  3.7 TiB  77.67  1.05  195      up
551  hdd    18.00020   1.00000  16 TiB   13 TiB  13 TiB  2.4 MiB  35 GiB  3.7 TiB  77.68  1.05  195      up
552  hdd    18.00020   1.00000  16 TiB   13 TiB  13 TiB  5.5 MiB  35 GiB  3.7 TiB  77.69  1.05  195      up
553  hdd    18.00020   1.00000  16 TiB   13 TiB  13 TiB  5.1 MiB  37 GiB  3.6 TiB  77.71  1.05  195      up
554  hdd    18.00020   1.00000  16 TiB   13 TiB  13 TiB  967 KiB  36 GiB  3.6 TiB  77.71  1.05  195      up
555  hdd    18.00020   1.00000  16 TiB   13 TiB  13 TiB  1.3 MiB  36 GiB  3.6 TiB  78.08  1.05  196      up
556  hdd    18.00020   1.00000  16 TiB   13 TiB  13 TiB  4.7 MiB  36 GiB  3.6 TiB  78.10  1.05  196      up
557  hdd    18.00020   1.00000  16 TiB   13 TiB  13 TiB  2.4 MiB  36 GiB  3.7 TiB  77.69  1.05  195      up
558  hdd    18.00020   1.00000  16 TiB   13 TiB  13 TiB  4.5 MiB  36 GiB  3.6 TiB  77.72  1.05  195      up
559  hdd    18.00020   1.00000  16 TiB   13 TiB  13 TiB  1.5 MiB  35 GiB  3.6 TiB  78.09  1.05  196      up
560  hdd    18.00020   1.00000  16 TiB   13 TiB  13 TiB  5.2 MiB  35 GiB  3.7 TiB  77.69  1.05  195      up
561  hdd    18.00020   1.00000  16 TiB   13 TiB  13 TiB  2.8 MiB  35 GiB  3.7 TiB  77.69  1.05  195      up
562  hdd    18.00020   1.00000  16 TiB   13 TiB  13 TiB  1.0 MiB  36 GiB  3.7 TiB  77.68  1.05  195      up
563  hdd    18.00020   1.00000  16 TiB   13 TiB  13 TiB  2.6 MiB  36 GiB  3.7 TiB  77.68  1.05  195      up
564  hdd    18.00020   1.00000  16 TiB   13 TiB  13 TiB  5.1 MiB  36 GiB  3.6 TiB  78.09  1.05  196      up
567  hdd    18.00020   1.00000  16 TiB   13 TiB  13 TiB  4.8 MiB  36 GiB  3.6 TiB  78.11  1.05  196      up
568  hdd    18.00020   1.00000  16 TiB   13 TiB  13 TiB  5.2 MiB  35 GiB  3.7 TiB  77.68  1.05  195      up
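(A quick way to quantify the spread, assuming the plain-text `ceph osd df`
output above where %USE is the fourth field from the end of each OSD line:

    ceph osd df | awk '$NF=="up" {print $(NF-3)}' | sort -n | sed -n '1p;$p'

prints the minimum and maximum %USE; for the rows shown here that is 71.16
vs. 78.11.)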
All OSDs are used by the same (EC) pool.

I have the same issue on another Ceph cluster with the same setup. There
I was able to even out the OSD utilization by lowering the reweight from
1.00000 on the OSDs with higher utilization (see the sketch after the
output below), and I gained a lot of free space:
Before changing reweight:

--- RAW STORAGE ---
CLASS    SIZE     AVAIL    USED     RAW USED  %RAW USED
hdd      3.1 PiB  510 TiB  2.6 PiB   2.6 PiB      83.77
ssd      2.6 TiB  2.6 TiB   46 GiB    46 GiB       1.70
TOTAL    3.1 PiB  513 TiB  2.6 PiB   2.6 PiB      83.70

--- POOLS ---
POOL                   ID  PGS   STORED   OBJECTS  USED     %USED  MAX AVAIL
cephfs_data             3  8192  2.1 PiB  555.63M  2.6 PiB  91.02    216 TiB
cephfs_metadata         4   128  7.5 GiB  140.22k   22 GiB   0.87    851 GiB
device_health_metrics   5     1  4.1 GiB    1.15k  8.3 GiB      0    130 TiB
After changing reweight:

--- RAW STORAGE ---
CLASS    SIZE     AVAIL    USED     RAW USED  %RAW USED
hdd      3.1 PiB  522 TiB  2.6 PiB   2.6 PiB      83.38
ssd      2.6 TiB  2.6 TiB   63 GiB    63 GiB       2.36
TOTAL    3.1 PiB  525 TiB  2.6 PiB   2.6 PiB      83.31

--- POOLS ---
POOL                   ID  PGS   STORED   OBJECTS  USED     %USED  MAX AVAIL
cephfs_data             3  8192  2.1 PiB  555.63M  2.5 PiB  86.83    330 TiB
cephfs_metadata         4   128  7.4 GiB  140.22k   22 GiB   0.87    846 GiB
device_health_metrics   5     1  4.2 GiB    1.15k  8.4 GiB      0    198 TiB
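(The reweighting itself was nothing fancy; a minimal sketch, where the OSD id
and the factor are just example values I picked by hand for the most utilized
OSDs:

    ceph osd reweight 553 0.95

`ceph osd test-reweight-by-utilization` can also be used as a dry run to see
which OSDs Ceph itself would pick.)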
The free space I gained is almost 5%, which is about 100 TB!

This is just a workaround, and I'm not happy about keeping the reweight
at a non-default value permanently.

Do you have any advice, please, on which settings can or should be
adjusted to keep the OSD utilization the same? Because obviously neither
the upmap balancer nor crush-compat is working correctly, at least in my
case.
Many thanks!
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx