If the GB per PG is high, the balancer module won't be able to help. Your PG count per OSD also looks low (30s), so increasing pg_num per pool would help with both problems. You can use the PG calculator to determine which pools need what.
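
A minimal sketch of what that looks like, assuming the cephfs_data pool from the output below is the one short on PGs (the target of 16384 is purely illustrative; take the real value from the PG calculator):

    # check per-OSD PG counts and the current pg_num of each pool
    ceph osd df
    ceph osd pool ls detail

    # raise pg_num on the data pool to the calculated target;
    # on Nautilus and later pgp_num follows automatically and the
    # split is applied gradually, but expect a lot of backfill
    ceph osd pool set cephfs_data pg_num 16384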

On Tue, Nov 1, 2022, 08:46 Denis Polom <denispolom@xxxxxxxxx> wrote:

> Hi,
>
> I observed on my Ceph cluster running the latest Pacific that same-size
> OSDs are utilized differently, even though the balancer is running and
> reports the status as perfectly balanced:
>
> {
>     "active": true,
>     "last_optimize_duration": "0:00:00.622467",
>     "last_optimize_started": "Tue Nov 1 12:49:36 2022",
>     "mode": "upmap",
>     "optimize_result": "Unable to find further optimization, or pool(s) pg_num is decreasing, or distribution is already perfect",
>     "plans": []
> }
>
> The balancer settings for upmap are:
>
> mgr  advanced  mgr/balancer/mode                     upmap
> mgr  advanced  mgr/balancer/upmap_max_deviation      1
> mgr  advanced  mgr/balancer/upmap_max_optimizations  20
>
> It's obvious from `ceph osd df` that utilization is not the same (the
> difference is about 1 TB). The following is just a partial output:
>
> ID   CLASS  WEIGHT    REWEIGHT  SIZE    RAW USE  DATA    OMAP     META    AVAIL    %USE   VAR   PGS  STATUS
>   0  hdd    18.00020   1.00000  16 TiB   13 TiB  13 TiB  3.0 MiB  37 GiB  3.6 TiB  78.09  1.05  196  up
> 124  hdd    18.00020   1.00000  16 TiB   12 TiB  12 TiB      0 B  32 GiB  4.7 TiB  71.20  0.96  195  up
> 157  hdd    18.00020   1.00000  16 TiB   13 TiB  13 TiB  5.3 MiB  35 GiB  3.7 TiB  77.67  1.05  195  up
>   1  hdd    18.00020   1.00000  16 TiB   13 TiB  13 TiB  2.0 MiB  35 GiB  3.7 TiB  77.69  1.05  195  up
> 243  hdd    18.00020   1.00000  16 TiB   12 TiB  12 TiB      0 B  31 GiB  4.7 TiB  71.16  0.96  195  up
> 244  hdd    18.00020   1.00000  16 TiB   12 TiB  12 TiB      0 B  31 GiB  4.7 TiB  71.19  0.96  195  up
> 245  hdd    18.00020   1.00000  16 TiB   12 TiB  12 TiB      0 B  32 GiB  4.7 TiB  71.55  0.96  196  up
> 246  hdd    18.00020   1.00000  16 TiB   12 TiB  12 TiB      0 B  31 GiB  4.7 TiB  71.17  0.96  195  up
> 249  hdd    18.00020   1.00000  16 TiB   12 TiB  12 TiB      0 B  30 GiB  4.7 TiB  71.18  0.96  195  up
> 500  hdd    18.00020   1.00000  16 TiB   12 TiB  12 TiB      0 B  30 GiB  4.7 TiB  71.19  0.96  195  up
> 501  hdd    18.00020   1.00000  16 TiB   12 TiB  12 TiB      0 B  31 GiB  4.7 TiB  71.57  0.96  196  up
> 502  hdd    18.00020   1.00000  16 TiB   12 TiB  12 TiB      0 B  31 GiB  4.7 TiB  71.18  0.96  195  up
> 532  hdd    18.00020   1.00000  16 TiB   12 TiB  12 TiB      0 B  31 GiB  4.7 TiB  71.16  0.96  195  up
> 549  hdd    18.00020   1.00000  16 TiB   13 TiB  13 TiB  576 KiB  36 GiB  3.7 TiB  77.70  1.05  195  up
> 550  hdd    18.00020   1.00000  16 TiB   13 TiB  13 TiB  3.8 MiB  36 GiB  3.7 TiB  77.67  1.05  195  up
> 551  hdd    18.00020   1.00000  16 TiB   13 TiB  13 TiB  2.4 MiB  35 GiB  3.7 TiB  77.68  1.05  195  up
> 552  hdd    18.00020   1.00000  16 TiB   13 TiB  13 TiB  5.5 MiB  35 GiB  3.7 TiB  77.69  1.05  195  up
> 553  hdd    18.00020   1.00000  16 TiB   13 TiB  13 TiB  5.1 MiB  37 GiB  3.6 TiB  77.71  1.05  195  up
> 554  hdd    18.00020   1.00000  16 TiB   13 TiB  13 TiB  967 KiB  36 GiB  3.6 TiB  77.71  1.05  195  up
> 555  hdd    18.00020   1.00000  16 TiB   13 TiB  13 TiB  1.3 MiB  36 GiB  3.6 TiB  78.08  1.05  196  up
> 556  hdd    18.00020   1.00000  16 TiB   13 TiB  13 TiB  4.7 MiB  36 GiB  3.6 TiB  78.10  1.05  196  up
> 557  hdd    18.00020   1.00000  16 TiB   13 TiB  13 TiB  2.4 MiB  36 GiB  3.7 TiB  77.69  1.05  195  up
> 558  hdd    18.00020   1.00000  16 TiB   13 TiB  13 TiB  4.5 MiB  36 GiB  3.6 TiB  77.72  1.05  195  up
> 559  hdd    18.00020   1.00000  16 TiB   13 TiB  13 TiB  1.5 MiB  35 GiB  3.6 TiB  78.09  1.05  196  up
> 560  hdd    18.00020   1.00000  16 TiB   13 TiB  13 TiB  5.2 MiB  35 GiB  3.7 TiB  77.69  1.05  195  up
> 561  hdd    18.00020   1.00000  16 TiB   13 TiB  13 TiB  2.8 MiB  35 GiB  3.7 TiB  77.69  1.05  195  up
> 562  hdd    18.00020   1.00000  16 TiB   13 TiB  13 TiB  1.0 MiB  36 GiB  3.7 TiB  77.68  1.05  195  up
> 563  hdd    18.00020   1.00000  16 TiB   13 TiB  13 TiB  2.6 MiB  36 GiB  3.7 TiB  77.68  1.05  195  up
> 564  hdd    18.00020   1.00000  16 TiB   13 TiB  13 TiB  5.1 MiB  36 GiB  3.6 TiB  78.09  1.05  196  up
> 567  hdd    18.00020   1.00000  16 TiB   13 TiB  13 TiB  4.8 MiB  36 GiB  3.6 TiB  78.11  1.05  196  up
> 568  hdd    18.00020   1.00000  16 TiB   13 TiB  13 TiB  5.2 MiB  35 GiB  3.7 TiB  77.68  1.05  195  up
>
> All OSDs are used by the same (EC) pool.
>
> I have the same issue on another Ceph cluster with the same setup, where
> I was able to make the OSD utilization the same by lowering the reweight
> from 1.00000 on the OSDs with higher utilization, and I got a lot of free
> space back.
>
> Before changing reweight:
>
> --- RAW STORAGE ---
> CLASS     SIZE    AVAIL     USED  RAW USED  %RAW USED
> hdd    3.1 PiB  510 TiB  2.6 PiB   2.6 PiB      83.77
> ssd    2.6 TiB  2.6 TiB   46 GiB    46 GiB       1.70
> TOTAL  3.1 PiB  513 TiB  2.6 PiB   2.6 PiB      83.70
>
> --- POOLS ---
> POOL                   ID  PGS    STORED  OBJECTS     USED  %USED  MAX AVAIL
> cephfs_data             3  8192  2.1 PiB  555.63M  2.6 PiB  91.02    216 TiB
> cephfs_metadata         4   128  7.5 GiB  140.22k   22 GiB   0.87    851 GiB
> device_health_metrics   5     1  4.1 GiB    1.15k  8.3 GiB      0    130 TiB
>
> After changing reweight:
>
> --- RAW STORAGE ---
> CLASS     SIZE    AVAIL     USED  RAW USED  %RAW USED
> hdd    3.1 PiB  522 TiB  2.6 PiB   2.6 PiB      83.38
> ssd    2.6 TiB  2.6 TiB   63 GiB    63 GiB       2.36
> TOTAL  3.1 PiB  525 TiB  2.6 PiB   2.6 PiB      83.31
>
> --- POOLS ---
> POOL                   ID  PGS    STORED  OBJECTS     USED  %USED  MAX AVAIL
> cephfs_data             3  8192  2.1 PiB  555.63M  2.5 PiB  86.83    330 TiB
> cephfs_metadata         4   128  7.4 GiB  140.22k   22 GiB   0.87    846 GiB
> device_health_metrics   5     1  4.2 GiB    1.15k  8.4 GiB      0    198 TiB
>
> The free space I got back is almost 5%, which is about 100 TB!
>
> This is just a workaround, and I'm not happy about keeping the reweight
> at a non-default value permanently.
>
> Do you have any advice on which settings can or should be adjusted to
> keep OSD utilization the same? Because obviously neither the upmap
> balancer nor crush-compat is working correctly, at least in my case.
>
> Many thanks!
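
For reference, the manual reweight workaround described above boils down to commands like these (the OSD IDs and weights here are illustrative only, not a recommendation):

    # lower the reweight of an over-utilized OSD so data slowly moves off it
    ceph osd reweight 0 0.95
    ceph osd reweight 555 0.95

    # or let Ceph pick the candidates: dry-run first, then apply
    ceph osd test-reweight-by-utilization 110
    ceph osd reweight-by-utilization 110

With upmap_max_deviation already at 1, the upmap balancer should normally make this unnecessary once each OSD carries enough PGs for it to balance on.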