I've created a new pool "testpool" and its MAX AVAIL is equal. I think you're
right. Somehow 3.5 TiB is a very lucky number, because it was the same before
adding the new OSDs.

POOL                     ID  PGS  STORED   OBJECTS  USED     %USED  MAX AVAIL
prod.rgw.buckets.index   54  128  622 GiB  435.60k  622 GiB  5.52   3.5 TiB
testpool                 60    4  0 B      0        0 B      0      3.5 TiB

When I added the 10 new OSDs I also doubled the PG count. That's why the
per-OSD PG counts are similar while the used sizes differ so much. So not only
should the PGs have moved around, the objects should have been balanced as
well, but crush-compat didn't do that well. Upmap is good when it comes to PG
balance: the PG counts will be equal, but the objects per PG still differ and
that will cause imbalance again. If I remember correctly, upmap wasn't good at
that either, but it's worth a try.

This weekend:
1- I will run compaction on every SSD OSD.
2- Deep-scrub to get rid of the large OMAPs.
3- Switch the balancer from crush-compat to upmap.
4- If it's still not balanced as I'd like, I will rebalance manually.
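Roughly the commands I have in mind for steps 1-3 (osd.19 is just a
placeholder below; I will loop over every SSD OSD id, and I still need to
double-check the exact syntax on my version):

# 1- compact the OSD's RocksDB, one OSD at a time to limit the impact
ceph tell osd.19 compact       # or: ceph daemon osd.19 compact, run on the OSD's host

# 2- tell the OSD to deep-scrub its PGs so the large-omap warnings get re-evaluated
ceph osd deep-scrub osd.19

# 3- upmap requires all clients to be at least luminous, then switch the balancer mode
ceph osd set-require-min-compat-client luminous
ceph balancer off
ceph balancer mode upmap
ceph balancer on
ceph balancer status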
On Wed, Sep 1, 2021 at 5:35 PM Josh Baergen <jbaergen@xxxxxxxxxxxxxxxx> wrote:
>
> Well, if I read your df output correctly, your emptiest SSD is 22%
> full and the fullest is 55%. That's a huge spread, and max-avail takes
> imbalances like this into account. Assuming triple-replication, a
> max-avail of 3.5 TiB makes sense given this imbalance.
>
> Similarly, OSD 209 has 65 PGs on it, whereas some OSDs have 104 PGs on
> them. At least within a host, the upmap balancer should be capable of
> bringing the PG count within a variance of 1 PG or so (if configured
> to do so). I admit that I don't have experience with class-based
> rules, though, so it's possible that the upmap balancer also has
> limitations when it comes to this sort of rule.
>
> Having said that, PG count imbalance isn't the only source of
> imbalance here; OSD 19 is 45% full with 102 PGs whereas OSD 208 is 23%
> with 95 PGs; it would seem that there is a data imbalance on a per-PG
> basis, or perhaps that's just an OSD needing compaction to clean up
> some space or something like that. Not sure.
>
> Josh
>
> On Wed, Sep 1, 2021 at 8:19 AM mhnx <morphinwithyou@xxxxxxxxx> wrote:
> >
> > I've tried the upmap balancer but the PG distribution was not good. I think
> > crush-compat is way better with Nautilus, and it's also the default option.
> > As you can see in the df tree, the PG distribution is not that bad. Also,
> > MAX AVAIL never changed.
> >
> > If that's not the reason for MAX AVAIL, then what is it?
> >
> > Maybe I should create a new pool and check its MAX AVAIL. If the new pool's
> > MAX AVAIL is greater than the other pools', then Ceph's recalculation is
> > buggy or needs to be triggered somehow.
> >
> > On Wed, Sep 1, 2021 at 5:07 PM Josh Baergen <jbaergen@xxxxxxxxxxxxxxxx> wrote:
> >>
> >> Googling for that balancer error message, I came across
> >> https://tracker.ceph.com/issues/22814, which was closed/wont-fix, and
> >> some threads that claimed that class-based crush rules actually use
> >> some form of shadow trees in the background. I'm not sure how accurate
> >> that is.
> >>
> >> The only suggestion I have, which is what was also suggested in one of
> >> the above threads, is to use the upmap balancer instead if possible.
> >>
> >> Josh
> >>
> >> On Wed, Sep 1, 2021 at 2:38 AM mhnx <morphinwithyou@xxxxxxxxx> wrote:
> >> >
> >> > ceph osd crush tree (I only have one subtree and it's root default)
> >> > ID  CLASS  WEIGHT      (compat)    TYPE NAME
> >> >  -1        2785.87891              root default
> >> >  -3         280.04803   280.04803      host NODE-1
> >> >   0  hdd     14.60149    14.60149          osd.0
> >> >  19  ssd      0.87320     0.87320          osd.19
> >> > 208  ssd      0.87329     0.87329          osd.208
> >> > 209  ssd      0.87329     0.87329          osd.209
> >> >  -7         280.04803   280.04803      host NODE-2
> >> >  38  hdd     14.60149    14.60149          osd.38
> >> >  39  ssd      0.87320     0.87320          osd.39
> >> > 207  ssd      0.87329     0.87329          osd.207
> >> > 210  ssd      0.87329     0.87329          osd.210
> >> > -10         280.04803   280.04803      host NODE-3
> >> >  58  hdd     14.60149    14.60149          osd.58
> >> >  59  ssd      0.87320     0.87320          osd.59
> >> > 203  ssd      0.87329     0.87329          osd.203
> >> > 211  ssd      0.87329     0.87329          osd.211
> >> > -13         280.04803   280.04803      host NODE-4
> >> >  78  hdd     14.60149    14.60149          osd.78
> >> >  79  ssd      0.87320     0.87320          osd.79
> >> > 206  ssd      0.87329     0.87329          osd.206
> >> > 212  ssd      0.87329     0.87329          osd.212
> >> > -16         280.04803   280.04803      host NODE-5
> >> >  98  hdd     14.60149    14.60149          osd.98
> >> >  99  ssd      0.87320     0.87320          osd.99
> >> > 205  ssd      0.87329     0.87329          osd.205
> >> > 213  ssd      0.87329     0.87329          osd.213
> >> > -19         265.44662   265.44662      host NODE-6
> >> > 118  hdd     14.60149    14.60149          osd.118
> >> > 114  ssd      0.87329     0.87329          osd.114
> >> > 200  ssd      0.87329     0.87329          osd.200
> >> > 214  ssd      0.87329     0.87329          osd.214
> >> > -22         280.04803   280.04803      host NODE-7
> >> > 138  hdd     14.60149    14.60149          osd.138
> >> > 139  ssd      0.87320     0.87320          osd.139
> >> > 204  ssd      0.87329     0.87329          osd.204
> >> > 215  ssd      0.87329     0.87329          osd.215
> >> > -25         280.04810   280.04810      host NODE-8
> >> > 158  hdd     14.60149    14.60149          osd.158
> >> > 119  ssd      0.87329     0.87329          osd.119
> >> > 159  ssd      0.87329     0.87329          osd.159
> >> > 216  ssd      0.87329     0.87329          osd.216
> >> > -28         280.04810   280.04810      host NODE-9
> >> > 178  hdd     14.60149    14.60149          osd.178
> >> > 179  ssd      0.87329     0.87329          osd.179
> >> > 201  ssd      0.87329     0.87329          osd.201
> >> > 217  ssd      0.87329     0.87329          osd.217
> >> > -31         280.04803   280.04803      host NODE-10
> >> > 180  hdd     14.60149    14.60149          osd.180
> >> > 199  ssd      0.87320     0.87320          osd.199
> >> > 202  ssd      0.87329     0.87329          osd.202
> >> > 218  ssd      0.87329     0.87329          osd.218
> >> >
> >> > This PG, "6.dc", is on OSDs 199, 213 and 217:
> >> >
> >> > 6.dc  812  0  0  0  0  1369675264  0  0  3005  3005  active+clean  2021-08-31 16:36:06.645208  32265'415965  32265:287175109  [199,213,217]
> >> >
> >> > ceph osd df tree | grep "CLASS\|ssd" | grep ".199\|.213\|217"
> >> > 199  ssd  0.87320  1.00000  894 GiB  281 GiB  119 GiB  159 GiB  2.5 GiB  614 GiB  31.38  0.52  103  up  osd.199
> >> > 213  ssd  0.87329  1.00000  894 GiB  291 GiB   95 GiB  195 GiB  2.3 GiB  603 GiB  32.59  0.54   95  up  osd.213
> >> > 217  ssd  0.87329  1.00000  894 GiB  261 GiB   83 GiB  176 GiB  2.3 GiB  633 GiB  29.18  0.48   89  up  osd.217
> >> >
> >> > As you can see, the PG lives on three SSD OSDs, and one of them is a new
> >> > one, so we cannot say it belongs to someone else.
> >> >
> >> > rule ssd-rule {
> >> >     id 1
> >> >     type replicated
> >> >     step take default class ssd
> >> >     step chooseleaf firstn 0 type host
> >> >     step emit
> >> > }
> >> >
> >> > pool 54 'rgw.buckets.index' replicated size 3 min_size 1 crush_rule 1 object_hash rjenkins pg_num 128 pgp_num 128 autoscale_mode warn last_change 31607 lfor 0/0/30823 flags hashpspool stripe_width 0 compression_algorithm lz4 compression_mode aggressive application rgw
> >> >
> >> > What is the next step?
> >> >
> >> > On Wed, Sep 1, 2021 at 4:03 AM Josh Baergen <jbaergen@xxxxxxxxxxxxxxxx> wrote:
> >> >>
> >> >> Yeah, I would suggest inspecting your CRUSH tree. Unfortunately the
> >> >> grep above removed that information from 'df tree', but from the
> >> >> information you provided there does appear to be a significant
> >> >> imbalance remaining.
> >> >>
> >> >> Josh
> >> >>
> >> >> On Tue, Aug 31, 2021 at 6:02 PM mhnx <morphinwithyou@xxxxxxxxx> wrote:
> >> >> >
> >> >> > Hello Josh!
> >> >> >
> >> >> > I use the balancer in active mode with crush-compat. Balancing is done and
> >> >> > there are no remapped PGs in ceph -s.
> >> >> >
> >> >> > ceph osd df tree | grep 'CLASS\|ssd'
> >> >> >
> >> >> > ID   CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE  DATA     OMAP     META     AVAIL    %USE   VAR   PGS  STATUS  TYPE NAME
> >> >> >  19  ssd    0.87320  1.00000   894 GiB  402 GiB  117 GiB  281 GiB  3.0 GiB  492 GiB  44.93  0.74  102  up      osd.19
> >> >> > 208  ssd    0.87329  1.00000   894 GiB  205 GiB   85 GiB  113 GiB  6.6 GiB  690 GiB  22.89  0.38   95  up      osd.208
> >> >> > 209  ssd    0.87329  1.00000   894 GiB  204 GiB   87 GiB  114 GiB  2.7 GiB  690 GiB  22.84  0.38   65  up      osd.209
> >> >> > 199  ssd    0.87320  1.00000   894 GiB  281 GiB  118 GiB  159 GiB  2.8 GiB  614 GiB  31.37  0.52  103  up      osd.199
> >> >> > 202  ssd    0.87329  1.00000   894 GiB  278 GiB   89 GiB  183 GiB  6.3 GiB  616 GiB  31.08  0.51   97  up      osd.202
> >> >> > 218  ssd    0.87329  1.00000   894 GiB  201 GiB   75 GiB  124 GiB  1.8 GiB  693 GiB  22.46  0.37   84  up      osd.218
> >> >> >  39  ssd    0.87320  1.00000   894 GiB  334 GiB   86 GiB  242 GiB  5.3 GiB  560 GiB  37.34  0.61   91  up      osd.39
> >> >> > 207  ssd    0.87329  1.00000   894 GiB  232 GiB   88 GiB  138 GiB  7.0 GiB  662 GiB  25.99  0.43   81  up      osd.207
> >> >> > 210  ssd    0.87329  1.00000   894 GiB  270 GiB  109 GiB  160 GiB  1.4 GiB  624 GiB  30.18  0.50   99  up      osd.210
> >> >> >  59  ssd    0.87320  1.00000   894 GiB  374 GiB  127 GiB  244 GiB  3.1 GiB  520 GiB  41.79  0.69   97  up      osd.59
> >> >> > 203  ssd    0.87329  1.00000   894 GiB  314 GiB   96 GiB  210 GiB  7.5 GiB  581 GiB  35.06  0.58  104  up      osd.203
> >> >> > 211  ssd    0.87329  1.00000   894 GiB  231 GiB   60 GiB  169 GiB  1.7 GiB  663 GiB  25.82  0.42   81  up      osd.211
> >> >> >  79  ssd    0.87320  1.00000   894 GiB  409 GiB  109 GiB  298 GiB  2.0 GiB  486 GiB  45.70  0.75  102  up      osd.79
> >> >> > 206  ssd    0.87329  1.00000   894 GiB  284 GiB  107 GiB  175 GiB  1.9 GiB  610 GiB  31.79  0.52   94  up      osd.206
> >> >> > 212  ssd    0.87329  1.00000   894 GiB  239 GiB   85 GiB  152 GiB  2.0 GiB  655 GiB  26.71  0.44   80  up      osd.212
> >> >> >  99  ssd    0.87320  1.00000   894 GiB  392 GiB   73 GiB  314 GiB  4.7 GiB  503 GiB  43.79  0.72   85  up      osd.99
> >> >> > 205  ssd    0.87329  1.00000   894 GiB  445 GiB   87 GiB  353 GiB  4.8 GiB  449 GiB  49.80  0.82   95  up      osd.205
> >> >> > 213  ssd    0.87329  1.00000   894 GiB  291 GiB   94 GiB  194 GiB  2.3 GiB  603 GiB  32.57  0.54   95  up      osd.213
> >> >> > 114  ssd    0.87329  1.00000   894 GiB  319 GiB  125 GiB  191 GiB  3.0 GiB  575 GiB  35.67  0.59   99  up      osd.114
> >> >> > 200  ssd    0.87329  1.00000   894 GiB  231 GiB   78 GiB  150 GiB  2.9 GiB  663 GiB  25.83  0.42   90  up      osd.200
> >> >> > 214  ssd    0.87329  1.00000   894 GiB  296 GiB  106 GiB  187 GiB  2.6 GiB  598 GiB  33.09  0.54  100  up      osd.214
> >> >> > 139  ssd    0.87320  1.00000   894 GiB  270 GiB   98 GiB  169 GiB  2.3 GiB  624 GiB  30.18  0.50   96  up      osd.139
> >> >> > 204  ssd    0.87329  1.00000   894 GiB  301 GiB  117 GiB  181 GiB  2.9 GiB  593 GiB  33.64  0.55  104  up      osd.204
> >> >> > 215  ssd    0.87329  1.00000   894 GiB  203 GiB   78 GiB  122 GiB  3.3 GiB  691 GiB  22.69  0.37   81  up      osd.215
> >> >> > 119  ssd    0.87329  1.00000   894 GiB  200 GiB  106 GiB   92 GiB  2.0 GiB  694 GiB  22.39  0.37   99  up      osd.119
> >> >> > 159  ssd    0.87329  1.00000   894 GiB  213 GiB   96 GiB  113 GiB  3.2 GiB  682 GiB  23.77  0.39   93  up      osd.159
> >> >> > 216  ssd    0.87329  1.00000   894 GiB  322 GiB  109 GiB  211 GiB  1.8 GiB  573 GiB  35.96  0.59  101  up      osd.216
> >> >> > 179  ssd    0.87329  1.00000   894 GiB  389 GiB   85 GiB  300 GiB  3.2 GiB  505 GiB  43.49  0.71  104  up      osd.179
> >> >> > 201  ssd    0.87329  1.00000   894 GiB  494 GiB  104 GiB  386 GiB  4.1 GiB  401 GiB  55.20  0.91  103  up      osd.201
> >> >> > 217  ssd    0.87329  1.00000   894 GiB  261 GiB   83 GiB  176 GiB  2.3 GiB  634 GiB  29.15  0.48   89  up      osd.217
> >> >> >
> >> >> > When I checked the balancer status I saw this: "optimize_result": "Some osds belong to multiple subtrees".
> >> >> > Do I need to check the crushmap?
> >> >> >
> >> >> > On Tue, Aug 31, 2021 at 10:32 PM Josh Baergen <jbaergen@xxxxxxxxxxxxxxxx> wrote:
> >> >> >>
> >> >> >> Hi there,
> >> >> >>
> >> >> >> Could you post the output of "ceph osd df tree"? I would highly
> >> >> >> suspect that this is a result of imbalance, and that's the easiest way
> >> >> >> to see if that's the case. It would also confirm that the new disks
> >> >> >> have taken on PGs.
> >> >> >>
> >> >> >> Josh
> >> >> >>
> >> >> >> On Tue, Aug 31, 2021 at 10:50 AM mhnx <morphinwithyou@xxxxxxxxx> wrote:
> >> >> >> >
> >> >> >> > I'm using Nautilus 14.2.16.
> >> >> >> >
> >> >> >> > I had 20 SSD OSDs in my cluster and I added 10 more (each SSD = 960 GB).
> >> >> >> > The size increased to *26 TiB* as expected, but the replicated (3) pool
> >> >> >> > MAX AVAIL didn't change *(3.5 TiB)*.
> >> >> >> > I've increased pg_num and the PG rebalance is also done.
> >> >> >> >
> >> >> >> > Do I need any special treatment to expand the pool MAX AVAIL?
> >> >> >> >
> >> >> >> > CLASS   SIZE       AVAIL    USED     RAW USED  %RAW USED
> >> >> >> > hdd     2.7 PiB    1.0 PiB  1.6 PiB  1.6 PiB   61.12
> >> >> >> > ssd     *26 TiB*   18 TiB   2.8 TiB  8.7 TiB   33.11
> >> >> >> > TOTAL   2.7 PiB    1.1 PiB  1.6 PiB  1.7 PiB   60.85
> >> >> >> >
> >> >> >> > POOLS:
> >> >> >> > POOL                    ID  PGS  STORED   OBJECTS  USED     %USED  MAX AVAIL
> >> >> >> > xxx.rgw.buckets.index   54  128  541 GiB  435.69k  541 GiB  4.82   *3.5 TiB*
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx