Re: After adding New Osd's, Pool Max Avail did not changed.

I've created a new pool "testpool" and its MAX AVAIL is the same. I think
you're right. 3.5 TiB is quite a coincidence, because it was also the value
before I added the new OSDs.

    POOL                        ID     PGS     STORED      OBJECTS     USED        %USED     MAX AVAIL
    prod.rgw.buckets.index      54     128     622 GiB     435.60k     622 GiB      5.52       3.5 TiB
    testpool                    60       4         0 B           0         0 B         0       3.5 TiB
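
If I understand MAX AVAIL right, it is derived from the most-full OSD in the
rule's device class rather than from the raw free space, so the imbalance
explains the number. A rough back-of-the-envelope check against the fullest
SSD in the df tree below (osd.201: 894 GiB total, 494 GiB raw used), assuming
the default full ratio of 0.95 and 3 replicas across the 30 SSDs:

    echo "(894 * 0.95 - 494) * 30 / 3 / 1024" | bc -l
    # ~3.47, i.e. roughly the 3.5 TiB MAX AVAIL shown above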

When I added the 10 new OSDs, I also doubled the PG count. That's why the PG
counts per OSD are similar while the sizes differ so much.
So it's not just that PGs have moved around; the objects should have balanced
out as well, but crush-compat didn't do that very well.
Upmap is good when it comes to PG balance: the PG counts will be nearly equal,
but the object counts per PG still differ, and that will cause imbalance
again. If I remember correctly, upmap wasn't great at that either, but it's
worth a try.
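
To see how skewed the object counts per PG really are, something like this
should work (assuming the first two columns of ceph pg ls-by-pool are still
the PG id and its object count on Nautilus):

    ceph pg ls-by-pool prod.rgw.buckets.index | awk '{print $1, $2}' | sort -k2 -n | tail
    # PG id and object count, fattest PGs last; a wide spread here means that
    # equalizing PG counts alone won't equalize OSD usage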

This weekend (commands sketched below):
1- I will run compaction on every SSD OSD.
2- Deep-scrub to get rid of the large-omap warnings.
3- Switch the balancer from crush-compat to upmap.
4- If it's still not balanced the way I want, I will go manual.
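
Roughly the commands I have in mind, in case anyone spots a problem (the OSD
and PG ids are just examples taken from the outputs below):

    # 1- online compaction, one SSD OSD at a time (if "ceph tell" doesn't
    #    accept it on 14.2.16, "ceph daemon osd.<id> compact" on the OSD host
    #    should do the same)
    ceph tell osd.19 compact

    # 2- deep-scrub so the large-omap warnings get re-evaluated
    ceph osd deep-scrub osd.19

    # 3- switch the balancer from crush-compat to upmap
    ceph features                                     # all clients must be >= luminous
    ceph osd set-require-min-compat-client luminous
    ceph balancer off
    ceph balancer mode upmap
    ceph balancer on
    ceph balancer status

    # 4- manual balancing as a last resort, e.g. move PG 6.dc's replica from
    #    osd.199 to the emptier osd.218 on the same host
    ceph osd pg-upmap-items 6.dc 199 218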



On Wed, Sep 1, 2021 at 17:35, Josh Baergen <jbaergen@xxxxxxxxxxxxxxxx> wrote:

> Well, if I read your df output correctly, your emptiest SSD is 22%
> full and the fullest is 55%. That's a huge spread, and max-avail takes
> imbalances like this into account. Assuming triple-replication, a
> max-avail of 3.5 TiB makes sense given this imbalance.
>
> Similarly, OSD 209 has 65 PGs on it, whereas some OSDs have 104 PGs on
> them. At least within a host, the upmap balancer should be capable of
> bringing the PG count within a variance of 1 PG or so (if configured
> to do so). I admit that I don't have experience with class-based
> rules, though, so it's possible that the upmap balancer also has
> limitations when it comes to this sort of rule.
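
If I'm reading the docs right, that's the balancer's max deviation setting;
assuming the Nautilus option name is mgr/balancer/upmap_max_deviation, it
could be tightened with:

    ceph config set mgr mgr/balancer/upmap_max_deviation 1
    # aim for at most a 1 PG difference between OSDs; the default allows more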
>
> Having said that, PG count imbalance isn't the only source of
> imbalance here; OSD 19 is 45% full with 102 PGs whereas OSD 208 is 23%
> with 95 PGs; it would seem that there is a data imbalance on a per-PG
> basis, or perhaps that's just an OSD needing compaction to clean up
> some space or something like that. Not sure.
>
> Josh
>
> On Wed, Sep 1, 2021 at 8:19 AM mhnx <morphinwithyou@xxxxxxxxx> wrote:
> >
> > I've tried the upmap balancer but the PG distribution was not good. I
> > think crush-compat works better with Nautilus, and it's also the default
> > option. As you can see in the df tree, the PG distribution is not that
> > bad. Also, MAX AVAIL never changed.
> >
> > If that's not the reason for MAX AVAIL, then what is it?
> >
> > Maybe I should create a new pool and check its MAX AVAIL. If the new
> > pool's MAX AVAIL is greater than the other pools', then Ceph's
> > recalculation is either buggy or needs to be triggered somehow.
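
That is what the "testpool" at the top was created for; something along these
lines, reusing the same SSD rule (4 PGs is just a quick-check assumption):

    ceph osd pool create testpool 4 4 replicated ssd-rule
    ceph osd pool application enable testpool rgw   # silence the no-application warning
    ceph df | grep testpool                         # compare its MAX AVAIL with the index pool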
> >
> > On Wed, Sep 1, 2021 at 17:07, Josh Baergen <jbaergen@xxxxxxxxxxxxxxxx> wrote:
> >>
> >> Googling for that balancer error message, I came across
> >> https://tracker.ceph.com/issues/22814, which was closed/wont-fix, and
> >> some threads that claimed that class-based crush rules actually use
> >> some form of shadow trees in the background. I'm not sure how accurate
> >> that is.
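
The shadow trees can be inspected directly, which should make it easier to
tell whether the class-based rule is what triggers the "multiple subtrees"
message; as far as I know the flag is:

    ceph osd crush tree --show-shadow
    # prints the per-device-class shadow hierarchies (e.g. default~ssd,
    # default~hdd) next to the real tree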
> >>
> >> The only suggestion I have, which is what was also suggested in one of
> >> the above threads, is to use the upmap balancer instead if possible.
> >>
> >> Josh
> >>
> >> On Wed, Sep 1, 2021 at 2:38 AM mhnx <morphinwithyou@xxxxxxxxx> wrote:
> >> >
> >> > ceph osd crush tree (I only have one subtree, and its root is "default")
> >> > ID  CLASS WEIGHT     (compat)  TYPE NAME
> >> >  -1       2785.87891           root default
> >> >  -3        280.04803 280.04803     host NODE-1
> >> >   0   hdd   14.60149  14.60149         osd.0
> >> >  19   ssd    0.87320   0.87320         osd.19
> >> > 208   ssd    0.87329   0.87329         osd.208
> >> > 209   ssd    0.87329   0.87329         osd.209
> >> >  -7        280.04803 280.04803     host NODE-2
> >> >  38   hdd   14.60149  14.60149         osd.38
> >> >  39   ssd    0.87320   0.87320         osd.39
> >> > 207   ssd    0.87329   0.87329         osd.207
> >> > 210   ssd    0.87329   0.87329         osd.210
> >> > -10        280.04803 280.04803     host NODE-3
> >> >  58   hdd   14.60149  14.60149         osd.58
> >> >  59   ssd    0.87320   0.87320         osd.59
> >> > 203   ssd    0.87329   0.87329         osd.203
> >> > 211   ssd    0.87329   0.87329         osd.211
> >> > -13        280.04803 280.04803     host NODE-4
> >> >  78   hdd   14.60149  14.60149         osd.78
> >> >  79   ssd    0.87320   0.87320         osd.79
> >> > 206   ssd    0.87329   0.87329         osd.206
> >> > 212   ssd    0.87329   0.87329         osd.212
> >> > -16        280.04803 280.04803     host NODE-5
> >> >  98   hdd   14.60149  14.60149         osd.98
> >> >  99   ssd    0.87320   0.87320         osd.99
> >> > 205   ssd    0.87329   0.87329         osd.205
> >> > 213   ssd    0.87329   0.87329         osd.213
> >> > -19        265.44662 265.44662     host NODE-6
> >> > 118   hdd   14.60149  14.60149         osd.118
> >> > 114   ssd    0.87329   0.87329         osd.114
> >> > 200   ssd    0.87329   0.87329         osd.200
> >> > 214   ssd    0.87329   0.87329         osd.214
> >> > -22        280.04803 280.04803     host NODE-7
> >> > 138   hdd   14.60149  14.60149         osd.138
> >> > 139   ssd    0.87320   0.87320         osd.139
> >> > 204   ssd    0.87329   0.87329         osd.204
> >> > 215   ssd    0.87329   0.87329         osd.215
> >> > -25        280.04810 280.04810     host NODE-8
> >> > 158   hdd   14.60149  14.60149         osd.158
> >> > 119   ssd    0.87329   0.87329         osd.119
> >> > 159   ssd    0.87329   0.87329         osd.159
> >> > 216   ssd    0.87329   0.87329         osd.216
> >> > -28        280.04810 280.04810     host NODE-9
> >> > 178   hdd   14.60149  14.60149         osd.178
> >> > 179   ssd    0.87329   0.87329         osd.179
> >> > 201   ssd    0.87329   0.87329         osd.201
> >> > 217   ssd    0.87329   0.87329         osd.217
> >> > -31        280.04803 280.04803     host NODE-10
> >> > 180   hdd   14.60149  14.60149         osd.180
> >> > 199   ssd    0.87320   0.87320         osd.199
> >> > 202   ssd    0.87329   0.87329         osd.202
> >> > 218   ssd    0.87329   0.87329         osd.218
> >> >
> >> > This PG "6.dc" is on OSDs 199, 213 and 217.
> >> >
> >> > 6.dc        812                  0        0         0       0  1369675264           0          0 3005     3005 active+clean 2021-08-31 16:36:06.645208    32265'415965  32265:287175109  [199,213,217]
> >> >
> >> > ceph osd df tree | grep "CLASS\|ssd" | grep ".199\|.213\|217"
> >> > 199   ssd    0.87320  1.00000 894 GiB 281 GiB 119 GiB 159 GiB 2.5 GiB 614 GiB 31.38 0.52 103     up         osd.199
> >> > 213   ssd    0.87329  1.00000 894 GiB 291 GiB  95 GiB 195 GiB 2.3 GiB 603 GiB 32.59 0.54  95     up         osd.213
> >> > 217   ssd    0.87329  1.00000 894 GiB 261 GiB  83 GiB 176 GiB 2.3 GiB 633 GiB 29.18 0.48  89     up         osd.217
> >> >
> >> > As you can see, the PG lives on 3 SSD OSDs and one of them is a new
> >> > one, so we can't say it belongs somewhere else.
> >> >
> >> > rule ssd-rule {
> >> >         id 1
> >> >         type replicated
> >> >         step take default class ssd
> >> >         step chooseleaf firstn 0 type host
> >> >         step emit
> >> > }
> >> >
> >> > pool 54 'rgw.buckets.index' replicated size 3 min_size 1 crush_rule 1 object_hash rjenkins pg_num 128 pgp_num 128 autoscale_mode warn last_change 31607 lfor 0/0/30823 flags hashpspool stripe_width 0 compression_algorithm lz4 compression_mode aggressive application rgw
> >> >
> >> > What is the next step?
> >> >
> >> >
> >> > On Wed, Sep 1, 2021 at 04:03, Josh Baergen <jbaergen@xxxxxxxxxxxxxxxx> wrote:
> >> >>
> >> >> Yeah, I would suggest inspecting your CRUSH tree. Unfortunately the
> >> >> grep above removed that information from 'df tree', but from the
> >> >> information you provided there does appear to be a significant
> >> >> imbalance remaining.
> >> >>
> >> >> Josh
> >> >>
> >> >> On Tue, Aug 31, 2021 at 6:02 PM mhnx <morphinwithyou@xxxxxxxxx> wrote:
> >> >> >
> >> >> > Hello Josh!
> >> >> >
> >> >> > I use the balancer in active mode with crush-compat. Balancing is
> >> >> > done and there are no remapped PGs in ceph -s.
> >> >> >
> >> >> > ceph osd df tree | grep 'CLASS\|ssd'
> >> >> >
> >> >> > ID  CLASS WEIGHT     REWEIGHT SIZE    RAW USE DATA    OMAP    META    AVAIL   %USE  VAR  PGS STATUS TYPE NAME
> >> >> >  19   ssd    0.87320  1.00000 894 GiB 402 GiB 117 GiB 281 GiB 3.0 GiB 492 GiB 44.93 0.74 102     up         osd.19
> >> >> > 208   ssd    0.87329  1.00000 894 GiB 205 GiB  85 GiB 113 GiB 6.6 GiB 690 GiB 22.89 0.38  95     up         osd.208
> >> >> > 209   ssd    0.87329  1.00000 894 GiB 204 GiB  87 GiB 114 GiB 2.7 GiB 690 GiB 22.84 0.38  65     up         osd.209
> >> >> > 199   ssd    0.87320  1.00000 894 GiB 281 GiB 118 GiB 159 GiB 2.8 GiB 614 GiB 31.37 0.52 103     up         osd.199
> >> >> > 202   ssd    0.87329  1.00000 894 GiB 278 GiB  89 GiB 183 GiB 6.3 GiB 616 GiB 31.08 0.51  97     up         osd.202
> >> >> > 218   ssd    0.87329  1.00000 894 GiB 201 GiB  75 GiB 124 GiB 1.8 GiB 693 GiB 22.46 0.37  84     up         osd.218
> >> >> >  39   ssd    0.87320  1.00000 894 GiB 334 GiB  86 GiB 242 GiB 5.3 GiB 560 GiB 37.34 0.61  91     up         osd.39
> >> >> > 207   ssd    0.87329  1.00000 894 GiB 232 GiB  88 GiB 138 GiB 7.0 GiB 662 GiB 25.99 0.43  81     up         osd.207
> >> >> > 210   ssd    0.87329  1.00000 894 GiB 270 GiB 109 GiB 160 GiB 1.4 GiB 624 GiB 30.18 0.50  99     up         osd.210
> >> >> >  59   ssd    0.87320  1.00000 894 GiB 374 GiB 127 GiB 244 GiB 3.1 GiB 520 GiB 41.79 0.69  97     up         osd.59
> >> >> > 203   ssd    0.87329  1.00000 894 GiB 314 GiB  96 GiB 210 GiB 7.5 GiB 581 GiB 35.06 0.58 104     up         osd.203
> >> >> > 211   ssd    0.87329  1.00000 894 GiB 231 GiB  60 GiB 169 GiB 1.7 GiB 663 GiB 25.82 0.42  81     up         osd.211
> >> >> >  79   ssd    0.87320  1.00000 894 GiB 409 GiB 109 GiB 298 GiB 2.0 GiB 486 GiB 45.70 0.75 102     up         osd.79
> >> >> > 206   ssd    0.87329  1.00000 894 GiB 284 GiB 107 GiB 175 GiB 1.9 GiB 610 GiB 31.79 0.52  94     up         osd.206
> >> >> > 212   ssd    0.87329  1.00000 894 GiB 239 GiB  85 GiB 152 GiB 2.0 GiB 655 GiB 26.71 0.44  80     up         osd.212
> >> >> >  99   ssd    0.87320  1.00000 894 GiB 392 GiB  73 GiB 314 GiB 4.7 GiB 503 GiB 43.79 0.72  85     up         osd.99
> >> >> > 205   ssd    0.87329  1.00000 894 GiB 445 GiB  87 GiB 353 GiB 4.8 GiB 449 GiB 49.80 0.82  95     up         osd.205
> >> >> > 213   ssd    0.87329  1.00000 894 GiB 291 GiB  94 GiB 194 GiB 2.3 GiB 603 GiB 32.57 0.54  95     up         osd.213
> >> >> > 114   ssd    0.87329  1.00000 894 GiB 319 GiB 125 GiB 191 GiB 3.0 GiB 575 GiB 35.67 0.59  99     up         osd.114
> >> >> > 200   ssd    0.87329  1.00000 894 GiB 231 GiB  78 GiB 150 GiB 2.9 GiB 663 GiB 25.83 0.42  90     up         osd.200
> >> >> > 214   ssd    0.87329  1.00000 894 GiB 296 GiB 106 GiB 187 GiB 2.6 GiB 598 GiB 33.09 0.54 100     up         osd.214
> >> >> > 139   ssd    0.87320  1.00000 894 GiB 270 GiB  98 GiB 169 GiB 2.3 GiB 624 GiB 30.18 0.50  96     up         osd.139
> >> >> > 204   ssd    0.87329  1.00000 894 GiB 301 GiB 117 GiB 181 GiB 2.9 GiB 593 GiB 33.64 0.55 104     up         osd.204
> >> >> > 215   ssd    0.87329  1.00000 894 GiB 203 GiB  78 GiB 122 GiB 3.3 GiB 691 GiB 22.69 0.37  81     up         osd.215
> >> >> > 119   ssd    0.87329  1.00000 894 GiB 200 GiB 106 GiB  92 GiB 2.0 GiB 694 GiB 22.39 0.37  99     up         osd.119
> >> >> > 159   ssd    0.87329  1.00000 894 GiB 213 GiB  96 GiB 113 GiB 3.2 GiB 682 GiB 23.77 0.39  93     up         osd.159
> >> >> > 216   ssd    0.87329  1.00000 894 GiB 322 GiB 109 GiB 211 GiB 1.8 GiB 573 GiB 35.96 0.59 101     up         osd.216
> >> >> > 179   ssd    0.87329  1.00000 894 GiB 389 GiB  85 GiB 300 GiB 3.2 GiB 505 GiB 43.49 0.71 104     up         osd.179
> >> >> > 201   ssd    0.87329  1.00000 894 GiB 494 GiB 104 GiB 386 GiB 4.1 GiB 401 GiB 55.20 0.91 103     up         osd.201
> >> >> > 217   ssd    0.87329  1.00000 894 GiB 261 GiB  83 GiB 176 GiB 2.3 GiB 634 GiB 29.15 0.48  89     up         osd.217
> >> >> >
> >> >> >
> >> >> > When I checked the balancer status, I saw this: "optimize_result":
> >> >> > "Some osds belong to multiple subtrees"
> >> >> > Do I need to check the crushmap?
> >> >> >
> >> >> >
> >> >> >
> >> >> > On Tue, Aug 31, 2021 at 22:32, Josh Baergen <jbaergen@xxxxxxxxxxxxxxxx> wrote:
> >> >> >>
> >> >> >> Hi there,
> >> >> >>
> >> >> >> Could you post the output of "ceph osd df tree"? I would highly
> >> >> >> suspect that this is a result of imbalance, and that's the easiest
> >> >> >> way to see if that's the case. It would also confirm that the new
> >> >> >> disks have taken on PGs.
> >> >> >>
> >> >> >> Josh
> >> >> >>
> >> >> >> On Tue, Aug 31, 2021 at 10:50 AM mhnx <morphinwithyou@xxxxxxxxx> wrote:
> >> >> >> >
> >> >> >> > I'm using Nautilus 14.2.16
> >> >> >> >
> >> >> >> > I had 20 SSD OSDs in my cluster and I added 10 more. "Each SSD = 960 GB"
> >> >> >> > The size increased to *(26 TiB)* as expected, but the replicated (3)
> >> >> >> > pool MAX AVAIL didn't change *(3.5 TiB)*.
> >> >> >> > I've increased pg_num and the PG rebalance is also done.
> >> >> >> >
> >> >> >> > Do I need any special treatment to expand the pool Max Avail?
> >> >> >> >
> >> >> >> > CLASS     SIZE        AVAIL       USED        RAW USED     %RAW USED
> >> >> >> >     hdd       2.7 PiB     1.0 PiB     1.6 PiB      1.6 PiB        61.12
> >> >> >> >     ssd       *26 TiB*     18 TiB     2.8 TiB      8.7 TiB        33.11
> >> >> >> >     TOTAL     2.7 PiB     1.1 PiB     1.6 PiB      1.7 PiB        60.85
> >> >> >> >
> >> >> >> > POOLS:
> >> >> >> >     POOL                       ID     PGS     STORED      OBJECTS     USED        %USED     MAX AVAIL
> >> >> >> >     xxx.rgw.buckets.index      54     128     541 GiB     435.69k     541 GiB     4.82      *3.5 TiB*
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



