As of Nautilus+, when you set pg_num, it actually internally sets pg(p)_num_target, and then slowly increases (or decreases, if you're merging) pg_num and then pgp_num until it reaches the target. The amount of backfill scheduled into the system is controlled by target_max_misplaced_ratio.

Josh

On Fri, Oct 7, 2022 at 3:50 AM Nicola Mori <mori@xxxxxxxxxx> wrote:
>
> The situation resolved by itself, since there was probably no error. I
> manually increased the number of PGs and PGPs to 128 some days ago, and
> the PGP count was being updated step by step. After a bump from 5% to 7%
> in the count of misplaced objects I noticed that the number of PGPs had
> been updated to 126, and after a last bump it is now at 128 with ~4% of
> misplaced objects, currently decreasing.
> Sorry for the noise,
>
> Nicola
>
> On 07/10/22 09:15, Nicola Mori wrote:
> > Dear Ceph users,
> >
> > my cluster has been stuck for several days with some PGs backfilling.
> > The number of misplaced objects slowly decreases down to 5%, and at
> > that point jumps up again to about 7%, and so on. I found several
> > possible reasons for this behavior. One is related to the balancer,
> > which in any case I think is not operating:
> >
> > # ceph balancer status
> > {
> >     "active": false,
> >     "last_optimize_duration": "0:00:00.000938",
> >     "last_optimize_started": "Thu Oct 6 16:19:59 2022",
> >     "mode": "upmap",
> >     "optimize_result": "Too many objects (0.071539 > 0.050000) are
> > misplaced; try again later",
> >     "plans": []
> > }
> >
> > (the last optimize result is from yesterday when I disabled it, and
> > since then the backfill loop has happened several times).
> > Another possible reason seems to be an imbalance of the PG and PGP
> > numbers. Indeed I found such an imbalance on one of my pools:
> >
> > # ceph osd pool get wizard_data pg_num
> > pg_num: 128
> > # ceph osd pool get wizard_data pgp_num
> > pgp_num: 123
> >
> > but I cannot fix it:
> >
> > # ceph osd pool set wizard_data pgp_num 128
> > set pool 3 pgp_num to 128
> > # ceph osd pool get wizard_data pgp_num
> > pgp_num: 123
> >
> > The autoscaler is off for that pool:
> >
> > POOL         SIZE   TARGET SIZE  RATE                RAW CAPACITY  RATIO   TARGET RATIO  EFFECTIVE RATIO  BIAS  PG_NUM  NEW PG_NUM  AUTOSCALE  BULK
> > wizard_data  8951G               1.3333333730697632  152.8T        0.0763                                 1.0   128                 off        False
> >
> > so I don't understand why the PGP number is stuck at 123.
> > Thanks in advance for any help,
> >
> > Nicola
>
> --
> Nicola Mori, Ph.D.
> INFN sezione di Firenze
> Via Bruno Rossi 1, 50019 Sesto F.no (Italy)
> +390554572660
> mori@xxxxxxxxxx
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
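The stepped behavior Josh describes can be pictured with a toy model. This is not Ceph source code: the function name, the step size of 1, and the hard-coded 0.05 default are illustrative assumptions; the real mgr logic chooses its own step sizes. It only shows why pgp_num can sit at an intermediate value (here 123) until the misplaced ratio falls back under target_max_misplaced_ratio, and then creep toward the target.

```python
# Toy sketch (not Ceph code): stepping pgp_num toward pgp_num_target,
# pausing while the misplaced ratio exceeds target_max_misplaced_ratio.

TARGET_MAX_MISPLACED_RATIO = 0.05  # mgr default in recent releases

def next_pgp_num(pgp_num, pgp_num_target, misplaced_ratio):
    """Return pgp_num for the next step, or the current value if
    backfill from earlier steps still exceeds the misplaced cap."""
    if misplaced_ratio >= TARGET_MAX_MISPLACED_RATIO:
        return pgp_num          # wait for backfill to drain first
    if pgp_num < pgp_num_target:
        return pgp_num + 1      # splitting: step upward
    if pgp_num > pgp_num_target:
        return pgp_num - 1      # merging: step downward
    return pgp_num              # already at target

# Matching the thread: stuck at 123 while ~7% is misplaced, then
# stepping toward 128 once the ratio drops below 5%.
print(next_pgp_num(123, 128, 0.07))  # -> 123 (holds)
print(next_pgp_num(123, 128, 0.04))  # -> 124 (steps)
```

This also explains the observation in the thread that `ceph osd pool set ... pgp_num 128` appeared to have no effect: the command updates the target, while the effective pgp_num follows later, throttled by the misplaced-ratio cap.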