Hi Nicola,

it's not noise. Even though the modules look disabled and the pool flags are set to false, they still linger in the background and interfere. See the recent thread
https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/WST6K5A4UQGGISBFGJEZS4HFL2VVWW32/

With all the settings you already have, the last one would be

  ceph config set mgr target_max_misplaced_ratio 1

after which the balancer and scaling modules will just do what you tell them, assuming you know what you are doing. I restored the default behaviour of instant application of changes this way and don't have any problems with it.
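For reference, roughly what I would run (pool name taken from your output below; on Nautilus and later the pool details also list the *_target values, although the exact fields can differ between releases):

# ceph osd pool ls detail | grep wizard_data
shows pg_num/pgp_num together with pg_num_target/pgp_num_target, so you can watch the mgr stepping pgp_num towards the target.

# ceph config get mgr target_max_misplaced_ratio
the default is 0.05, i.e. the 5% threshold you see in the balancer output and the point at which the mgr bumps pgp_num again.

# ceph config set mgr target_max_misplaced_ratio 1
with this the mgr applies the whole pgp_num change in one go instead of a few PGs at a time.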
Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Josh Baergen <jbaergen@xxxxxxxxxxxxxxxx>
Sent: 07 October 2022 17:16:49
To: Nicola Mori
Cc: ceph-users
Subject: Re: Iinfinite backfill loop + number of pgp groups stuck at wrong value

As of Nautilus, when you set pg_num it actually sets pg(p)_num_target internally, and then slowly increases (or decreases, if you're merging) pg_num and then pgp_num until it reaches the target. The amount of backfill scheduled into the system is controlled by target_max_misplaced_ratio.

Josh

On Fri, Oct 7, 2022 at 3:50 AM Nicola Mori <mori@xxxxxxxxxx> wrote:
>
> The situation resolved itself, since there probably was no error. I
> manually increased the number of PGs and PGPs to 128 some days ago, and
> the PGP count was being updated step by step. After a bump from
> 5% to 7% in the count of misplaced objects I noticed that the number of
> PGPs was updated to 126, and after a last bump it is now at 128 with
> ~4% of misplaced objects, currently decreasing.
> Sorry for the noise,
>
> Nicola
>
> On 07/10/22 09:15, Nicola Mori wrote:
> > Dear Ceph users,
> >
> > my cluster has been stuck for several days with some PGs backfilling. The
> > number of misplaced objects slowly decreases down to 5%, and at that
> > point jumps up again to about 7%, and so on. I found several possible
> > reasons for this behavior. One is related to the balancer, which anyway
> > I think is not operating:
> >
> > # ceph balancer status
> > {
> >     "active": false,
> >     "last_optimize_duration": "0:00:00.000938",
> >     "last_optimize_started": "Thu Oct 6 16:19:59 2022",
> >     "mode": "upmap",
> >     "optimize_result": "Too many objects (0.071539 > 0.050000) are misplaced; try again later",
> >     "plans": []
> > }
> >
> > (the last optimize result is from yesterday when I disabled it, and
> > since then the backfill loop has happened several times).
> > Another possible reason seems to be a mismatch between the PG and PGP numbers.
> > Indeed I found such a mismatch on one of my pools:
> >
> > # ceph osd pool get wizard_data pg_num
> > pg_num: 128
> > # ceph osd pool get wizard_data pgp_num
> > pgp_num: 123
> >
> > but I cannot fix it:
> >
> > # ceph osd pool set wizard_data pgp_num 128
> > set pool 3 pgp_num to 128
> > # ceph osd pool get wizard_data pgp_num
> > pgp_num: 123
> >
> > The autoscaler is off for that pool:
> >
> > POOL         SIZE   TARGET SIZE  RATE                RAW CAPACITY  RATIO   TARGET RATIO  EFFECTIVE RATIO  BIAS  PG_NUM  NEW PG_NUM  AUTOSCALE  BULK
> > wizard_data  8951G               1.3333333730697632  152.8T        0.0763                                 1.0   128                 off        False
> >
> > so I don't understand why the PGP number is stuck at 123.
> > Thanks in advance for any help,
> >
> > Nicola
>
> --
> Nicola Mori, Ph.D.
> INFN sezione di Firenze
> Via Bruno Rossi 1, 50019 Sesto F.no (Italy)
> +390554572660
> mori@xxxxxxxxxx
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx