Re: Infinite backfill loop + number of pgp groups stuck at wrong value

The situation resolved itself, since there was probably no error in the first place. I manually increased pg_num and pgp_num to 128 some days ago, and pgp_num was being raised step by step. After a jump from 5% to 7% in the fraction of misplaced objects I noticed that pgp_num had been updated to 126, and after one last jump it is now at 128, with about 4% of objects misplaced and decreasing.
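
For anyone who runs into the same behavior, here is a minimal sketch of how the two values can be compared while pgp_num catches up. It relies only on the `ceph osd pool get` output format shown in my original message below; the function names `pool_value` and `pgp_status` are made up for illustration and are not part of Ceph.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Ceph): compare pg_num and pgp_num of a pool.

# Print the numeric part of "ceph osd pool get <pool> <key>",
# whose output looks like "pgp_num: 123".
pool_value() {
    ceph osd pool get "$1" "$2" | awk -F': ' '{print $2}'
}

# Print both values and succeed only when pgp_num has reached pg_num.
pgp_status() {
    pg=$(pool_value "$1" pg_num)
    pgp=$(pool_value "$1" pgp_num)
    echo "pg_num=$pg pgp_num=$pgp"
    [ "$pg" = "$pgp" ]
}
```

Calling `pgp_status wizard_data` from time to time shows pgp_num creeping toward pg_num. As far as I understand, the steps are throttled so that the misplaced fraction stays near a configured limit, which can be read with `ceph config get mgr target_max_misplaced_ratio`; I have not verified that this is the only factor involved.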
Sorry for the noise,

Nicola

On 07/10/22 09:15, Nicola Mori wrote:
Dear Ceph users,

My cluster has been stuck for several days with some PGs backfilling. The fraction of misplaced objects slowly decreases to 5%, at which point it jumps back up to about 7%, and the cycle repeats. I found several possible reasons for this behavior. One is related to the balancer, which I believe is not running anyway:

# ceph balancer status
{
     "active": false,
     "last_optimize_duration": "0:00:00.000938",
     "last_optimize_started": "Thu Oct  6 16:19:59 2022",
     "mode": "upmap",
     "optimize_result": "Too many objects (0.071539 > 0.050000) are misplaced; try again later",
     "plans": []
}

(The last optimize result is from yesterday, when I disabled the balancer, and since then the backfill loop has happened several times.) Another possible reason seems to be a mismatch between the PG and PGP numbers. Indeed, I found such a mismatch on one of my pools:

# ceph osd pool get wizard_data pg_num
pg_num: 128
# ceph osd pool get wizard_data pgp_num
pgp_num: 123

but I cannot fix it:
# ceph osd pool set wizard_data pgp_num 128
set pool 3 pgp_num to 128
# ceph osd pool get wizard_data pgp_num
pgp_num: 123

The autoscaler is off for that pool:

POOL         SIZE   TARGET SIZE  RATE                RAW CAPACITY  RATIO   TARGET RATIO  EFFECTIVE RATIO  BIAS  PG_NUM  NEW PG_NUM  AUTOSCALE  BULK
wizard_data  8951G               1.3333333730697632  152.8T        0.0763                                 1.0   128                 off        False

so I don't understand why the PGP number is stuck at 123.
Thanks in advance for any help,

Nicola

--
Nicola Mori, Ph.D.
INFN sezione di Firenze
Via Bruno Rossi 1, 50019 Sesto F.no (Italy)
+390554572660
mori@xxxxxxxxxx