Hi,
you could change the target_max_misplaced_ratio to 1; by default the balancer
throttles itself at a 5% ratio of misplaced objects, see [1] for more information:
ceph config get mgr target_max_misplaced_ratio
0.050000
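
For example, you could raise it temporarily (a value of 1 effectively disables
the throttling), something along these lines:

ceph config set mgr target_max_misplaced_ratio 1

and set it back to the default once the backfill has caught up:

ceph config set mgr target_max_misplaced_ratio 0.05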
[1] https://docs.ceph.com/en/latest/rados/operations/balancer/#throttling
Quoting bc10@xxxxxxxxxxxx:
Hi Folks,
We are currently running with one nearfull OSD and 15 nearfull
pools. The most full OSD is about 86% full, while the average is 58%
full. The balancer is skipping a pool (default.rgw.buckets.data) on
which the autoscaler is trying to complete a pg_num reduction from
131,072 to 32,768. The autoscaler has been working on this for the
last 20 days: it works through the list of misplaced objects, but
when it gets close to the end, more objects get added to the list.
This morning I observed the list get down to c. 7,000 misplaced
objects with 2 PGs active+remapped+backfilling; one PG completed its
backfill, and then the list shot up to c. 70,000 misplaced objects
with 3 PGs active+remapped+backfilling.
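
In case it is useful, this is roughly how the merge progress and the
misplaced count can be followed (pool name as above):

ceph osd pool get default.rgw.buckets.data pg_num
ceph osd pool get default.rgw.buckets.data pgp_num
ceph status | grep misplaced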
Has anyone come across this behaviour before? If so, what was your
remediation?
Thanks in advance for sharing.
Bruno
Cluster details:
3,068 OSDs when all running, c. 60 per storage node
OS: Ubuntu 20.04
Ceph: Pacific 16.2.13 from Ubuntu Cloud Archive
Use case:
S3 storage and OpenStack backend, all pools three-way replicated
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx