Balancer blocked as autoscaler not acting on scaling change


Hi Folks,

We are currently running with one nearfull OSD and 15 nearfull pools. The fullest OSD is about 86% full, while the average is 58%. The balancer is skipping a pool on which the autoscaler is trying to complete a pg_num reduction from 131,072 to 32,768 ( pool). The autoscaler has been working on this for the last 20 days: it works through a list of misplaced objects, but whenever it gets close to the end, more objects are added to the list.

This morning I watched the list drop to c. 7,000 misplaced objects with 2 PGs active+remapped+backfilling. One PG then completed backfilling, and the list shot up to c. 70,000 misplaced objects with 3 PGs active+remapped+backfilling.
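
To put numbers on how far the reduction has actually progressed, a small script along the following lines might help. It is only a sketch: it assumes the pool entries come from the JSON output of `ceph osd pool ls detail --format json` and that they carry the fields `pool_name`, `pg_num`, `pg_num_target`, `pg_placement_num` and `pg_placement_num_target`. Those field names should be checked against the real output on this release. The function name `merge_progress`, the pool name and the sample values are made up for illustration.

```python
import json


def merge_progress(pool, start_pg_num):
    """Summarise how far a pg_num reduction has progressed for one pool.

    `pool` is one pool entry (a dict); the field names used here are an
    assumption about the JSON layout, not something taken from this thread.
    `start_pg_num` is the pg_num the pool had before the reduction began.
    """
    current = pool["pg_num"]
    target = pool["pg_num_target"]
    total = start_pg_num - target   # PGs that must be merged away overall
    done = start_pg_num - current   # PGs already merged away
    return {
        "pool": pool.get("pool_name", "?"),
        "pg_num": current,
        "pg_num_target": target,
        "pgp_num": pool["pg_placement_num"],
        "pgp_num_target": pool["pg_placement_num_target"],
        "pgs_left_to_merge": current - target,
        "percent_done": round(100.0 * done / total, 1) if total > 0 else 100.0,
    }


# Hypothetical sample entry; in practice this would be the parsed output
# of `ceph osd pool ls detail --format json`.
sample = json.loads("""
[{"pool_name": "example-pool",
  "pg_num": 65536, "pg_num_target": 32768,
  "pg_placement_num": 65000, "pg_placement_num_target": 32768}]
""")

for pool in sample:
    # Only report pools that are still moving toward their pg_num target.
    if pool["pg_num"] != pool["pg_num_target"]:
        print(merge_progress(pool, start_pg_num=131072))
```

Running this at intervals and comparing pg_num and pgp_num against their targets should show whether the pool is still being stepped down, or whether it has stalled at a fixed value while misplaced objects keep reappearing.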

Has anyone come across this behaviour before? If so, what was your remediation?

Thanks in advance for sharing.

Cluster details:
3,068 OSDs when all running, c. 60 per storage node
OS: Ubuntu 20.04
Ceph: Pacific 16.2.13 from Ubuntu Cloud Archive

Use case:
S3 storage and OpenStack backend, all pools three-way replicated
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
