Hi,

I've just read a post that describes the exact behaviour you describe:
https://ceph.io/rados/new-in-nautilus-pg-merging-and-autotuning/

There is a config option named target_max_misplaced_ratio, which defaults to 5%. You can change this to accelerate the remap process.

Hope that's helpful.

Sent from my iPad

On Sep 29, 2020, at 18:34, Jake Grimmett <jog@xxxxxxxxxxxxxxxxx> wrote:

Hi Paul,

I think you found the answer!

When adding 100 new OSDs to the cluster, I increased both pg_num and pgp_num from 4096 to 16,384:

**********************************
[root@ceph1 ~]# ceph osd pool set ec82pool pg_num 16384
set pool 5 pg_num to 16384

[root@ceph1 ~]# ceph osd pool set ec82pool pgp_num 16384
set pool 5 pgp_num to 16384
**********************************

The pg number increased immediately, as seen with "ceph -s", but unknown to me, the pgp number did not increase immediately. "ceph osd pool ls detail" shows that pgp_num is currently 11412.

Each time we hit 5.000% misplaced, the pgp number increases by 1 or 2; this pushes the % misplaced back up to ~5.1% ... which is why we thought the cluster was not re-balancing.

If I'd looked at the ceph.audit.log, I'd have seen entries like this:

2020-09-23 01:13:11.564384 mon.ceph3b (mon.1) 50747 : audit [INF] from='mgr.90414409 10.1.0.80:0/7898' entity='mgr.ceph2' cmd=[{"prefix": "osd pool set", "pool": "ec82pool", "var": "pgp_num_actual", "val": "5076"}]: dispatch

2020-09-23 01:13:11.565598 mon.ceph1b (mon.0) 85947 : audit [INF] from='mgr.90414409 ' entity='mgr.ceph2' cmd=[{"prefix": "osd pool set", "pool": "ec82pool", "var": "pgp_num_actual", "val": "5076"}]: dispatch

2020-09-23 01:13:12.530584 mon.ceph1b (mon.0) 85949 : audit [INF] from='mgr.90414409 ' entity='mgr.ceph2' cmd='[{"prefix": "osd pool set", "pool": "ec82pool", "var": "pgp_num_actual", "val": "5076"}]': finished

Our assumption is that the pgp number will continue to increase until it reaches its set level, at which point the cluster will complete its re-balance...

Again, many thanks to you both for your help,

Jake

On 28/09/2020 17:35, Paul Emmerich wrote:

Hi,

5% misplaced is the default target ratio for misplaced PGs whenever any automated rebalancing happens; the sources for this are either the balancer or pg scaling.

So I'd suspect that there's a PG change ongoing (either the pg autoscaler or a manual change; both obey the target misplaced ratio). You can check this by running "ceph osd pool ls detail" and checking the value of the pg target.

Also: it looks like you've set osd_scrub_during_recovery = false. This setting can be annoying on large erasure-coded setups on HDDs that see long recovery times. It's better to get IO priorities right; search the mailing list for "osd op queue cut off high".

Paul

--
Dr Jake Grimmett
Head Of Scientific Computing
MRC Laboratory of Molecular Biology
Francis Crick Avenue, Cambridge CB2 0QH, UK.

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
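For reference, the checks and settings discussed in this thread look roughly like the commands below. This is only a sketch: the pool name "ec82pool" comes from the examples above, the 0.10 ratio is an arbitrary illustrative value, and option names and defaults should be verified against your own Ceph release before changing anything.

**********************************
# Watch how far pgp_num has actually progressed towards its target
ceph osd pool ls detail | grep ec82pool

# Allow more data movement per step
# (the default target_max_misplaced_ratio is 0.05, i.e. 5%)
ceph config set mgr target_max_misplaced_ratio 0.10

# Prioritise client IO over recovery, per Paul's suggestion
# (as far as I know, osd_op_queue_cut_off is only read at OSD start-up,
# so the OSDs need a restart for it to take effect)
ceph config set osd osd_op_queue_cut_off high
**********************************

Raising target_max_misplaced_ratio lets the mgr take larger pgp_num_actual steps, so the 5% ceiling seen above clears faster, at the cost of more backfill traffic running at once.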