Re: objects misplaced jumps up at 5%

Continuing on this topic: is it the case that a pool's placement group
(PG) count can be raised quickly, while the corresponding
placement-group-for-placement (PGP) value is only stepped up in small
increments of 1-3 at a time? And does each increment of the PGP count
trigger another round of rebalancing and backfill across many PGs?
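
For context, this is how I have been comparing the two values (the pool
name is just a placeholder here):

ceph osd pool ls detail | grep <pool>
ceph osd pool get <pool> pg_num
ceph osd pool get <pool> pgp_num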

I am working with an erasure-coded pool whose target PG count I recently
increased by setting the pg_autoscaler `target_size_ratio` closer to the
size I expect the pool's data to grow to. I am wondering whether this
pool will repeatedly hit 5% misplaced, bump the PGP count by a small
increment, and backfill PGs over and over while the PGs sit unscrubbed.
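
For reference, the autoscaler change was along these lines (the ratio and
pool name here are illustrative rather than my exact values):

ceph osd pool set <pool> target_size_ratio 0.6
ceph osd pool autoscale-status

The second command is what shows the pg_num the autoscaler is now aiming
for on each pool.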

I have 160 OSDs in my pool and now have a target PG count of 2048.

A similar pattern of 5% misplaced and never-ending backfills is described
in a thread by Paul Mezannini
(https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/message/OUEZJAEKFV74H5RCZSICNXS3P5JHYRK6/).

It would be nice to know the correct strategy for adjusting a pool's
overall PG count and getting the pool back to a healthy, balanced state.
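
If I understand the mgr behaviour correctly, the size of each pgp_num step
is tied to the target misplaced ratio, so raising that (default .05, i.e.
the 5% we keep hitting) should let it take bigger steps between backfill
rounds, e.g.:

ceph config set mgr target_max_misplaced_ratio .07

Is that the right knob to use here, or is there a better approach?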

-Matt


On Tue, Sep 29, 2020 at 9:10 AM Paul Emmerich <emmerich@xxxxxxxxxx> wrote:
>
> On Tue, Sep 29, 2020 at 12:34 PM Jake Grimmett <jog@xxxxxxxxxxxxxxxxx>
> wrote:
>
> > I think you found the answer!
> >
> > When adding 100 new OSDs to the cluster, I increased both pg and pgp
> > from 4096 to 16,384
> >
>
> Too much for your cluster; 4096 seems sufficient for a pool of size 10.
> You can still reduce it relatively cheaply while it hasn't been fully
> actuated yet.
>
>
> Paul
>
>
> >
> > **********************************
> > [root@ceph1 ~]# ceph osd pool set ec82pool pg_num 16384
> > set pool 5 pg_num to 16384
> >
> > [root@ceph1 ~]# ceph osd pool set ec82pool pgp_num 16384
> > set pool 5 pgp_num to 16384
> >
> > **********************************
> >
> > The pg number increased immediately as seen with "ceph -s"
> >
> > But unknown to me, the pgp number did not increase immediately.
> >
> > "ceph osd pool ls detail" shows that pgp is currently 11412
> >
> > Each time we hit 5.000% misplaced, the pgp number increases by 1 or 2,
> > which pushes the % misplaced back up to ~5.1%
> > ... which is why we thought the cluster was not re-balancing.
> >
> >
> > Looking at the ceph.audit.log, I can see entries like this:
> >
> > 2020-09-23 01:13:11.564384 mon.ceph3b (mon.1) 50747 : audit [INF]
> > from='mgr.90414409 10.1.0.80:0/7898' entity='mgr.ceph2' cmd=[{"prefix":
> > "osd pool set", "pool": "ec82pool", "var": "pgp_num_actual", "val":
> > "5076"}]: dispatch
> > 2020-09-23 01:13:11.565598 mon.ceph1b (mon.0) 85947 : audit [INF]
> > from='mgr.90414409 ' entity='mgr.ceph2' cmd=[{"prefix": "osd pool set",
> > "pool": "ec82pool", "var": "pgp_num_actual", "val": "5076"}]: dispatch
> > 2020-09-23 01:13:12.530584 mon.ceph1b (mon.0) 85949 : audit [INF]
> > from='mgr.90414409 ' entity='mgr.ceph2' cmd='[{"prefix": "osd pool set",
> > "pool": "ec82pool", "var": "pgp_num_actual", "val": "5076"}]': finished
> >
> >
> > Our assumption is that the pgp number will continue to increase until it
> > reaches its set level, at which point the cluster will complete its
> > re-balance...
> >
> > again, many thanks to you both for your help,
> >
> > Jake
> >
> > On 28/09/2020 17:35, Paul Emmerich wrote:
> > > Hi,
> > >
> > > 5% misplaced is the default target ratio for misplaced PGs whenever any
> > > automated rebalancing happens; the sources for this are either the
> > > balancer or PG scaling.
> > > So I'd suspect that there's a PG change ongoing (either the pg autoscaler
> > > or a manual change; both obey the target misplaced ratio).
> > > You can check this by running "ceph osd pool ls detail" and looking at
> > > the pg_num target value.
> > >
> > > Also: it looks like you've set osd_scrub_during_recovery = false; this
> > > setting can be annoying on large erasure-coded setups on HDDs that see
> > > long recovery times. It's better to get the IO priorities right; search
> > > the mailing list for "osd op queue cut off high".
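> > > i.e. something along the lines of (the OSDs likely need a restart to
> > > pick it up):
> > >
> > > ceph config set osd osd_op_queue_cut_off high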
> > >
> > > Paul
> >
> > --
> > Dr Jake Grimmett
> > Head Of Scientific Computing
> > MRC Laboratory of Molecular Biology
> > Francis Crick Avenue,
> > Cambridge CB2 0QH, UK.
> >
> >
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx



--
Matt Larson, PhD
Madison, WI  53705 U.S.A.
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


