Re: Many misplaced PG's, full OSD's and a good amount of manual intervention to keep my Ceph cluster alive.

>
> So you might set the full ratio to .98, backfillfull to .96.  Nearfull is
> only cosmetic.

Thanks for the advice. It seems to be working with 0.92 for now. If it gets
stuck I'll increase it.

On Mon, 6 Jan 2025 at 00:24, Anthony D'Atri <anthony.datri@xxxxxxxxx> wrote:

>
>
> Very solid advice here - that’s the beauty of the Ceph community.
>
> Just adding to what Anthony mentioned: a reweight from 1 to 0.2 (and back)
> is quite extreme and the cluster won’t like it.
>
>
> And these days with the balancer, pg-upmap entries to the same effect are
> a better idea.
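>
> A minimal sketch of what that looks like (the PG ID and OSD numbers here
> are made up for illustration):
>
> ```
> # map one PG of pool 10 off the full osd.12 and onto the emptier osd.34
> ceph osd pg-upmap-items 10.7f 12 34
> # remove the mapping again once things have drained
> ceph osd rm-pg-upmap-items 10.7f
> ```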
>
> From the clients’ perspective, your main concern now is to keep the pools
> “alive” with enough space while the backfilling takes place.
>
>
> To that end, you can *temporarily* give yourself a bit more margin:
>
> ceph osd set-nearfull-ratio .85
> ceph osd set-backfillfull-ratio .90
> ceph osd set-full-ratio .95
>
> Those are the default values, and Ceph (now) enforces that they ascend in
> that order (nearfull <= backfillfull <= full, possibly strictly).
>
> So you might set the full ratio to .98, backfillfull to .96.  Nearfull is
> only cosmetic.
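>
> Concretely, and only as a temporary measure (see the warning just below):
>
> ```
> ceph osd set-full-ratio .98
> ceph osd set-backfillfull-ratio .96
> ```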
>
> But absolutely do not forget to revert to the default values once the
> cluster is balanced, or to other values that you have made an educated
> decision to use.
>
> Even with plenty of OSDs that are not filled, you might hit a single
> overfilled OSD and the whole pool will stop accepting new data.
>
>
> Yep, see above.  Not immediately clear to me why that data pool is so full
> unless the CRUSH rule / device classes are wonky.
>
> Clients will start getting “No more space available” errors. That happened
> to us with CephFS recently, in a very similar scenario where the cluster
> took in much more data than expected in a short amount of time; not fun.
> With the balancer not working due to too many misplaced objects, that risk
> is increased, so just a heads-up, keep it in mind. To get things working we
> simply balanced the OSDs manually with upmaps, moving data from the most
> full OSDs to the least full ones (our built-in balancer sadly does not
> work).
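>
> Roughly, and with placeholder OSD IDs (adapt them to your own `ceph osd df`
> output):
>
> ```
> # see which OSDs are fullest and emptiest
> ceph osd df tree
> # list the PGs currently sitting on the fullest OSD, e.g. osd.12
> ceph pg ls-by-osd osd.12
> # then pin one of those PGs to a less-full OSD with an upmap entry,
> # as in the pg-upmap-items example earlier in this thread
> ceph osd pg-upmap-items <pgid> 12 <emptier-osd-id>
> ```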
>
>
> One small observation:
> I’ve noticed that `ceph osd pool ls detail | grep cephfs.cephfs01.data`
> shows pg_num increased but pgp_num still the same.
> You will need to set pgp_num as well for data to migrate to the new PGs:
> https://docs.ceph.com/en/mimic/rados/operations/placement-groups/#set-the-number-of-placement-groups
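>
> Something along these lines (the value 1024 is an assumption here; use
> whatever pg_num was raised to):
>
> ```
> ceph osd pool set cephfs.cephfs01.data pgp_num 1024
> ```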
>
>
> The mgr usually does that for recent Ceph releases.  With older releases
> we had to increment pg_num and pgp_num in lockstep, which was kind of a
> pain.
>
>
>
> Best,
>
> *Laimis J.*
>
> On 5 Jan 2025, at 16:11, Anthony D'Atri <anthony.datri@xxxxxxxxx> wrote:
>
>
> What reweights have been set for the top OSDs (ceph osd df tree)?
>
> Right now they are all at 1.0. I had to lower them to something close to
> 0.2 in order to free up space but I changed them back to 1.0. Should I
> lower them while the backfill is happening?
>
>
> Old-style legacy override reweights don’t mesh well with the balancer.
>   Best to leave them at 1.00.
>
> 0.2 is pretty extreme, back in the day I rarely went below 0.8.
>
> ```
> "optimize_result": "Too many objects (0.355160 > 0.050000) are misplaced;
> try again late
> ```
>
>
> That should clear.  The balancer doesn’t want to stir up trouble if the
> cluster already has a bunch of backfill / recovery going on.  Patience!
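>
> You can keep an eye on it with, for example:
>
> ```
> # shows the balancer's last optimize_result and whether it is active
> ceph balancer status
> # track the misplaced percentage as backfill proceeds
> ceph -s | grep misplaced
> ```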
>
> default.rgw.buckets.data    10  1024  197 TiB  133.75M  592 TiB  93.69
>   13 TiB
> default.rgw.buckets.non-ec  11    32   78 MiB    1.43M   17 GiB
>
>
> That’s odd that the data pool is that full but the others aren’t.
>
> Please send `ceph osd crush rule dump` and `ceph osd dump | grep pool`.
>
>
>
> I also tried changing the following but it does not seem to persist:
>
>
> Could be an mclock thing.
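>
> If the settings in question are the recovery/backfill knobs (an assumption
> on my part, since the snippet isn't shown here), note that the mClock
> scheduler ignores manual changes to them unless overrides are enabled:
>
> ```
> # allow manual osd_max_backfills / recovery settings to take effect
> # under the mClock scheduler
> ceph config set osd osd_mclock_override_recovery_settings true
> ceph config set osd osd_max_backfills 2
> ```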
>
> 1. Why did I end up with so many misplaced PGs when there were no changes
> to the cluster (number of OSDs, hosts, etc.)?
>
>
> Probably a result of the autoscaler splitting PGs or of some change to
> CRUSH rules such that some data can’t be placed.
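>
> A quick way to check whether the autoscaler is behind it:
>
> ```
> # shows each pool's current pg_num, the autoscaler's target, and its mode
> ceph osd pool autoscale-status
> ```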
>
> 2. Is it OK to change target_max_misplaced_ratio to something higher
> than .05 so the balancer would work and I wouldn't have to constantly
> rebalance the OSDs manually?
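>
> (The setting I mean is the mgr-level one, i.e. something like:)
>
> ```
> # example only; see the reply below before actually raising it
> ceph config set mgr target_max_misplaced_ratio .10
> ```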
>
>
> I wouldn’t; that’s a symptom, not the disease.
>
> Bruno

-- 
Bruno Gomes Pessanha
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



