> Very solid advice here - that’s the beauty of Ceph community.
>
> Just adding to what Anthony mentioned: a reweight from 1 to 0.2 (and back) is quite extreme and the cluster won’t like it.

And these days with the balancer, pg-upmap entries to the same effect are a better idea.

> From the clients perspective

Your main concern now is to keep the pools “alive” with enough space while the backfilling takes place. To that end, you can *temporarily* give yourself a bit more margin:

ceph osd set-nearfull-ratio .85
ceph osd set-backfillfull-ratio .90
ceph osd set-full-ratio .95

Those are the default values, and Ceph (now) enforces that they increase (>=, or maybe strictly >) in that order. So you might set the full ratio to .98 and backfillfull to .96; nearfull is only cosmetic. But absolutely do not forget to revert to the defaults once the cluster is balanced, or to other values that you choose as an educated decision.

> Even with plenty of OSDs that are not filled you might hit a single overfilled OSD and the whole pool will stop accepting new data.

Yep, see above. Not immediately clear to me why that data pool is so full unless the CRUSH rule / device classes are wonky.

> Clients will start getting “No more space available” errors. That happened to us with CephFS recently with a very similar scenario where the cluster got much more data than expected in a short amount of time, not fun.
> With the balancer not working due to too many misplaced objects that’s an increased risk, so just a heads up and keep that in mind. To get things working we simply balanced the OSDs manually with upmaps, moving data from the most full ones to the least full ones (our built-in balancer sadly does not work).
>
>
> One small observation:
> I’ve noticed that 'ceph osd pool ls detail |grep cephfs.cephfs01.data' has pg_num increased but the pgp_num is still the same.
> You will need to set it as well for data migration to new PGs to happen: https://docs.ceph.com/en/mimic/rados/operations/placement-groups/#set-the-number-of-placement-groups

The mgr usually does that for recent Ceph releases. With older releases we had to increment pg_num and pgp_num in lockstep, which was kind of a pain.
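For reference, roughly what those steps look like on the CLI. The pool name is taken from Laimis’ observation above; the pgp_num value, PG ID and OSD IDs are placeholders you would substitute from your own `ceph osd pool ls detail`, `ceph pg ls` and `ceph osd df` output, so treat this as a sketch rather than a recipe:

```
# Temporarily loosen the full ratios so backfill can complete
# (raise full before backfillfull so nearfull <= backfillfull <= full still holds)
ceph osd set-full-ratio .98
ceph osd set-backfillfull-ratio .96

# ...and put them back once the cluster has rebalanced (reverse order)
ceph osd set-backfillfull-ratio .90
ceph osd set-full-ratio .95
ceph osd set-nearfull-ratio .85

# Bring pgp_num up to match pg_num so the split PGs actually migrate
# (recent releases ramp this for you; 1024 here is only a placeholder)
ceph osd pool set cephfs.cephfs01.data pgp_num 1024

# Manual balancing with upmap entries: remap a PG from a full OSD to an
# emptier one (the PG ID and OSD IDs below are made up)
ceph osd pg-upmap-items 10.3f 121 87
```

Check `ceph osd df` after each upmap so you are moving data toward genuinely emptier OSDs and not just shifting the problem around.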
>
>
> Best,
> Laimis J.
>
>> On 5 Jan 2025, at 16:11, Anthony D'Atri <anthony.datri@xxxxxxxxx> wrote:
>>
>>
>>>> What reweights have been set for the top OSDs (ceph osd df tree)?
>>>>
>>> Right now they are all at 1.0. I had to lower them to something close to
>>> 0.2 in order to free up space but I changed them back to 1.0. Should I
>>> lower them while the backfill is happening?
>>
>> Old-style legacy override reweights don’t mesh well with the balancer. Best to leave them at 1.00.
>>
>> 0.2 is pretty extreme; back in the day I rarely went below 0.8.
>>
>>>> ```
>>>> "optimize_result": "Too many objects (0.355160 > 0.050000) are misplaced;
>>>> try again late
>>>> ```
>>
>> That should clear. The balancer doesn’t want to stir up trouble if the cluster already has a bunch of backfill / recovery going on. Patience!
>>
>>>> default.rgw.buckets.data 10 1024 197 TiB 133.75M 592 TiB 93.69
>>>> 13 TiB
>>>> default.rgw.buckets.non-ec 11 32 78 MiB 1.43M 17 GiB
>>
>> That’s odd that the data pool is that full but the others aren’t.
>>
>> Please send `ceph osd crush rule dump`. And `ceph osd dump | grep pool`.
>>
>>>>
>>>> I also tried changing the following but it does not seem to persist:
>>
>> Could be an mclock thing.
>>
>>>> 1. Why I ended up with so many misplaced PG's since there were no changes
>>>> on the cluster: number of osd's, hosts, etc.
>>
>> Probably a result of the autoscaler splitting PGs or of some change to CRUSH rules such that some data can’t be placed.
>>
>>>> 2. Is it ok to change the target_max_misplaced_ratio to something higher
>>>> than .05 so the autobalancer would work and I wouldn't have to constantly
>>>> rebalance the osd's manually?
>>
>> I wouldn’t; that’s a symptom, not the disease.
>>
>>>> Bruno
>>>
>>> --
>>> Bruno Gomes Pessanha
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx