Re: How to force PG merging in one step?

Frank Schilder <frans@xxxxxx> · Tue, 11 Oct 2022 22:39:22 +0000

Hi Eugen,

thanks, that was a great hint! I have a strong sense of déjà vu; didn't we discuss this before in the context of increasing pg_num? I just set target_max_misplaced_ratio to 1 and it did exactly what I wanted. It's the same number of PGs backfilling, but pgp_num=1024, so while the rebalancing load is the same, I got rid of any redundant data movement and I can follow the progress of the merge directly in ceph status.
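To illustrate why the ratio makes such a difference, here is a rough Python model of how pgp_num might be stepped down towards its target. It is a sketch under assumptions, not the Ceph mgr code: it assumes each pgp_num change is capped at about pg_num × min(target_max_misplaced_ratio − currently misplaced ratio, target_max_misplaced_ratio / 2), that pg_num stays at 2048 throughout, and that each step has fully backfilled (misplaced ratio back to 0) before the next one is issued. The function names are hypothetical.

```python
from typing import List


def max_pgp_step(pg_num: int, ratio: float, misplaced: float = 0.0) -> int:
    """Hypothetical cap on a single pgp_num change (assumed model, not Ceph code)."""
    if misplaced >= ratio:
        return 0  # misplaced budget already used up: no new step
    # Use the remaining budget, but never more than half of the ratio per step.
    room = min(ratio - misplaced, ratio / 2.0)
    return max(int(pg_num * room), 1)


def pgp_num_schedule(pg_num: int, pgp_num: int, target: int, ratio: float) -> List[int]:
    """Sequence of pgp_num values when stepping down to target.

    Simplifications: pg_num is held fixed and every step is assumed to be
    fully backfilled (misplaced ratio back to 0) before the next one starts.
    """
    step = max_pgp_step(pg_num, ratio)
    if step == 0:
        raise ValueError("ratio must be positive")
    values = [pgp_num]
    while pgp_num > target:
        # Move down by at most one step, never overshooting the target.
        pgp_num = max(target, pgp_num - step)
        values.append(pgp_num)
    return values


if __name__ == "__main__":
    for ratio in (0.05, 1.0):
        steps = pgp_num_schedule(2048, 2048, 512, ratio)
        print(f"ratio={ratio}: {len(steps) - 1} steps, first values {steps[:3]}")
```

Under these assumptions the default of 0.05 gives steps of 51 PGs, so the pgp_num 1946 in the quoted pool dump would be two steps in and 31 steps would be needed to reach 512, whereas a ratio of 1 caps a step at 1024, which would be consistent with seeing pgp_num=1024 as the only intermediate value.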

Related to that, I have set mon_max_pg_per_osd=300 and do have OSDs with more than 400 PGs. Still, I don't see the promised health warning in ceph status. Is this a known issue?

Opinion part.

Returning to the above setting, I have to say that the question of which parameter influences what seems somewhat unintuitive, if not inconsistent. The parameter target_max_misplaced_ratio belongs to the balancer module, but merging PGs is clearly a task of the pg_autoscaler module. I'm not balancing, I'm scaling PG numbers. Such cross-dependencies make it really hard to find relevant information in the section of the documentation where one would look for it; the information is starting to be scattered all over the place.

If it's not possible to keep such things separate and to explain each task consistently in a single section, the description of target_max_misplaced_ratio could at least mention PG merging/splitting, so that a search for these terms brings up that page. There should also be a cross-reference from "ceph osd pool set pg[p]_num" to target_max_misplaced_ratio. Well, it's now here in this message for Google to reveal.

I have to add that, while I understand the motivation behind adding these babysitting modules, I would really appreciate being able to disable them. I personally find them annoying, especially in emergency situations but also in normal operation. I would consider them a nice-to-have and would not force them on people who want to be in charge.

For example, in my current situation I'm halving the PG count of a pool. Doing the merge in one go or letting target_max_misplaced_ratio "help" me leads to exactly the same number of PGs backfilling at any time. This means that both settings, target_max_misplaced_ratio=0.05 and 1, cause exactly the same interference of rebalancing IO with user IO. The difference is that with target_max_misplaced_ratio=0.05 this phase of reduced performance takes longer, because every time the module decides to change pgp_num it will inevitably rebalance objects again that have already been moved. I find it difficult to consider this an improvement. I prefer to avoid redundant writes at all costs for the sake of disk lifetime. If I really need to reduce the impact of recovery IO, I can set recovery_sleep.

That's my personal opinion for the user group.

Thanks for your help and have a nice evening!

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Eugen Block <eblock@xxxxxx>
Sent: 11 October 2022 14:13:45
To: ceph-users@xxxxxxx
Subject: Re: How to force PG merging in one step?

Hi Frank,

I don't think it's the autoscaler interfering here but the default 5%
target_max_misplaced_ratio. I haven't tested the impact of increasing
that to a much higher value, so be careful.

Regards,
Eugen

Zitat von Frank Schilder <frans@xxxxxx>:

> Hi all,
>
> I need to reduce the number of PGs in a pool from 2048 to 512 and
> would really like to do that in a single step. I executed the set
> pg_num 512 command, but the PGs are not all merged. Instead I get
> this intermediate state:
>
> pool 13 'con-fs2-meta2' replicated size 4 min_size 2 crush_rule 3
> object_hash rjenkins pg_num 2048 pgp_num 1946 pg_num_target 512
> pgp_num_target 512 autoscale_mode off last_change 916710 lfor
> 0/0/618995 flags hashpspool,nodelete,selfmanaged_snaps max_bytes
> 107374182400 stripe_width 0 compression_mode none application cephfs
>
> This is really annoying, because it will not only lead to repeated
> redundant data movements, but I also need to rebalance this pool
> onto fewer OSDs, which cannot hold the 1946 PGs it will temporarily
> be merged to. How can I override the autoscaler interfering with
> admin operations in such tight corners?
>
> As you can see, we disabled autoscaler on all pools and also
> globally. Still, it interferes with admin commands in an unsolicited
> way. I would like the PG merge happen on the fly as the data moves
> to the new OSDs.
>
> Best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
