On Aug 3, 2021, at 21:32, Gabriel Tzagkarakis <gabrieltz@xxxxxxxxx> wrote:

> hi, thank you for replying. Does this method refer to manually setting the
> number of placement groups while keeping the autoscale_mode setting off?
> Also, from what I can see in the documentation, target_max_misplaced_ratio
> implies using the balancer feature, which I am currently not using.

I believe this "auto pgp_num increasing" feature works independently of the
autoscaler and the balancer. The last time I increased pg_num to 1024, I had
autoscale mode set to warn and the balancer off.

I recommend you read this blog post:
https://ceph.io/en/news/blog/2019/new-in-nautilus-pg-merging-and-autotuning/
Specifically, the part near "Starting in Nautilus, this second step is no
longer necessary: ...".

And target_max_misplaced_ratio is not only used by the balancer; it is also
used by this feature.
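Putting the pieces together, the workflow would look roughly like this (the
pool name and pg_num are taken from your example below; adjust both for your
cluster):

  # Optional: stop the autoscaler from overriding the manual change
  ceph osd pool set default.rgw.buckets.data pg_autoscale_mode warn

  # Raise pg_num only; the PGs split in place almost immediately
  ceph osd pool set default.rgw.buckets.data pg_num 32

  # The mgr then raises pgp_num in steps, keeping the misplaced ratio
  # under this value (0.05 is already the default; shown here only to
  # make the knob explicit)
  ceph config set mgr target_max_misplaced_ratio 0.05

  # Watch pgp_num catch up to pg_num as backfill completes
  ceph osd pool get default.rgw.buckets.data pgp_num
  ceph status

A lower target_max_misplaced_ratio makes each pgp_num step smaller and the
whole split gentler, at the cost of taking longer overall.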
> If I understood correctly, the existing PGs will be split in place and act
> as primaries for the backfills that will be required to distribute the data
> evenly to all OSDs. Can I use the manual way to slowly increase pgp_num in
> the pool, and then enable the balancer when my PGs have a more manageable
> size? Will there be a considerable amount of downtime splitting PGs and
> peering?

I didn't observe any significant downtime the last time I did this. I think
it is several seconds at most.

> I'm sorry for asking too many questions, I'm trying not to break stuff :)
>
> On Tue, Aug 3, 2021 at 3:46 PM 胡 玮文 <huww98@xxxxxxxxxxx> wrote:
>
>> Hi,
>>
>> Each placement group will get split into 4 pieces in place, all at nearly
>> the same time; no empty PGs will be created.
>>
>> Normally, you only set pg_num and do not touch pgp_num. Instead, you can
>> set "target_max_misplaced_ratio" (default 5%). Then the mgr will increase
>> pgp_num for you. It will raise pgp_num so that some PGs get placed onto
>> other OSDs, until the misplaced ratio reaches the target. Then it waits
>> for some backfilling to finish before increasing pgp_num again. (This
>> behavior seems to have been introduced in Nautilus.)
>>
>> So I don't think you need to worry about full OSDs. The "backfillfull
>> ratio" should throttle backfill when an OSD is nearly full, which in turn
>> will throttle the pgp_num increase.
>>
>> From: Gabriel Tzagkarakis <gabrieltz@xxxxxxxxx>
>> Sent: August 3, 2021 19:42
>> To: ceph-users@xxxxxxx
>> Subject: PG scaling questions
>>
>> hello everyone,
>>
>> I would like to know how autoscaling or manual scaling actually works, so
>> that I can keep my cluster from running out of disk space. Let's say I
>> want to scale a pool of 8 PGs, each ~400 GB, to 32 PGs.
>>
>> 1) Does each placement group get split into 4 pieces IN-PLACE, all at the
>>    same time?
>> 2) Does autoscaling pick one of the existing placement groups at random,
>>    for example X.Y, create new empty placement groups, migrate the data
>>    onto them, and then continue to the next big PG, with or without
>>    deleting the original PG?
>> 3) Something else?
>>
>> I am more concerned about the period when both the pre-existing PGs and
>> the newly created ones co-exist in the cluster, since I want to prevent
>> full OSDs. In my case each PG holds many small files, and deleting stray
>> PGs takes a long time.
>>
>> Would it be better if I used something like
>>   ceph osd pool set default.rgw.buckets.data pg_num 32
>> and then increased pgp_num in increments of 8, assuming one of the
>> original PGs is affected at a time? But my assumption may be wrong again;
>> I could not find anything relevant in the documentation.
>>
>> Thank you
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx