hi,

I wanted to report back that splitting worked *exactly* as you described,
by running "ceph osd pool set default.rgw.buckets.data pg_num 32".

The whole process of splitting the placement groups from 8 to 32 and
re-peering them took approximately 2 minutes, on 10 OSDs across 5 hosts.
I had an OSD crash during that time, but Ceph handled it gracefully.
Downtime was minimal.

I set target_max_misplaced_ratio to 3%, but the misplaced objects were
around 9% (2 active backfills and 2 waiting), which probably has to do
with the fact that each OSD has too many objects.
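In case it is useful to others, this is roughly the sequence I ran. It
is a sketch from memory rather than a copy-paste log, and the 3% target
was just my choice:

    # lower the target misplaced ratio from the default 5% to 3%
    ceph config set mgr target_max_misplaced_ratio 0.03

    # split the PGs; the mgr then raises pgp_num on its own
    ceph osd pool set default.rgw.buckets.data pg_num 32

    # watch pgp_num catch up and the backfills drain
    ceph osd pool get default.rgw.buckets.data pgp_num
    ceph -s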
thank you

On Tue, Aug 3, 2021 at 4:51 PM 胡 玮文 <huww98@xxxxxxxxxxx> wrote:

> On Aug 3, 2021, at 21:32, Gabriel Tzagkarakis <gabrieltz@xxxxxxxxx> wrote:
>
> hi, thank you for replying.
>
> Does this method refer to manually setting the number of placement groups
> while keeping the autoscale_mode setting off?
> Also, from what I can see in the documentation,
> target_max_misplaced_ratio implies using the balancer feature, which
> I am currently not using.
>
> I believe this "auto pgp_num increasing" feature works independently of
> the autoscaler and the balancer. The last time I increased pg_num to
> 1024, I had autoscale_mode set to warn and the balancer off. I recommend
> you read this blog post:
> https://ceph.io/en/news/blog/2019/new-in-nautilus-pg-merging-and-autotuning/
> Specifically, the part near "Starting in Nautilus, this second step is no
> longer necessary: ...".
>
> And target_max_misplaced_ratio is not only used by the balancer; it is
> also used by this feature.
>
> If I understood correctly, the existing PGs will be split in place and
> act as primaries for the backfills that will be required to distribute
> the data evenly to all OSDs.
>
> Can I use the manual way to slowly increase pgp_num on the pool, and then
> enable the balancer once my PGs have a more manageable size?
>
> Will there be a considerable amount of downtime while splitting PGs and
> peering?
>
> I didn't observe any significant downtime the last time I did this. I
> think it is several seconds at most.
>
> I'm sorry for asking too many questions, I'm trying not to break stuff :)
>
> On Tue, Aug 3, 2021 at 3:46 PM 胡 玮文 <huww98@xxxxxxxxxxx> wrote:
>
>> Hi,
>>
>> Each placement group will get split into 4 pieces in place, all at
>> nearly the same time; no empty PGs will be created.
>>
>> Normally, you only set pg_num and do not touch pgp_num. Instead, you
>> can set "target_max_misplaced_ratio" (default 5%). Then the mgr will
>> increase pgp_num for you. It will raise pgp_num so that some PGs get
>> placed onto other OSDs, until the misplaced ratio reaches the target.
>> Then it waits for some backfilling to finish before increasing pgp_num
>> again. (This behavior seems to have been introduced in Nautilus.)
>>
>> So I don't think you need to worry about full OSDs. The "backfillfull
>> ratio" should throttle backfill when an OSD is nearly full, which in
>> turn will throttle the pgp_num increases.
>>
>> From: Gabriel Tzagkarakis <gabrieltz@xxxxxxxxx>
>> Sent: August 3, 2021 19:42
>> To: ceph-users@xxxxxxx
>> Subject: PG scaling questions
>>
>> hello everyone,
>>
>> I would like to know how autoscaling or manual scaling actually works,
>> so that I can keep my cluster from running out of disk space.
>>
>> Let's say I want to scale a pool of 8 PGs, each ~400 GB, to 32 PGs.
>>
>> 1) Does each placement group get split into 4 pieces IN PLACE, all at
>> the same time?
>> 2) Does autoscaling pick one of the existing placement groups at
>> random, for example X.Y, create new empty placement groups, migrate the
>> data onto them, and then continue with the next big PG, with or without
>> deleting the original PG?
>> 3) Something else?
>>
>> I am mostly concerned about the period when both the pre-existing PGs
>> and the newly created ones co-exist in the cluster, because I want to
>> avoid full OSDs. In my case each PG holds many small objects, and
>> deleting stray PGs takes a long time.
>>
>> Would it be better if I used something like
>> ceph osd pool set default.rgw.buckets.data pg_num 32
>> and then increased pgp_num in increments of 8, assuming only one of the
>> original PGs is affected at a time? But my assumption may be wrong
>> again.
>>
>> I could not find anything relevant in the documentation.
>>
>> Thank you
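For the archives: the manual variant I originally asked about would look
roughly like the sketch below (assuming autoscale_mode is off for the
pool). I did not end up needing it, since the mgr raises pgp_num on its
own once pg_num is set:

    # split the PGs in place first
    ceph osd pool set default.rgw.buckets.data pg_num 32

    # then move data gradually, raising pgp_num in increments of 8 and
    # letting backfill settle (check "ceph -s") between steps
    ceph osd pool set default.rgw.buckets.data pgp_num 16
    ceph osd pool set default.rgw.buckets.data pgp_num 24
    ceph osd pool set default.rgw.buckets.data pgp_num 32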