> On 16 May 2016 at 7:56, Chris Dunlop <chris@xxxxxxxxxxxx> wrote:
>
> Hi,
>
> I'm trying to understand the potential impact on an active cluster of
> increasing pg_num/pgp_num.
>
> The conventional wisdom, as gleaned from the mailing lists and general
> google fu, seems to be to increase pg_num followed by pgp_num, both in
> small increments, to the target size, using "osd max backfills" (and
> perhaps "osd recovery max active"?) to control the rate, and thus the
> performance impact, of data movement.
>
> I'd really like to understand what's going on rather than "cargo
> culting" it.
>
> I'm currently on Hammer, but I'm hoping the answers are broadly
> applicable across all versions for others following the trail.
>
> Why do we have both pg_num and pgp_num? Given the docs say "The
> pgp_num should be equal to the pg_num": under what circumstances might
> you want these different, apart from when actively increasing pg_num
> first then increasing pgp_num to match? (If they're supposed to always
> be the same, why not have a single parameter and do the "increase
> pg_num, then pgp_num" within ceph's internals?)

pg_num is the actual number of PGs. You can increase it without any
actual data moving. pgp_num is the number CRUSH uses in its placement
calculations, which is why pgp_num can't be greater than pg_num. You
can increase pgp_num slowly to make sure not all your data moves at the
same time.
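For illustration, the usual sequence looks something like this (a
sketch only; the pool name "rbd" and all the numbers here are made up,
not a recommendation):

    # raise pg_num first; PGs split, but no data migrates yet
    ceph osd pool set rbd pg_num 2048

    # then walk pgp_num up to match; each step starts some data movement
    ceph osd pool set rbd pgp_num 1280
    ceph osd pool set rbd pgp_num 1536
    ceph osd pool set rbd pgp_num 2048

Letting the cluster settle back to HEALTH_OK between steps keeps the
amount of misplaced data at any one time bounded.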
> What do "osd backfill scan min" and "osd backfill scan max" actually
> control? The docs say "The minimum/maximum number of objects per
> backfill scan" but what does this actually mean, and how does it
> affect the impact (if at all)?

The fewer objects it scans at once, the less I/O it causes. I don't
play with those values too much.

> Is "osd recovery max active" actually relevant to this situation? It's
> mentioned in various places related to increasing pg_num/pgp_num, but
> my understanding is that it's related to recovery (e.g. an osd falls
> out and comes back again and needs to catch up) rather than
> backfilling (migrating PGs misplaced due to increasing pg_num, crush
> map changes, etc.)
>
> Previously (back in Dumpling days):
>
> ----
> http://article.gmane.org/gmane.comp.file-systems.ceph.user/11490
> ----
> From: Gregory Farnum
> Subject: Re: Throttle pool pg_num/pgp_num increase impact
> Newsgroups: gmane.comp.file-systems.ceph.user
> Date: 2014-07-08 17:01:30 GMT
>
> On Tuesday, July 8, 2014, Kostis Fardelas wrote:
> > Should we be worried that the pg/pgp num increase on the bigger pool
> > will have a 300X larger impact?
>
> The impact won't be 300 times bigger, but it will be bigger. There are
> two things impacting your cluster here:
>
> 1) the initial "split" of the affected PGs into multiple child PGs.
> You can mitigate this by stepping through pg_num at small multiples.
> 2) the movement of data to its new location (when you adjust pgp_num).
> This can be adjusted by setting the "osd max backfills" and related
> parameters; check the docs.
> -Greg
> ----
>
> Am I correct in thinking that "small multiples" in this context is
> along the lines of "1.1" rather than "2" or "4"?
>
> Is there really much impact when increasing pg_num in a single large
> step, e.g. 1024 to 4096? If so, what causes this impact? An initial
> trial of increasing pg_num by 10% (1024 to 1126) on one of my pools
> completed in a matter of tens of seconds, too short to really measure
> any performance impact. But I'm concerned the impact could grow
> steeply (perhaps exponentially) with the size of the step, such that
> increasing by a large step (e.g. the rest of the way from 1126 to
> 4096) could cause problems.
>
> Given the use of "osd max backfills" to limit the impact of the data
> movement associated with increasing pgp_num, is there any advantage or
> disadvantage to increasing pgp_num in small increments (e.g. 10% at a
> time) vs "all at once", apart from small increments likely moving some
> data multiple times? E.g. with a large step, is there a higher
> potential for problems if something else happens to the cluster at the
> same time (e.g. an OSD dies), because the current state of the system
> is further from the expected state, or something like that?
>
> If small increments of pgp_num are advisable, should the process be
> "increase pg_num by a small increment, increase pgp_num to match,
> repeat until the target is reached", or is there any advantage to
> increasing pg_num (in multiple small increments or a single large
> step) to the target, then increasing pgp_num in small increments to
> the target - and why?
>
> Given that increasing pg_num/pgp_num seems almost inevitable for a
> growing cluster, and that increasing these can be one of the most
> performance-impacting operations you can perform on a cluster, perhaps
> a document going into these details would be appropriate?
>
> Cheers,
>
> Chris
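For anyone wanting to throttle the resulting backfill: the knobs
mentioned above can be changed at runtime. A minimal sketch using
Hammer-era injectargs syntax (the values are examples only, not tuned
advice):

    # limit the number of concurrent backfill operations per OSD
    ceph tell osd.* injectargs '--osd-max-backfills 1'

    # optionally throttle recovery concurrency as well
    ceph tell osd.* injectargs '--osd-recovery-max-active 1'

Settings injected this way don't survive an OSD restart; put them in
ceph.conf under [osd] if they should persist.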