Re: Is it safe to increase pg number in a production environment

I think I wrote about my experience with this about 3 months ago, including what techniques I used to minimize impact on production.

Basically we had to:
1) increase pg_num in small increments only, because creating the placement groups themselves caused slow requests on the OSDs
2) increase pgp_num in small increments as well at first, and only move to larger steps later

We went from 4096 placement groups up to 16384

pg_num (the number of placement groups actually created on disk) was increased like this:
# for i in `seq 4096 64 16384` ; do ceph osd pool set $pool pg_num $i ; sleep 60 ; done
This ran overnight (and the step was raised to 128 during the night).
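For scale (my arithmetic, not a measurement): with a step of 64 that is (16384 - 4096) / 64 = 192 iterations, i.e. over three hours of sleep time alone, and the 128 step roughly halves whatever remains.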

Increasing pgp_num was trickier in our case, first because it was a heavily loaded production cluster and we wanted to minimize the visible impact, and second because of wildly differing free space on the OSDs.
We again did it in steps and waited for the cluster to settle before continuing.
Each step raised pgp_num by about 2%, and as we got higher (>8192) we increased the step size considerably - the last step was 15360->16384 and had about the same impact as the initial 4096->4160 step.
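
For illustration, the kind of loop this boils down to looks roughly like the sketch below - the pool name, the ~2% step and the health-check heuristic are placeholders/assumptions, not the exact script we ran:

pool=rbd                                  # assumed pool name
target=16384
cur=$(ceph osd pool get $pool pgp_num | awk '{print $2}')
while [ "$cur" -lt "$target" ]; do
    step=$(( cur / 50 ))                  # roughly 2% per step
    next=$(( cur + step ))
    [ "$next" -gt "$target" ] && next=$target
    ceph osd pool set $pool pgp_num $next
    # wait until backfill/recovery has finished before the next step
    while ceph health | grep -qE 'backfill|recover|remapped'; do
        sleep 60
    done
    cur=$next
done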

The end result is much better but still nowhere near optimal - a bigger improvement would come from upgrading to a newer Ceph release and setting the newer CRUSH tunables, because we're still running Dumpling.

Be aware that PGs cost some space (rough estimate is 5GB per OSD in our case), and also quite a bit of memory - each OSD has 1.7-2.0GB RSS right now while it only had about 1GB before. That’s a lot of memory and space with higher OSD counts...

I haven't calculated the number of _objects_ per PG, but we do have widely differing numbers of _placement groups_ per OSD (one OSD hosts 500, another hosts 1300), and this seems to be the cause of the poor data balancing.
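
If you want to check this on your own cluster: newer releases have 'ceph osd df', which prints a PGS column per OSD. On older releases something along these lines works (a rough sketch - the acting-set column index is an assumption and differs between releases, so check the header of 'ceph pg dump pgs_brief' first):

ceph pg dump pgs_brief | awk 'NR>1 {print $5}' \
    | tr -d '[]' | tr ',' '\n' | sort -n | uniq -c | sort -n
# prints how many PGs each OSD participates in, smallest count first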

Jan


> On 04 Aug 2015, at 18:52, Marek Dohojda <mdohojda@xxxxxxxxxxxxxxxxxxx> wrote:
> 
> I have done this not that long ago.  My original PG estimates were wrong and I had to increase them.  
> 
> After increasing the PG numbers, Ceph rebalanced, and that took a while.  To be honest, in my case the slowdown wasn't really visible, but it did take a while.  
> 
> My strong suggestion to you would be to pick a long, low-IO window for it, and be prepared that this will take quite a long time to accomplish.  Do it slowly and do not increase multiple pools at once. 
> 
> It isn’t recommended practice but doable.
> 
> 
> 
>> On Aug 4, 2015, at 10:46 AM, Samuel Just <sjust@xxxxxxxxxx> wrote:
>> 
>> It will cause a large amount of data movement.  Each new pg after the
>> split will relocate.  It might be ok if you do it slowly.  Experiment
>> on a test cluster.
>> -Sam
>> 
>> On Mon, Aug 3, 2015 at 12:57 AM, 乔建峰 <scaleqiao@xxxxxxxxx> wrote:
>>> Hi Cephers,
>>> 
>>> This is a greeting from Jevon. Currently I'm experiencing an issue which is
>>> causing me a lot of trouble, so I'm writing to ask for your comments/help/suggestions.
>>> More details are provided below.
>>> 
>>> Issue:
>>> I set up a cluster with 24 OSDs and created one pool with 1024 placement
>>> groups on it for a small startup company. The number 1024 was calculated per
>>> the equation 'OSDs * 100' / pool size. The cluster has been running quite
>>> well for a long time. But recently our monitoring system keeps complaining
>>> that some disks' usage exceeds 85%. I log into the system and find that
>>> some disks' usage really is very high, while others' is not (less than 60%).
>>> Each time the issue happens, I have to manually re-balance the
>>> distribution. This is a short-term solution and I'm not willing to do it all
>>> the time.
>>> 
>>> Two long-term solutions come in my mind,
>>> 1) Ask the customers to expand their clusters by adding more OSDs. But I
>>> think they will ask me to explain the reason for the imbalanced data
>>> distribution. We've already done some analysis of the environment and
>>> learned that the most imbalanced part of CRUSH is the mapping between
>>> object and PG. The biggest PG has 613 objects, while the smallest PG only
>>> has 226 objects.
>>> 
>>> 2) Increase the number of placement groups. It can be of great help for
>>> statistically uniform data distribution, but it can also incur significant
>>> data movement as PGs are effectively being split. I just cannot do it in our
>>> customers' environment before we 100% understand the consequences. So has
>>> anyone done this in a production environment? How much does this operation
>>> affect the performance of clients?
>>> 
>>> Any comments/help/suggestions will be highly appreciated.
>>> 
>>> --
>>> Best Regards
>>> Jevon
>>> 

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



