I did this not that long ago. My original PG estimates were wrong and I had to increase them. After increasing the PG numbers, Ceph rebalanced, and that took a while. To be honest, in my case the slowdown wasn't really noticeable, but the rebalance itself took quite some time. My strong suggestion would be to do it during a low-I/O window, and be prepared for it to take quite a long time to complete. Do it slowly and do not increase multiple pools at once (a rough sketch of the step-by-step approach is at the end of this message). It isn't recommended practice, but it is doable.

> On Aug 4, 2015, at 10:46 AM, Samuel Just <sjust@xxxxxxxxxx> wrote:
>
> It will cause a large amount of data movement. Each new PG after the
> split will relocate. It might be OK if you do it slowly. Experiment
> on a test cluster.
> -Sam
>
> On Mon, Aug 3, 2015 at 12:57 AM, 乔建峰 <scaleqiao@xxxxxxxxx> wrote:
>> Hi Cephers,
>>
>> This is a greeting from Jevon. I'm currently experiencing an issue that is
>> causing me a lot of trouble, so I'm writing to ask for your
>> comments/help/suggestions. More details are provided below.
>>
>> Issue:
>> I set up a cluster with 24 OSDs for a small startup company and created one
>> pool with 1024 placement groups on it. The number 1024 was calculated per
>> the equation 'OSDs * 100' / pool size. The cluster has been running quite
>> well for a long time. Recently, however, our monitoring system keeps
>> complaining that some disks' usage exceeds 85%. When I log into the system,
>> I find that some disks' usage is indeed very high, while others are below
>> 60%. Each time the issue happens, I have to rebalance the distribution
>> manually. That is only a short-term fix, and I'm not willing to do it all
>> the time.
>>
>> Two long-term solutions come to mind:
>> 1) Ask the customers to expand their clusters by adding more OSDs. But I
>> think they will ask me to explain the reason for the imbalanced data
>> distribution. We've already done some analysis of the environment and
>> learned that the most imbalanced part of CRUSH is the mapping between
>> objects and PGs. The biggest PG has 613 objects, while the smallest has
>> only 226.
>>
>> 2) Increase the number of placement groups. That helps a lot toward a
>> statistically uniform data distribution, but it can also incur significant
>> data movement, since PGs are effectively being split. I can't do it in our
>> customers' environment until we fully understand the consequences. Has
>> anyone done this in a production environment? How much does the operation
>> affect client performance?
>>
>> Any comments/help/suggestions will be highly appreciated.
>>
>> --
>> Best Regards
>> Jevon
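
The sketch mentioned above: this is not from the original thread, just a minimal illustration of the "do it slowly" approach, assuming the standard 'ceph osd pool set <pool> pg_num / pgp_num' commands are available on the admin host. The pool name, target PG count, step size, and polling interval are placeholders, not values from the thread.

    import subprocess
    import time

    POOL = "data"          # hypothetical pool name
    TARGET_PG_NUM = 2048   # hypothetical target
    STEP = 128             # hypothetical increment per step

    def ceph(*args):
        # Run a 'ceph' CLI subcommand and return its trimmed output.
        return subprocess.check_output(("ceph",) + args).decode().strip()

    def wait_for_health_ok(poll_seconds=60):
        # Poll 'ceph health' until backfill/recovery from the previous step settles.
        while not ceph("health").startswith("HEALTH_OK"):
            time.sleep(poll_seconds)

    current = int(ceph("osd", "pool", "get", POOL, "pg_num").split()[-1])
    while current < TARGET_PG_NUM:
        current = min(current + STEP, TARGET_PG_NUM)
        ceph("osd", "pool", "set", POOL, "pg_num", str(current))
        wait_for_health_ok()
        # Raising pgp_num is what actually moves data onto the newly split PGs.
        ceph("osd", "pool", "set", POOL, "pgp_num", str(current))
        wait_for_health_ok()

The idea is simply to raise pg_num in small increments and let the cluster settle after each pgp_num bump, rather than jumping straight to the final value; adjust the step and polling to taste on a test cluster first, as Sam suggests.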
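For reference, a tiny worked example of the 'OSDs * 100' / pool size rule of thumb Jevon quotes, assuming a replication size of 3 (the thread does not state it); rounding up to the next power of two reproduces the 1024 from the original post.

    osds = 24
    pool_size = 3                             # assumed replica count, not stated in the thread
    target = osds * 100 // pool_size          # 800
    pg_num = 1 << (target - 1).bit_length()   # next power of two >= 800
    print(pg_num)                             # 1024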