On Mon, Aug 3, 2015 at 4:05 PM, 乔建峰 <scaleqiao@xxxxxxxxx> wrote:
> [Including ceph-users alias]
>
> 2015-08-03 16:01 GMT+08:00 乔建峰 <scaleqiao@xxxxxxxxx>:
>>
>> Hi Cephers,
>>
>> I'm currently experiencing an issue that has been troubling me a lot,
>> so I'm writing to ask for your comments/help/suggestions. More details
>> are provided below.
>>
>> Issue:
>> I set up a cluster with 24 OSDs and created one pool with 1024
>> placement groups on it for a small startup company. The number 1024
>> was calculated from the equation (OSDs * 100) / pool size. The cluster
>> has been running quite well for a long time, but recently our
>> monitoring system keeps complaining that some disks' usage exceeds
>> 85%. I logged into the system and found that some disks' usage really
>> is very high, while others' is not (less than 60%). Each time the
>> issue happens, I have to re-balance the distribution manually. That is
>> only a short-term fix, and I'm not willing to do it all the time.
>>
>> Two long-term solutions come to mind:
>> 1) Ask the customers to expand their clusters by adding more OSDs. But
>> I think they will ask me to explain the reason for the imbalanced data
>> distribution. We've already done some analysis of the environment and
>> learned that the most imbalanced part of CRUSH is the mapping between
>> objects and PGs: the biggest PG has 613 objects, while the smallest PG
>> has only 226 objects.
>>
>> 2) Increase the number of placement groups. That can be of great help
>> for statistically uniform data distribution, but it can also incur
>> significant data movement as PGs are effectively split. I just cannot
>> do it in our customers' environment before we fully understand the
>> consequences. Has anyone done this in a production environment? How
>> much does this operation affect the performance of clients?
>>
>> Any comments/help/suggestions will be highly appreciated.

Of course not; PG split isn't a recommended process for a running
cluster. It will block client IO completely. Unlike the recovery
process, which can be controlled at the object level, split is a
PG-level process and the OSD itself can't control it smoothly. In
theory, to make PG split work on a real cluster we would need to do more
on the MON side, and a lot of that logic would cause trouble.

Although we can't get that flexibility via PG split, we can get the same
result from *pools* with a little user-management logic. A pool is a
good tool that can cover your need. Most users like to have one pool for
the whole cluster; that's fine for an immutable cluster, but not good
for a flexible one, I think. For example, if you double the OSD nodes,
creating a new pool is a better way than preparing a pool with lots of
PGs at the very beginning. If you are using OpenStack, CloudStack or the
like, these cloud projects can provide the upper-layer control via
"volume_type".

In a word, we can handle increasing OSDs at a relatively small cost. But
I don't think we can freely double the Ceph cluster and just hope Ceph
handles it perfectly.

>>
>> --
>> Best Regards
>> Jevon
>
> --
> Best Regards
> Jevon

--
Best Regards,
Wheat
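
Referring back to the (OSDs * 100) / pool size rule quoted at the top of
the thread, here is a minimal Python sketch of how that rule of thumb is
usually applied, with the result rounded up to the next power of two.
The function name and the assumption of 3 replicas are illustrative and
not stated in the thread.

    # Minimal sketch of the PG-count rule of thumb quoted above:
    # target = (OSDs * 100) / replica count, rounded up to the next
    # power of two. With 24 OSDs and 3 replicas this yields 1024,
    # matching the pool described in the original mail.
    def suggested_pg_num(num_osds, replica_count=3):
        raw = (num_osds * 100) / replica_count
        pg_num = 1
        while pg_num < raw:
            pg_num *= 2
        return pg_num

    print(suggested_pg_num(24))  # -> 1024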