Hi Stefan,

Could you describe the linger ops bug in more detail? I'm running Firefly, which as you say still has this bug. Thanks!

On Wed, Aug 5, 2015 at 12:51 AM, Stefan Priebe <s.priebe@xxxxxxxxxxxx> wrote:
> We've done the splitting several times. The most important thing is to run a
> Ceph version which does not have the linger ops bug.
>
> That means the latest dumpling release, giant, or hammer. The latest firefly
> release still has this bug, which results in wrong watchers and no working
> snapshots.
>
> Stefan
>
> On 04.08.2015 at 18:46, Samuel Just wrote:
>>
>> It will cause a large amount of data movement. Each new PG after the
>> split will relocate. It might be OK if you do it slowly. Experiment
>> on a test cluster.
>> -Sam
>>
>> On Mon, Aug 3, 2015 at 12:57 AM, 乔建峰 <scaleqiao@xxxxxxxxx> wrote:
>>>
>>> Hi Cephers,
>>>
>>> This is a greeting from Jevon. I'm currently experiencing an issue that
>>> has been troubling me a lot, so I'm writing to ask for your
>>> comments/help/suggestions. More details are provided below.
>>>
>>> Issue:
>>> I set up a cluster with 24 OSDs and created one pool with 1024 placement
>>> groups on it for a small startup company. The number 1024 was calculated
>>> from the equation (OSDs * 100) / pool size. The cluster has been running
>>> quite well for a long time, but recently our monitoring system keeps
>>> complaining that some disks' usage exceeds 85%. When I log into the
>>> system, I find that some disks' usage really is very high, while others
>>> are below 60%. Each time the issue happens, I have to manually rebalance
>>> the distribution. This is only a short-term workaround; I'm not willing
>>> to do it all the time.
>>>
>>> Two long-term solutions come to mind:
>>> 1) Ask the customers to expand their clusters by adding more OSDs. But I
>>> think they will ask me to explain the reason for the imbalanced data
>>> distribution. We've already done some analysis on the environment and
>>> learned that the most imbalanced part of the CRUSH placement is the
>>> mapping between objects and PGs. The biggest PG has 613 objects, while
>>> the smallest PG has only 226.
>>>
>>> 2) Increase the number of placement groups. That can be of great help for
>>> statistically uniform data distribution, but it can also incur
>>> significant data movement as PGs are effectively being split. I just
>>> cannot do it in our customers' environment before we understand the
>>> consequences 100%. Has anyone done this in a production environment? How
>>> much does the operation affect client performance?
>>>
>>> Any comments/help/suggestions will be highly appreciated.
>>>
>>> --
>>> Best Regards
>>> Jevon
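
For reference, the 1024 figure above is consistent with the common
(OSDs * 100) / pool-size rule of thumb, rounded up to the next power of two.
A minimal sketch of that arithmetic in Python; the pool size (replica count)
of 3 is an assumption, since the thread does not state it:

def recommended_pg_count(num_osds: int, pool_size: int,
                         target_pgs_per_osd: int = 100) -> int:
    """Rule-of-thumb PG count: (OSDs * target per OSD) / replica count,
    rounded up to the next power of two."""
    raw = (num_osds * target_pgs_per_osd) / pool_size
    power = 1
    while power < raw:
        power *= 2
    return power

# 24 OSDs with an assumed pool size of 3 gives 800, rounded up to 1024,
# which matches the PG count mentioned in the thread.
print(recommended_pg_count(24, 3))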
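
To put numbers on the object-count imbalance Jevon describes (613 vs. 226
objects per PG), the per-PG statistics from 'ceph pg dump' can be summarized.
A rough sketch, assuming the JSON layout of releases from that era, where
pg_stats[*].stat_sum.num_objects holds the per-PG object count; the exact
field names may differ between versions:

import json
import subprocess

# Dump per-PG statistics as JSON; treat the field names below as assumptions,
# since the schema has changed across Ceph releases.
dump = json.loads(subprocess.run(
    ["ceph", "pg", "dump", "--format", "json"],
    check=True, capture_output=True, text=True).stdout)

counts = sorted(pg["stat_sum"]["num_objects"] for pg in dump["pg_stats"])
print(f"PGs: {len(counts)}, min objects: {counts[0]}, "
      f"max objects: {counts[-1]}, avg: {sum(counts) / len(counts):.1f}")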
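
If the PG count is eventually increased, the usual way to keep the data
movement Sam warns about under control is to raise pg_num (and then pgp_num)
in small steps and let the cluster settle in between. A sketch of that
procedure, assuming the standard 'ceph osd pool set' and 'ceph health'
commands; the pool name, target, and step size are placeholders, not values
from this thread:

import subprocess
import time

POOL = "rbd"            # hypothetical pool name
TARGET_PG_NUM = 2048    # hypothetical target
STEP = 128              # illustrative step size

def ceph(*args: str) -> str:
    """Run a ceph CLI command and return its stdout."""
    return subprocess.run(["ceph", *args], check=True,
                          capture_output=True, text=True).stdout.strip()

def wait_until_healthy(poll_seconds: int = 30) -> None:
    """Block until 'ceph health' reports HEALTH_OK."""
    while "HEALTH_OK" not in ceph("health"):
        time.sleep(poll_seconds)

current = 1024  # starting pg_num from the thread
while current < TARGET_PG_NUM:
    current = min(current + STEP, TARGET_PG_NUM)
    # Raising pg_num splits existing PGs in place; raising pgp_num is what
    # actually rebalances data across OSDs, so do it second and let the
    # cluster settle after each change.
    ceph("osd", "pool", "set", POOL, "pg_num", str(current))
    wait_until_healthy()
    ceph("osd", "pool", "set", POOL, "pgp_num", str(current))
    wait_until_healthy()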
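
Regarding the "wrong watchers" symptom of the linger ops bug: one way to see
which clients are watching an RBD image is to list the watchers on its header
object with 'rados listwatchers'. A sketch, assuming a format-1 image whose
header object is named <image>.rbd (format-2 images use an rbd_header.<id>
object instead); the pool and image names are placeholders:

import subprocess

POOL = "rbd"          # hypothetical pool name
IMAGE = "vm-disk-1"   # hypothetical image name

# Format-1 RBD images store their header in an object called "<image>.rbd";
# the watchers on that object are the clients currently holding a watch.
header_object = f"{IMAGE}.rbd"
out = subprocess.run(["rados", "-p", POOL, "listwatchers", header_object],
                     check=True, capture_output=True, text=True).stdout
print(out or "no watchers registered")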