Re: Increasing number of PGs by not a factor of two?

Jesus Cea <jcea@xxxxxxx> · Fri, 25 May 2018 18:28:23 +0200

On 17/05/18 20:36, David Turner wrote:
> By sticking with PG numbers as a base 2 number (1024, 16384, etc) all of
> your PGs will be the same size and easier to balance and manage.  What
> happens when you have a non base 2 number is something like this.  Say
> you have 4 PGs that are all 2GB in size.  If you increase pg(p)_num to
> 6, then you will have 2 PGs that are 2GB and 4 PGs that are 1GB as
> you've split 2 of the PGs into 4 to get to the 6 total.  If you increase
> the pg(p)_num to 8, then all 8 PGs will be 1GB.  Depending on how you
> manage your cluster, that doesn't really matter, but for some methods of
> balancing your cluster, that will greatly imbalance things.

So, if I understand correctly, Ceph tries to do the minimum number of
splits. If you increase the PG count from 8 to 12, it will split 4 PGs
and leave the other 4 PGs alone, creating an imbalance.
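
To check my reading of your example, here is a small toy model of it. It
assumes that a split cuts one existing PG into two equal halves and that
only as many PGs are split as needed to reach the new count. The
function name split_minimum is made up for illustration, and this is not
Ceph's actual placement code, only the arithmetic from the quoted text.

```python
def split_minimum(sizes_gb, new_count):
    """Toy model: split as few PGs as possible, each into two equal halves.

    Assumption (not taken from Ceph source): the first PGs in the list are
    the ones that get split, and a split divides the data exactly in half.
    """
    pgs = list(sizes_gb)
    extra = new_count - len(pgs)  # each split adds exactly one PG
    if not 0 <= extra <= len(pgs):
        raise ValueError("new_count must be between the current count and double it")
    result = []
    for index, size in enumerate(pgs):
        if index < extra:
            result.extend([size / 2, size / 2])  # this PG is split in two
        else:
            result.append(size)                  # this PG is left alone
    return result


# The quoted example: 4 PGs of 2GB each, grown to 6 PGs.
print(split_minimum([2.0] * 4, 6))   # [1.0, 1.0, 1.0, 1.0, 2.0, 2.0]
# My example: 8 PGs grown to 12 leaves 4 PGs twice as large as the rest.
print(split_minimum([2.0] * 8, 12))
```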

In that case, it would be far more advisable to create the pool with
12 PGs from the very beginning.

If I understand correctly, then, the advice of "power of two" is an
oversimplification. The real advice would be: you had better double your
PG count whenever you increase it. That is: 12->24->48->96... There is no
real need for a power of two.
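
The doubling rule can be written down as plain arithmetic. This is only
a sketch under my assumptions above: the pool starts with evenly sized
PGs, growing from N to M PGs (with N <= M <= 2N) splits exactly M - N of
them, and the rest are untouched. The names split_plan and
every_step_balanced are hypothetical helpers, not anything from Ceph.

```python
def split_plan(old_count, new_count):
    """Return (pgs_split, pgs_untouched) under the minimum-split assumption."""
    if not old_count <= new_count <= 2 * old_count:
        raise ValueError("this sketch only handles growth up to one doubling")
    pgs_split = new_count - old_count       # each split adds one PG
    return pgs_split, old_count - pgs_split


def every_step_balanced(path):
    """True if no step in the growth path mixes split and untouched PGs."""
    for old_count, new_count in zip(path, path[1:]):
        pgs_split, pgs_untouched = split_plan(old_count, new_count)
        if pgs_split > 0 and pgs_untouched > 0:
            return False                    # some PGs halved, others not
    return True


print(every_step_balanced([12, 24, 48, 96]))  # True: every step is a full doubling
print(every_step_balanced([8, 12]))           # False: 4 split, 4 untouched
```

Note that this checks every intermediate step, so a path such as
4->6->8 is reported as unbalanced even though, in the quoted example,
the final state at 8 PGs is even again.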

Also, a bad split is not important if the pool creates/destroys objects
constantly, because new objects will be spread evenly. This could be an
approach to rebalance a badly expanded pool: just copy & rename your
objects (I am thinking about CephFS).

Does what I am saying make sense?

How does Ceph decide which PGs to split? By per-PG object count or by
PG byte size?

Thanks for your post. It deserves to be a blog post!

-- 
Jesús Cea Avión                         _/_/      _/_/_/        _/_/_/
jcea@xxxxxxx - http://www.jcea.es/     _/_/    _/_/  _/_/    _/_/  _/_/
Twitter: @jcea                        _/_/    _/_/          _/_/_/_/_/
jabber / xmpp:jcea@xxxxxxxxxx  _/_/  _/_/    _/_/          _/_/  _/_/
"Things are not so easy"      _/_/  _/_/    _/_/  _/_/    _/_/  _/_/
"My name is Dump, Core Dump"   _/_/_/        _/_/_/      _/_/  _/_/
"El amor es poner tu felicidad en la felicidad de otro" - Leibniz
