Re: Questions about journals, performance and disk utilization.

On 01/22/2013 04:00 PM, Gregory Farnum wrote:
On Tuesday, January 22, 2013 at 1:57 PM, Mark Nelson wrote:
On 01/22/2013 03:50 PM, Stefan Priebe wrote:
Hi,
On 22.01.2013 22:26, Jeff Mitchell wrote:
Mark Nelson wrote:
It may (or may not) help to use a power-of-2 number of PGs. It's
generally a good idea to do this anyway, so if you haven't set up your
production cluster yet, you may want to play around with this. Basically,
take whatever number you were planning on using and round it up (or
down slightly) to a power of 2. I.e., if you were going to use 7,000 PGs, round up to 8192.
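
(A quick illustrative sketch, not from the original message: one way to round a planned PG count to the nearest power of 2 in Python; the helper name is just for illustration.)

def nearest_power_of_2(n):
    # round a planned PG count up (or down slightly) to the closest power of 2
    lower = 1 << (n.bit_length() - 1)   # largest power of 2 <= n
    upper = lower << 1                  # smallest power of 2 > n
    return lower if n - lower < upper - n else upper

print(nearest_power_of_2(7000))   # 8192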



As I was asking about earlier on IRC, I'm in a situation where the docs
didn't mention this in the section about calculating PGs, so I have a
non-power-of-2 count -- and since there are production things running on
that pool, I can't currently change it.



Oh, same thing here - did I miss that in the docs, or can someone point me to the location?

Is there a way to change the number of PGs for a pool?

Greets,
Stefan



Honestly, I don't know if it will actually have a significant effect.
ceph_stable_mod will map things optimally when pg_num is a power of 2,
but that's only part of how things work. It may not matter very much
with high PG counts.

IIRC, having a non-power-of-2 count means the PGs end up in two size classes: the ones that haven't been split yet (those whose would-be counterpart above pg_num doesn't exist) hold roughly twice as many objects as the ones that have. For reasonable PG counts this shouldn't cause any problems.
-Greg
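
(Illustrative sketch, not from the original thread: assuming the ceph_stable_mod definition in the Ceph source, where the else branch returns x & (bmask >> 1), a quick count over one full pass of the masked hash space shows the two PG size classes described above.)

from collections import Counter

def ceph_stable_mod(x, b, bmask):
    # b is pg_num, bmask is the containing power of 2 minus 1 (8191 for 7000 PGs)
    if (x & bmask) < b:
        return x & bmask
    return x & (bmask >> 1)

pg_num, mask = 7000, 8191
counts = Counter(ceph_stable_mod(h, pg_num, mask) for h in range(mask + 1))
print(sorted(set(counts.values())))               # [1, 2]: two PG size classes
print(sum(1 for c in counts.values() if c == 2))  # 1192 PGs carry a double share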


Hrm, for some reason I thought there was more to it than that. I suppose then you really are just at the mercy of the distribution of big PGs vs. small PGs on each OSD.

A while back I was talking to Sage about doing something like this (forgive the Python):

def ceph_stable_mod2(x, b, bmask):
    # same fast path as ceph_stable_mod: the low bits already name a valid PG
    if (x & bmask) < b:
        return x & bmask
    # otherwise fold back with a plain modulo instead of masking off the top bit
    return x % b

but that doesn't give as nice a splitting behaviour. Still, unless I'm missing something, isn't splitting kind of a rare event anyway?

Mark
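
(Illustrative sketch, not from the original thread: assuming uniform hash coverage of one pass through the mask range, counting how many hash slots change PG when pg_num grows from 7000 to 7001 shows the splitting difference. The masking version moves a single slot cleanly from parent to child, while the modulo variant reshuffles every slot above the old pg_num.)

def ceph_stable_mod(x, b, bmask):
    # as in the Ceph source (to the best of my reading): fold by dropping the top bit
    return x & bmask if (x & bmask) < b else x & (bmask >> 1)

def ceph_stable_mod2(x, b, bmask):
    # the variant proposed above: fold with a plain modulo
    return x & bmask if (x & bmask) < b else x % b

def slots_moved(fn, old_b, new_b, bmask):
    # number of hash slots (out of bmask + 1) that map to a different PG after the change
    return sum(fn(h, old_b, bmask) != fn(h, new_b, bmask) for h in range(bmask + 1))

mask = 8191
print(slots_moved(ceph_stable_mod, 7000, 7001, mask))   # 1: a clean parent-to-child split
print(slots_moved(ceph_stable_mod2, 7000, 7001, mask))  # 1192: every slot above the old pg_num reshuffles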

