Re: "too many PGs per OSD" in Hammer

On 07/05/15 07:53, Chris Armstrong wrote:
> Thanks for the feedback. That language is confusing to me, then, since
> the first paragraph seems to suggest using a pg_num of 128 in cases
> where we have less than 5 OSDs, as we do here.
> 
> The warning below that is: "As the number of OSDs increases, choosing the
> right value for pg_num becomes more important because it has a
> significant influence on the behavior of the cluster as well as the
> durability of the data when something goes wrong (i.e. the probability
> that a catastrophic event leads to data loss).", which suggests that
> this could be an issue with more OSDs, which doesn't apply here.
> 
> Do we know if this warning is calculated based on the resources of the
> host? If I try with larger machines, will this warning change?

I'd be interested in an answer here too.  I just did an update from
Giant to Hammer and hit the same dreaded warning.

When I initially deployed Ceph (with Emperor), I worked out the
placement group count according to the formula given on the site:

>     # We have: 3 OSD nodes with 2 OSDs each
>     # giving us 6 OSDs total.
>     # There are 3 replicas, so the recommended number of
>     # placement groups is:
>     #      6 * 100 / 3
>     # which gives: 200 placement groups.
>     # Rounding this up to the nearest power of two gives:
>     osd pool default pg num = 256
>     osd pool default pgp num = 256

It seems this was a bad value to use.  I now have a biggish lump of
data sitting in a pool with an inappropriate number of placement
groups.  It turns out I should have divided that number by the number
of pools.
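For what it's worth, here's a rough Python sketch of that arithmetic,
including the per-pool division I missed.  The ~100 PGs-per-OSD target
is the figure from the docs; the function name and the pool counts
below are just illustrative, not anything Ceph itself provides:

> # Rough sketch of the pg_num arithmetic, assuming the ~100 PGs per
> # OSD target from the docs has to be shared across all pools.
> # (recommended_pg_num is just my own name, not a Ceph API.)
> def recommended_pg_num(num_osds, replicas, num_pools, target_per_osd=100):
>     raw = (num_osds * target_per_osd) / float(replicas * num_pools)
>     pg_num = 1
>     while pg_num < raw:          # round up to the next power of two
>         pg_num *= 2
>     return pg_num
>
> # One pool reproduces the 256 I originally set; more pools shrink it.
> for pools in (1, 2, 4):
>     print("%d pool(s): pg_num = %d"
>           % (pools, recommended_pg_num(num_osds=6, replicas=3,
>                                        num_pools=pools)))

With a single pool that gives the 256 I configured; with several pools
the per-pool value would have been 128 or less, which is presumably why
the per-OSD count now trips the Hammer warning.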

For now I've shut it up with the following:

> [mon]
>     mon warn on legacy crush tunables = false
>     # New warning on move to Hammer
>     mon pg warn max per osd = 2048

The question is, how does one go about fixing this?  I'd rather not
blow away production pools at this point, although right now we only
have one major production load, so if we're ever going to do it, now is
the time.

The worst bit is that this will probably change again: I can see myself
hitting this problem time and time again as new pools are added later.

Is there a way of tuning the number of placement groups without
destroying data?

Regards,
-- 
     _ ___             Stuart Longland - Systems Engineer
\  /|_) |                           T: +61 7 3535 9619
 \/ | \ |     38b Douglas Street    F: +61 7 3535 9699
   SYSTEMS    Milton QLD 4064       http://www.vrt.com.au
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



