On Fri, 20 Jul 2012, François Charlier wrote:
> Hello,
>
> Reading http://ceph.com/docs/master/ops/manage/grow/placement-groups/
> and thinking of building a ceph cluster with potentially 1000 OSDs.
>
> Using the recommendations on the previously cited link, it would require
> pg_num being set between 10,000 & 30,000. Okay with that. Let's use the
> recommended value of 16,384; this is already about 160 placement groups
> per OSD.

I think you mean (16384 * 3x) / 1000 osds ~= 50 pgs per osd?

> What if, for a start, we choose to reach this number of 1000 OSDs
> slowly, starting with 100 OSDs? It's now 1600 placement groups per OSD.

~500

> What if we chose 30,000 (or 32,768) placement groups to keep room for
> expansion?

~1000

> My question is: how will a Ceph pool behave with 1000, 5000 or even
> 10000 placement groups per OSD? Will this impact performance? How bad?
> Can it be worked around? Is this a problem of RAM size? CPU usage?
>
> Any hint about this would be much appreciated.

It will work, but peering will be slower, and there will be more memory
used.

The other question is when you expect to move beyond 1000 osds. The next
project we'll be doing on the OSD is PG splitting, which will make this
problem adjustable. It won't be backported to argonaut, but it will be in
the next stable release, and will probably appear in our regular
development release in 2-3 months.

sage
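
As a quick illustration of the arithmetic in the replies above (this sketch
is not from the original thread; the 3x replication factor is the assumption
implied by "16384 * 3x"):

    # PGs per OSD = total PG copies (pg_num * replicas) spread over the OSDs.
    # Assumes 3x replication, as implied by Sage's "(16384 * 3x)" above.
    def pgs_per_osd(pg_num, replicas, osds):
        return pg_num * replicas / osds

    print(pgs_per_osd(16384, 3, 1000))  # ~49  -> the "~50 pgs per osd" figure
    print(pgs_per_osd(16384, 3, 100))   # ~491 -> the "~500" with only 100 OSDs
    print(pgs_per_osd(32768, 3, 100))   # ~983 -> the "~1000" with 32,768 PGs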