On Mon, Feb 18, 2013 at 7:00 AM, femi anjorin <femi.anjorin@xxxxxxxxx> wrote:
> Hi,
>
> Please can somebody help with two questions:
>
> 1. I have 96 OSDs and 18624 PGs. Ceph created the placement groups
> itself when I issued the mkcephfs command initially.
>
> I don't know how Ceph arrived at that number of PGs. I thought the
> system would use the formula in this reference,
> http://ceph.com/docs/master/rados/operations/placement-groups/ ,
> an average of 100 per OSD, but apparently there might be some other
> factors considered?

This is just because mkcephfs uses a very basic left-shift on the number of OSDs to arrive at the number of PGs; that number is only a rough guide anyway.

> 2. What would be a good/ideal placement group number for the data
> pool and the metadata pool in my cluster with 96 OSDs?

This is still a bit more complicated than it should be. I've reproduced my answer to a similar question below (since unfortunately ceph-users still isn't showing up in gmane):

Unfortunately, the number of PGs isn't really that cut and dried. The recommendation of 100 per OSD is based on statistical tests of the evenness of the data distribution across the cluster, but those tests were all run with only one pool in the cluster. If your pools all see roughly the same amount of usage /and they had uncorrelated PG placements/, then this distribution would roughly maintain itself if you split that 100 PGs/OSD across multiple pools. Unfortunately, as Sage mentioned, the PG placements are currently correlated (whoops!).

Now, your OSDs should be able to handle quite a lot more than 100 PGs each. Sam guesstimates that (modulo weird hardware configs) you don't really run into trouble until each OSD is hosting in the neighborhood of 5000 PGs (so 1600-2500 PGs/OSD with 3x or 2x replication), so I'd bias toward a per-pool count close to 100/OSD, and then reduce it if necessary to keep your total from getting ridiculous.
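If it helps, the rule of thumb above can be sketched roughly like this. Note this is my own illustration, not the actual mkcephfs heuristic: the function name is made up, and the round-up-to-a-power-of-two step is just a common convention from the placement-groups docs.

```python
import math

def suggested_pg_num(num_osds, pgs_per_osd=100, replication=3, num_pools=1):
    """Suggest a per-pool pg_num: aim for roughly pgs_per_osd PG
    replicas per OSD across all pools, then round up to the next
    power of two. Illustrative only, not the mkcephfs left-shift."""
    # Total PGs across the cluster such that each OSD holds about
    # pgs_per_osd replicas, counting every replica of every PG.
    total_pgs = num_osds * pgs_per_osd / replication
    # Divide the budget evenly among the pools.
    per_pool = total_pgs / num_pools
    # Round up to a power of two.
    return 2 ** math.ceil(math.log2(per_pool))

# 96 OSDs, 3x replication, the three default pools (data, metadata, rbd):
print(suggested_pg_num(96, replication=3, num_pools=3))  # 2048
```

With 96 OSDs that lands on 2048 PGs per pool, i.e. 6144 PGs total at 3 pools, which is in the same ballpark as your 18624 once you remember mkcephfs's own heuristic runs hotter than 100/OSD.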
Of course, the long-term vision, once the PG merge functionality is written and splitting is a bit more baked, is that the cluster will auto-scale your PG counts based on the quality of the data distribution and the amount of data in each PG.

Hope this clarifies the tradeoffs you're making a bit more! :)
-Greg
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com