Re: Setting correct PG num with multiple pools in play

On 02/14/2013 09:12 AM, Travis Rhoden wrote:
Hi folks,

Looking at the docs at [1], I see the following advice:

"When using multiple data pools for storing objects, you need to ensure
that you balance the number of placement groups per pool with the number
of placement groups per OSD so that you arrive at a reasonable total
number of placement groups that provides reasonably low variance per OSD
without taxing system resources or making the peering process too slow."

Can someone expound on this a little bit more for me?  Does it mean that
if I am going to create 3 or 4 pools, all being used heavily, that
perhaps I should *not* go with the recommended value of PG = (#OSDs *
100)/replicas?  For example, I have 60 OSDs.  With two replicas, that
gives me 3000 PGs.  I have read that there may be some benefit to using
a power of two, so I was considering making this 4096.  If I do this for
3 or 4 pools, is that too much?  That's what I'm really missing -- how
to know when my balance is off and I've really set up too many PGs, or
too many PGs per OSD.
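For reference, here is a minimal sketch of that sizing rule of thumb in Python (just an illustration, not an official tool; the ~100-PGs-per-OSD target and the power-of-two rounding are the assumptions quoted above, and 60 OSDs / 2 replicas are your example numbers):

    # Rough sketch of the PG sizing rule of thumb from the docs:
    #   total PGs ~= (num OSDs * 100) / replicas, rounded up to a power of two.
    # Illustrative only; the inputs below are the example values from this thread.

    def suggested_pg_num(num_osds, replicas, target_pgs_per_osd=100):
        raw = (num_osds * target_pgs_per_osd) / float(replicas)
        # Round up to the next power of two.
        pg_num = 1
        while pg_num < raw:
            pg_num *= 2
        return pg_num

    print(suggested_pg_num(60, 2))  # 3000 rounded up -> 4096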

I've been creating 6 pools with 8192 PGs each and have been doing fine with a single mon. Going up to 8 pools with 16384 PGs each causes problems: PG creation takes 10-15 minutes, ceph and rados commands hang for minutes at a time, connections to the mons time out, and mon CPU usage is high. It's possible that increasing the number of mons would improve this. I think you'll be fine with 3-4 pools at 4k PGs each, but be aware that there are upper limits.

Also be aware that currently each pool ends up with a very similar distribution of PGs, so unfortunately you won't get more randomness out of more pools: 4 pools with 4k PGs each will show roughly the same overall PG distribution as 1 pool with 4k PGs. I think we have plans to fix that at some point.
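To get a feel for where the per-OSD totals land with several pools, a back-of-the-envelope count of PG replicas per OSD can help (a rough sketch only; the pool names and sizes below are hypothetical examples based on the numbers in this thread, not a recommendation):

    # Back-of-the-envelope estimate of PG replicas per OSD across multiple pools.
    # Pool names and sizes here are hypothetical examples.

    num_osds = 60
    pools = {              # pool name -> (pg_num, replica count)
        "data":   (4096, 2),
        "rbd":    (4096, 2),
        "images": (4096, 2),
        "extra":  (4096, 2),
    }

    total_pg_replicas = sum(pg_num * size for pg_num, size in pools.values())
    per_osd = total_pg_replicas / float(num_osds)
    print("~{:.0f} PG replicas per OSD".format(per_osd))  # 4 * 4096 * 2 / 60 ~= 546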


Somewhat related -- I have one Ceph cluster that is unlikely to ever use
CephFS.  As such, I don't need the metadata pool at all.  Is it safe to
delete?  That would regain me some PGs, and could lighten the load
during the peering process, I suppose.

Thanks,

  - Travis

[1] http://ceph.com/docs/master/rados/operations/placement-groups/


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

