On 02/14/2013 09:12 AM, Travis Rhoden wrote:
Hi folks,

Looking at the docs at [1], I see the following advice: "When using multiple data pools for storing objects, you need to ensure that you balance the number of placement groups per pool with the number of placement groups per OSD so that you arrive at a reasonable total number of placement groups that provides reasonably low variance per OSD without taxing system resources or making the peering process too slow."

Can someone expound on this a little bit more for me? Does it mean that if I am going to create 3 or 4 pools, all used heavily, I should perhaps *not* go with the recommended value of PG = (#OSDs * 100)/replicas?

For example, I have 60 OSDs. With two replicas, that gives me 3000 PGs. I have read that there may be some benefit to using a power of two, so I was considering making this 4096. If I do this for 3 or 4 pools, is that too much? That's what I'm really missing -- how to know when my balance is off and I've set up too many PGs in total, or too many PGs per OSD.
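To put numbers on the rule of thumb you quoted, here's a quick sketch -- just illustrative Python, not anything that ships with Ceph; the function name and the round-up-to-a-power-of-two step are my own choices:

    # Illustrative only: the (#OSDs * 100) / replicas rule of thumb from the
    # docs, plus the optional round-up to the next power of two.
    def suggested_pg_count(num_osds, replicas, round_to_pow2=True):
        raw = (num_osds * 100) // replicas
        if not round_to_pow2:
            return raw
        pow2 = 1
        while pow2 < raw:
            pow2 *= 2
        return pow2

    print(suggested_pg_count(60, 2, round_to_pow2=False))  # 3000
    print(suggested_pg_count(60, 2))                       # 4096

So your 3000 and 4096 figures both follow directly from that formula; the question is really how many pools you multiply that by.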
I've been creating 6 pools with 8192 PGs each and have been doing fine with a single mon. Going up to 8 pools with 16384 PGs each causes issues: PG creation taking 10-15 minutes, ceph and rados commands hanging for minutes at a time, connections to the mons timing out, and high mon CPU usage. It's possible that increasing the number of mons would improve this. I think you'll be fine with 3-4 pools at 4k PGs each, but be aware that there are upper limits.
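A rough way to sanity-check the balance the docs are talking about is to total up PG copies across all pools and divide by the OSD count (again just a sketch; the helper is made up, and the 60-OSD figure in the second example is an assumption purely for illustration):

    # Illustrative only: total PG copies landing on each OSD once several
    # pools share the same set of OSDs. pools is a list of (pg_num, replicas).
    def pg_copies_per_osd(pools, num_osds):
        total = sum(pg_num * replicas for pg_num, replicas in pools)
        return total / num_osds

    # Your scenario: 4 pools x 4096 PGs, 2 replicas, 60 OSDs -> ~546 per OSD
    print(pg_copies_per_osd([(4096, 2)] * 4, 60))

    # The setup that gave me trouble (8 pools x 16384 PGs), assuming 60 OSDs
    # just for illustration -> ~4369 per OSD, far past the ~100 per OSD the
    # (#OSDs * 100)/replicas formula aims for
    print(pg_copies_per_osd([(16384, 2)] * 8, 60))

When that per-OSD number climbs into the thousands is roughly where I started seeing the mon and peering problems described above.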
Also be aware that currently each pool ends up with a very similar distribution of PGs across OSDs, so unfortunately you won't get more randomness with more pools. 4 pools with 4k PGs each will show roughly the same overall PG distribution as 1 pool with 4k PGs. I think we've got plans to fix that at some point.
Somewhat related -- I have one Ceph cluster that is unlikely to ever use CephFS. As such, I don't need the metadata pool at all. Is it safe to delete? That would free up some PGs, and could lighten the load during the peering process, I suppose.

Thanks,

- Travis

[1] http://ceph.com/docs/master/rados/operations/placement-groups/