Re: Setting correct PG num with multiple pools in play

On Thu, Feb 14, 2013 at 12:21 PM, Sage Weil <sage@xxxxxxxxxxx> wrote:
> On Thu, 14 Feb 2013, Travis Rhoden wrote:
>> Hi folks,
>>
>> Looking at the docs at [1], I see the following advice:
>>
>> "When using multiple data pools for storing objects, you need to ensure that
>> you balance the number of placement groups per pool with the number of
>> placement groups per OSD so that you arrive at a reasonable total number of
>> placement groups that provides reasonably low variance per OSD without
>> taxing system resources or making the peering process too slow."
>>
>> Can someone expound on this a little bit more for me?  Does it mean that if
>> I am going to create 3 or 4 pools, all being used heavily, that perhaps I
>> should *not* go with the recommended value of PG = (#OSDs * 100)/replicas?
>> For example, I have 60 OSDs.  With two replicas, that gives me 3000 PGs.  I
>> have read that there may be some benefit to using a power of two, so I was
>> considering making this 4096.  If I do this for 3 or 4 pools, is that too
>> much?  That's what I'm really missing -- how to know when my balance is off
>> and I've really set up too many PGs, or too many PGs per OSD.
>
> That "PG" should probably read "total PGs".  So, divide by 3 or 4.
>
> Unfortunately, though, there is a <facepalm> in the placement code that
> makes the placement of PGs for different pools overlap heavily; that will
> get fixed in cuttlefish.  So if the cluster is large, the data
> distribution will degrade somewhat if there are lots of overlapping pools.
> For now, I would recommend splitting the difference.
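The rule of thumb discussed above can be sketched as a quick calculation. This is a minimal sketch, assuming (as Sage suggests) that the docs' formula gives a cluster-wide total that is split evenly across pools, and that each pool's share is then rounded up to a power of two. `pgs_per_pool` and `next_power_of_two` are hypothetical helpers for illustration, not Ceph tools.

```python
def next_power_of_two(n: int) -> int:
    """Smallest power of two >= n (for n >= 1)."""
    return 1 << (n - 1).bit_length()


def pgs_per_pool(num_osds: int, replicas: int, num_pools: int) -> tuple[int, int, int]:
    """Hypothetical helper: split the cluster-wide PG target across pools.

    Assumes the docs' formula (OSDs * 100 / replicas) is a *total* for the
    whole cluster and that every pool gets an equal share.
    """
    total_target = num_osds * 100 // replicas   # cluster-wide target
    per_pool = total_target // num_pools        # even split across pools
    return total_target, per_pool, next_power_of_two(per_pool)


for pools in (1, 3, 4):
    total, raw, rounded = pgs_per_pool(num_osds=60, replicas=2, num_pools=pools)
    print(f"{pools} pool(s): total target {total}, {raw} per pool -> {rounded} rounded up")
```

For the 60-OSD, 2-replica cluster in the thread, the target is 3000 PGs in total; split over 3 pools that is 1000 each (1024 after rounding up), and over 4 pools 750 each (also 1024). Rounding every pool up pushes the sum above 3000, which is one way to read "splitting the difference".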
>
Ah, interesting!  I definitely did not pick up that that formula was giving you a target number for total PGs in the system, not per pool.  If that is the case, though, I have to question how the default sizes get picked when using mkcephfs.  In my 60 OSD example, the recommended number of PGs per the docs would be 3000, and indeed mkcephfs made the 3 default pools (2 copies) fairly close to that -- 3904.  But that is per-pool, and the overall number of PGs out of the box was 11712.  Based on your feedback above, isn't that a little high?

I had already added two more pools with 3904 PGs each, and just added another with 4096.  That brings my total PG num to 23616 (almost 400 per OSD).  Hearing that the "total" PG count should be more like 3000 makes me worried that I have a lot of unnecessary overhead.  Thoughts?  Am I interpreting all this correctly?
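The totals above can be checked with a small tally. This is a minimal sketch, assuming every pool uses 2 replicas and that PGs spread evenly over the 60 OSDs (real placement is only approximately even); `pg_load` is a hypothetical helper, not part of Ceph. It also reports PG copies per OSD, since each PG is stored on as many OSDs as it has replicas.

```python
def pg_load(pool_pg_nums: list[int], num_osds: int, replicas: int) -> dict[str, float]:
    """Hypothetical helper: summarize total PGs and their spread over OSDs.

    Assumes all pools share one replica count and PGs are spread evenly.
    """
    total = sum(pool_pg_nums)
    target = num_osds * 100 / replicas  # the docs' rule of thumb, read as a cluster total
    return {
        "total_pgs": total,
        "pgs_per_osd": total / num_osds,                   # the "almost 400" figure
        "pg_copies_per_osd": total * replicas / num_osds,  # each PG lives on `replicas` OSDs
        "multiple_of_target": total / target,
    }


defaults = [3904] * 3                  # the three default pools from mkcephfs
after = defaults + [3904, 3904, 4096]  # plus the three pools added later
for label, pools in (("defaults", defaults), ("after additions", after)):
    stats = pg_load(pools, num_osds=60, replicas=2)
    print(label, {k: round(v, 1) for k, v in stats.items()})
```

With these inputs the default pools alone come to 11712 PGs, about 3.9 times the 3000-PG target, and all six pools together reach 23616 PGs: roughly 394 PGs, or about 787 PG replicas, per OSD.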

Thanks for all the great info.
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
