Re: Ceiling on number of PGs in an OSD

There isn't a hard limit, but it's recommended that you keep it around 100 PGs per OSD.  Smaller values cause uneven data distribution; larger values cause the OSD processes to use more CPU, RAM, and file descriptors, particularly during recovery.  With that many OSDs, you're going to want to raise your sysctls and ulimits, particularly open file descriptors, open sockets, FDs per process, etc.
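As a rough sketch of that sizing rule (my own back-of-the-envelope helper, not an official formula): aim for about 100 PGs per OSD counting replicas, then round the result to a power of two, which is similar in spirit to what the PG calculator does.

# Rough sketch, not an official formula: target ~100 PGs per OSD
# (counting replicas) and round to the nearest power of two.
def suggest_pg_num(num_osds, replicas, target_per_osd=100):
    raw = num_osds * target_per_osd / replicas
    lower = 1 << (int(raw).bit_length() - 1)   # nearest powers of two around raw
    upper = lower * 2
    return upper if (raw - lower) > (upper - raw) else lower

print(suggest_pg_num(72, 3))    # my cluster: suggests 2048
print(suggest_pg_num(750, 3))   # Sreenath's case: suggests 32768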


You don't need the same number of placement groups for every pool.  Pools without much data don't need as many PGs.  For example, I have a bunch of pools for RGW zones, and they have 32 PGs each.  I have a total of 2600 PGs, and 2048 of them are in the .rgw.buckets pool.
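As a toy illustration of that (the pool names and data shares below are invented, and the rounding is the same nearest-power-of-two idea as above): give each pool a slice of the overall PG budget proportional to how much data you expect it to hold, with a small floor for the bookkeeping pools.

# Toy example: split a PG budget across pools by expected data share.
# Pool names and shares are invented; the floor keeps tiny pools at 32 PGs.
def nearest_pow2(n):
    lower = 1 << (max(int(n), 1).bit_length() - 1)
    upper = lower * 2
    return upper if (n - lower) > (upper - n) else lower

expected_share = {".rgw.buckets": 0.90, ".rgw": 0.01, ".rgw.gc": 0.01, "rbd": 0.08}
budget = 2600   # roughly my total above

for pool, share in sorted(expected_share.items()):
    print(pool, max(nearest_pow2(budget * share), 32))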

Also keep in mind that your pg_num and pgp_num need to be multiplied by the number of replicas to get the PG-per-OSD count.  I have 2600 PGs and replication 3, so I really have 7800 PGs spread over 72 OSDs, which works out to about 108 per OSD.

Assuming you have one "big" pool, 750 OSDs, and replication 3, I'd go with 32k PGs on the big pool.  With the same setup but replication 2, I'd still go with 32k, but be prepared to expand PGs with your next addition of OSDs.
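To put numbers on that, here's the back-of-the-envelope check (PGs per OSD = pg_num * replicas / num_osds):

# Why 32k works for both replication levels on 750 OSDs
print(32768 * 3 / 750.0)   # ~131 PGs per OSD with replication 3
print(32768 * 2 / 750.0)   # ~87 PGs per OSD with replication 2, and it drops
                           # further as OSDs are added, hence expanding PGs later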

If you're going to have several "big" pools (i.e., you're using RGW and RBD heavily), I'd go with 16k PGs for the big pools and adjust those over time depending on which is used more heavily.  If RBD is consuming 2x the space, then increase its pg_num and pgp_num during the next OSD expansion, but don't increase RGW's pg_num and pgp_num.


The number of PGs per OSD should stay around 100 as you add OSDs.  If you add 10x the OSDs, you'll multiply the pg_num and pgp_num by 10 too, which gives you the same number of PGs per OSD.  My (pg_num / osd_num) fluctuates between 75 and 200, depending on when I do the pg_num and pgp_num increases relative to the OSD additions.

When you increase pg_num and pgp_num, don't do a large jump.  Ceph will only allow you to double the value, and even that is extreme: it will cause every OSD in the cluster to start splitting PGs at once.  When you want to double your pg_num and pgp_num, make several passes.  I don't recall seeing any official recommendation on how many, but I'm planning to break my next increase up into 10 passes.  I'm at 2048 now, so I'll probably add about 205 PGs per pass until I get to 4096.
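Here's a sketch of what I mean by passes (the pool name is just an example, and the step size is my own plan rather than anything official); between passes, wait for backfill to finish and the cluster to return to HEALTH_OK:

# Staged pg_num/pgp_num increase: 2048 -> 4096 in about 10 passes.
pool = ".rgw.buckets"
start, target, passes = 2048, 4096, 10
step = (target - start + passes - 1) // passes   # ~205 PGs per pass

current = start
while current < target:
    current = min(current + step, target)
    print("ceph osd pool set %s pg_num %d" % (pool, current))
    print("ceph osd pool set %s pgp_num %d" % (pool, current))
    # ...then let the cluster settle before running the next pass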




On Thu, Mar 19, 2015 at 6:12 AM, Sreenath BH <bhsreenath@xxxxxxxxx> wrote:
Hi,

Is there a ceiling on the number of placement groups in an OSD beyond
which steady-state and/or recovery performance will start to suffer?

Example: I need to create a pool with 750 OSDs (25 OSDs per server, 50 servers).
The PG calculator gives me 65536 placement groups with 300 PGs per OSD.
Now as the cluster expands, the number of PGs in an OSD has to increase as well.

If the cluster size increases by a factor of 10, the number of PGs per
OSD will also need to be increased.
What would be the impact of a large PG number in an OSD on peering and rebalancing?

There is 3GB per OSD available.

thanks,
Sreenath