Re: PG number per OSD

One factor is RAM usage; IIRC that was the motivation for lowering the recommended ratio from 200 to 100.  Memory needs also increase during recovery and backfill.

When calculating, be sure to consider replicas.

ratio = (pgp_num x replication) / num_osds
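
As a quick illustration of that formula, here is a minimal sketch in Python.  The pool sizes, pgp_num values, and OSD count are hypothetical, and for EC pools the replication factor would be k+m rather than the pool size.

# Sketch of the PG-per-OSD ratio above, with hypothetical pool and OSD counts.
# Each pool contributes pgp_num x replication PG instances (use k+m for EC pools).

pools = [
    # (pgp_num, effective replication)
    (2048, 3),   # e.g. an RBD pool with size=3 (hypothetical)
    (512, 3),    # e.g. a smaller metadata pool (hypothetical)
]
num_osds = 60    # hypothetical cluster size

pg_instances = sum(pgp_num * replication for pgp_num, replication in pools)
ratio = pg_instances / num_osds
print(f"average PGs per OSD: {ratio:.1f}")   # (2048*3 + 512*3) / 60 = 128.0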

As HDDs grow, the interface (with SATA at least) isn’t getting any faster, and there are only so many IOPS and MB/s you’re going to get out of one no matter how you slice it.  Everything always depends on your use case and workload, but I suspect the bottleneck is often the drive itself, not PG or OSD serialization.

For example, do you prize IOPS, latency, or MB/s most?  If you don’t care about latency, you can drive your HDDs harder and get more MB/s of throughput out of them, though your average latency might climb to 100 ms.  RBD VM clients, for example, probably wouldn’t be too happy about that, but an object service *might* tolerate it.

Basically, in the absence of more info, I would personally suggest aiming for an average in the 150-200 range, with pgp_num a power of 2.  If you aim a bit high, the ratio will come down when you add nodes/OSDs to gain capacity.  Be sure to balance usage and watch your mon_max_pg_per_osd setting, allowing some headroom for natural variation and for when components fail.
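
For what it’s worth, picking the nearest power of two for a target ratio can be done mechanically.  The sketch below is a rough illustration with made-up numbers (60 OSDs, size 3, target ~175), not a Ceph tool, and because powers of two are coarse the resulting ratio can land somewhat above or below the target.

import math

def suggest_pgp_num(num_osds: int, replication: int, target_ratio: float = 175.0) -> int:
    """Power of two whose PG-per-OSD ratio is closest to the target (sketch only)."""
    raw = num_osds * target_ratio / replication
    lower = 2 ** math.floor(math.log2(raw))
    upper = lower * 2
    return lower if (raw - lower) <= (upper - raw) else upper

pgp_num = suggest_pgp_num(60, 3)                   # hypothetical cluster
print(pgp_num, "->", pgp_num * 3 / 60, "PGs/OSD")  # 4096 -> 204.8 PGs/OSD

In that made-up example the ratio lands just above 200; per the advice above that can be acceptable if you expect to add OSDs, but keep an eye on mon_max_pg_per_osd.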

YMMV.  

— aad

> On Sep 5, 2020, at 10:34 AM, huxiaoyu@xxxxxxxxxxxx wrote:
> 
> Dear Ceph folks,
> 
> As the capacity of one HDD (OSD) grows bigger and bigger, e.g. from 6 TB up to 18 TB or even more, should the number of PGs per OSD increase as well, e.g. from 200 to 800?  As far as I know, the capacity of each PG should be kept smaller for performance reasons due to the existence of PG locks, so shall I set the number of PGs per OSD to 1000 or even 2000?  What is the actual reason for not setting the number of PGs per OSD higher?  Are there any practical limitations on the number of PGs?
> 
> thanks a lot,
> 
> Samuel 
> 
> 
> 
> 
> huxiaoyu@xxxxxxxxxxxx
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx