In our unofficial testing, under heavy random 4KB write workloads with large PGs, we observed high latencies of 100 ms or more. In addition, from reading the source code, it seems that the PG lock could hurt performance as the capacity of each PG grows.

That is why I am wondering what the fundamental limitations on the number of PGs per OSD are.

best regards,

Samuel

huxiaoyu@xxxxxxxxxxxx

From: Dan van der Ster
Date: 2020-09-05 20:22
To: huxiaoyu@xxxxxxxxxxxx
CC: ceph-users; Mark Nelson
Subject: Re: PG number per OSD

Good question!

Did you already observe some performance impact of very large PGs?
Which PG locks are you speaking of? Is there perhaps some way to improve this with the op queue shards?
(I'm cc'ing Mark in case this is something that the performance team has already looked into).

With a 20TB OSD, we'll have up to 200GB PGs following the current suggestions -- but even then, backfilling those huge PGs would still be done in under an hour, which seems pretty reasonable IMHO.

-- dan

On Sat, Sep 5, 2020 at 7:35 PM huxiaoyu@xxxxxxxxxxxx <huxiaoyu@xxxxxxxxxxxx> wrote:
>
> Dear Ceph folks,
>
> As the capacity of one HDD (OSD) is growing bigger and bigger, e.g. from 6TB up to 18TB or even more, should the number of PGs per OSD increase as well, e.g. from 200 to 800? As far as I know, the capacity of each PG should be kept small for performance reasons due to the existence of PG locks, so should I set the number of PGs per OSD to 1000 or even 2000? What is the actual reason for not raising the number of PGs per OSD? Are there any practical limitations on the number of PGs?
>
> thanks a lot,
>
> Samuel
>
> huxiaoyu@xxxxxxxxxxxx
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
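
As a rough check of the sizing arithmetic in Dan's reply: a 20TB OSD with 200GB PGs works out to about 100 PGs per OSD, and finishing a 200GB backfill in under an hour needs a sustained rate of roughly 56 MB/s or more. The sketch below is only a back-of-the-envelope calculation, assuming decimal units (1 TB = 1000 GB), a completely full OSD, and evenly sized PGs; the function names are illustrative helpers, not part of Ceph.

```python
# Back-of-the-envelope check of the figures quoted in the thread.
# Assumptions (not from Ceph itself): decimal units, a completely full OSD,
# and evenly sized PGs. All function names are hypothetical helpers.


def pgs_per_osd(osd_capacity_gb: float, pg_size_gb: float) -> float:
    """Number of equally sized PGs that fill one OSD."""
    return osd_capacity_gb / pg_size_gb


def min_backfill_rate_mb_s(pg_size_gb: float, max_seconds: float) -> float:
    """Sustained rate (MB/s) needed to copy one PG within max_seconds."""
    return pg_size_gb * 1000 / max_seconds


def avg_pg_size_gb(osd_capacity_gb: float, pg_count: int) -> float:
    """Average PG size on a full OSD for a given PG count."""
    return osd_capacity_gb / pg_count


if __name__ == "__main__":
    osd_gb = 20_000  # the 20TB OSD mentioned in the thread

    # 200GB PGs on a 20TB OSD
    print("PGs per OSD:", pgs_per_osd(osd_gb, 200))

    # Rate needed to backfill one 200GB PG within an hour
    print("Min backfill MB/s:", round(min_backfill_rate_mb_s(200, 3600), 1))

    # Average PG size for the PG counts raised in the original question
    for count in (200, 800, 1000, 2000):
        print(count, "PGs ->", avg_pg_size_gb(osd_gb, count), "GB each")
```

Under the same assumptions, the PG counts from the original question (200 to 2000 per OSD) would correspond to average PG sizes between 100GB and 10GB on a full 20TB OSD.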