On Mon, Aug 01, 2016 at 10:37:27AM +0900, Christian Balzer wrote:
> On Fri, 29 Jul 2016 16:20:03 +0800 Chengwei Yang wrote:
>
> > On Fri, Jul 29, 2016 at 11:47:59AM +0900, Christian Balzer wrote:
> > > On Fri, 29 Jul 2016 09:59:38 +0800 Chengwei Yang wrote:
> > >
> > > > Hi list,
> > > >
> > > > I just followed the placement group guide to set pg_num for the rbd pool.
> > > >
> > > How many other pools do you have, or is that the only pool?
> >
> > Yes, this is the only one.
> >
> > > The numbers mentioned are for all pools, not per pool, something that
> > > isn't abundantly clear from the documentation either.
> >
> > Exactly, especially for a newbie like me. :-)
> >
> Given how often and how LONG this issue has come up, it really needs a
> rewrite and lots of BOLD sentences.
>
> > > > "
> > > > Less than 5 OSDs set pg_num to 128
> > > > Between 5 and 10 OSDs set pg_num to 512
> > > > Between 10 and 50 OSDs set pg_num to 4096
> > > > If you have more than 50 OSDs, you need to understand the tradeoffs and how to
> > > > calculate the pg_num value by yourself
> > > > For calculating pg_num value by yourself please take help of pgcalc tool
> > > > "
> > > >
> > > You should have heeded the hint about pgcalc, which is by far the best
> > > thing to do.
> > > The above numbers are an (imprecise) attempt to give a quick answer to a
> > > complex question.
> > >
> > > > Since I have 40 OSDs, I set pg_num to 4096 according to the above
> > > > recommendation.
> > > >
> > > > However, after setting pg_num and pgp_num both to 4096, I found that my
> > > > ceph cluster is in **HEALTH_WARN** status, which surprised me and is still
> > > > confusing me.
> > > >
> > > PGcalc would recommend 2048 PGs at most (for a single pool) with 40 OSDs.
> >
> > BTW, I read PGcalc and found that it may also have a flaw, as it says:
> >
> > "
> > If the value of the above calculation is less than the value of (OSD#) / (Size),
> > then the value is updated to the value of ((OSD#) / (Size)). This is to ensure
> > even load / data distribution by allocating at least one Primary or Secondary PG
> > to every OSD for every Pool.
> > "
> >
> > However, in the above **OpenStack w RGW** use case, there are a lot of small
> > pools with 32 PGs, which is apparently smaller than OSD# / Size (100/3 ~= 33.33).
> >
> > I do mean it, though it's not smaller by a lot. :-)
> >
> Well, there are always trade-offs to "automatic" solutions like this when
> operating either small or large clusters.
>
> While the goal of distributing pools amongst all OSDs is commendable, it
> is also not going to be realistic in all cases.
>
> Nor is it typically necessary, since a small (data size) pool is
> supposedly going to see less activity than a larger one, so the amount of
> IOPS (and the # of OSDs) it needs is going to be lower, too.
>
> In cases where that might not be true (CephFS metadata comes to mind),
> putting such a pool on SSD-based OSDs might be a better choice than
> increasing PGs on HDD-based OSDs.
>
> Or if you have a large (data size) pool that is being used for something
> like backups and sees very little activity, give that one fewer PGs than
> you'd normally do and give those PGs to more active pools.

Thanks, it's much clearer to me now.

> It boils down to the "understanding" part.
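For anyone following along, here is a rough Python sketch of the calculation
pgcalc describes, including the minimum rule quoted above. This is only an
approximation of the documented formula, not the tool's actual code -- the
tool apparently rounds to the nearest power of two rather than always up,
which would explain those 32-PG pools sitting just below OSD# / Size:

    def suggested_pg_num(num_osds, pool_size, target_per_osd=100, data_fraction=1.0):
        """Estimate pg_num for a single pool, per the rule quoted above."""
        raw = num_osds * target_per_osd * data_fraction / pool_size
        # The quoted minimum: at least one primary or replica PG on every OSD
        # for this pool.
        raw = max(raw, float(num_osds) / pool_size)
        # Round up to the next power of two.
        pg_num = 1
        while pg_num < raw:
            pg_num *= 2
        return pg_num

    print(suggested_pg_num(40, 3))  # -> 2048, the figure mentioned above

With a single pool on 40 OSDs and size 3, that lands on 2048 -- which is also
why the documentation's blanket "4096 for 10-50 OSDs" overshoots here.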
> > > I assume the above high number of 4096 stems from the wisdom that with
> > > small clusters more PGs than normally recommended (100 per OSD) can be
> > > helpful.
> > > It was also probably written before those WARN calculations were added to
> > > Ceph.
> > >
> > > The above would better read:
> > > ---
> > > Use PGcalc!
> > > [...]
> > > Between 10 and 20 OSDs set pg_num to 1024
> > > Between 20 and 40 OSDs set pg_num to 2048
> > >
> > > Over 40 definitely use and understand PGcalc.
> > > ---
> > >
> > > >     cluster bf6fa9e4-56db-481e-8585-29f0c8317773
> > > >      health HEALTH_WARN
> > > >             too many PGs per OSD (307 > max 300)
> > > >
> > > > I see the cluster also says "4096 active+clean" so it's safe, but I do not
> > > > like the HEALTH_WARN anyway.
> > > >
> > > You can ignore it, but yes, it is annoying.
> > >
> > > > As I know (from the ceph -s output), the recommended pg_num per OSD is
> > > > [30, 300]; any pg_num outside this range will bring the cluster to
> > > > HEALTH_WARN.
> > > >
> > > > So what I would like to say is: is the document misleading? Should we fix it?
> > > >
> > > Definitely.
> >
> > OK, I'd like to submit a PR.
> >
> Go right ahead and don't look at me, I'm not working for Red Hat or Ceph. ^o^
>
> Christian
> --
> Christian Balzer        Network/Systems Engineer
> chibi@xxxxxxx           Global OnLine Japan/Rakuten Communications
> http://www.gol.com/

--
Thanks,
Chengwei
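A footnote on the arithmetic behind that warning, assuming a replicated pool
of size 3 (the size is never stated above, but it is the only value that
matches the numbers): the monitor counts PG instances -- primaries plus
replicas -- per OSD, and warns once that exceeds mon_pg_warn_max_per_osd
(300 by default in this era):

    print(4096 * 3 / 40.0)  # 307.2 -> "too many PGs per OSD (307 > max 300)"
    print(2048 * 3 / 40.0)  # 153.6 -> PGcalc's 2048 stays well under the limit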