Hello,

On Fri, 29 Jul 2016 04:46:54 +0000 zhu tong wrote:

> Right, that was the one I used to calculate osd_pool_default_pg_num in our test cluster.
>
> 7 OSDs, 11 pools, osd_pool_default_pg_num is calculated to be 256, but when ceph status shows
>
Already wrong, that default is _per_ pool and would already give you way
too many PGs.

>      health HEALTH_WARN
>             too many PGs per OSD (5818 > max 300)
>      monmap e1: 1 mons at {open-kvm-app63=192.168.32.103:6789/0}
>             election epoch 1, quorum 0 open-kvm-app63
>      osdmap e143: 7 osds: 7 up, 7 in
>       pgmap v717609: 6916 pgs, 11 pools, 1617 MB data, 4577 objects
>             17600 MB used, 3481 GB / 3498 GB avail
>                 6916 active+clean
>
> How so?
>
Because you specified nearly 1000 PGs per pool (on average; it could of
course be just one huge pool), somehow, somewhere.
As Chengwei said, look at the pg_num of your pools.

And while having too many PGs isn't all that bad (they use RAM/CPU) up to
a point, in your case I'd definitely recommend starting again from scratch.

With 7 OSDs you can safely use 512 PGs (in TOTAL), so your individual
pools would have 46 PGs on average, meaning small ones would get 32 and
larger ones 64 or 128.

Getting this right with a small number of OSDs is a challenge.

Christian

> Thanks.
>
> ________________________________
> From: Christian Balzer <chibi@xxxxxxx>
> Sent: 29 July 2016 3:31:18
> To: ceph-users@xxxxxxxxxxxxxx
> Cc: zhu tong
> Subject: Re: Re: too many PGs per OSD (307 > max 300)
>
> Hello,
>
> On Fri, 29 Jul 2016 03:18:10 +0000 zhu tong wrote:
>
> > The same problem has been confusing me recently too; I am trying to figure out
> > the relationship (an equation would be best) among the number of pools, OSDs
> > and PGs.
> >
> The pgcalc tool and the equation on that page are your best bet/friend.
> http://ceph.com/pgcalc/
>
> > For example, with 10 OSDs and 7 pools in one cluster, and
> > osd_pool_default_pg_num = 128, how many PGs would the health status show?
> > I have seen some recommend calculating it the other way round -- inferring the
> > osd_pool_default_pg_num value from a fixed number of OSDs and PGs -- but when
> > I try it the way mentioned above, the two results do not match.
> >
> Number of PGs per OSD is your goal.
> To use a simpler example: 20 OSDs, 4 pools, all of equal (expected amount
> of data) size.
> So that's 1024 total PGs (about 150 per OSD), thus 256 per pool.
>
> Again, see pgcalc.
>
> Christian
>
> > Thanks.
> >
> > ________________________________
> > From: ceph-users <ceph-users-bounces@xxxxxxxxxxxxxx> on behalf of Christian Balzer <chibi@xxxxxxx>
> > Sent: 29 July 2016 2:47:59
> > To: ceph-users
> > Subject: Re: too many PGs per OSD (307 > max 300)
> >
> > On Fri, 29 Jul 2016 09:59:38 +0800 Chengwei Yang wrote:
> >
> > > Hi list,
> > >
> > > I just followed the placement group guide to set pg_num for the rbd pool.
> > >
> > How many other pools do you have, or is that the only pool?
> >
> > The numbers mentioned are for all pools, not per pool, something that
> > isn't abundantly clear from the documentation either.
> >
> > > "
> > > Less than 5 OSDs set pg_num to 128
> > > Between 5 and 10 OSDs set pg_num to 512
> > > Between 10 and 50 OSDs set pg_num to 4096
> > > If you have more than 50 OSDs, you need to understand the tradeoffs and how to
> > > calculate the pg_num value by yourself
> > > For calculating pg_num value by yourself please take help of pgcalc tool
> > > "
> > >
> > You should have heeded the hint about pgcalc, which is by far the best
> > thing to do.
> > The above numbers are an (imprecise) attempt to give a quick answer to a
> > complex question.
> >
> > > Since I have 40 OSDs, I set pg_num to 4096 according to the above
> > > recommendation.
> > >
> > > However, after configuring both pg_num and pgp_num to 4096, I found that my
> > > ceph cluster is in **HEALTH_WARN** status, which surprised me and is still
> > > confusing me.
> > >
> > PGcalc would recommend 2048 PGs at most (for a single pool) with 40 OSDs.
> >
> > I assume the above high number of 4096 stems from the wisdom that with
> > small clusters more PGs than normally recommended (100 per OSD) can be
> > helpful.
> > It was also probably written before those WARN calculations were added to
> > Ceph.
> >
> > The above would better read:
> > ---
> > Use PGcalc!
> > [...]
> > Between 10 and 20 OSDs set pg_num to 1024
> > Between 20 and 40 OSDs set pg_num to 2048
> >
> > Over 40 definitely use and understand PGcalc.
> > ---
> >
> > >     cluster bf6fa9e4-56db-481e-8585-29f0c8317773
> > >      health HEALTH_WARN
> > >             too many PGs per OSD (307 > max 300)
> > >
> > > I see the cluster also says "4096 active+clean" so it's safe, but I do not
> > > like the HEALTH_WARN anyway.
> > >
> > You can ignore it, but yes, it is annoying.
> >
> > > As I understand it (from the ceph -s output), the recommended number of PGs
> > > per OSD is in the range [30, 300]; any pg_num outside this range will bring
> > > the cluster to HEALTH_WARN.
> > >
> > > So what I would like to ask: is the document misleading? Should we fix it?
> > >
> > Definitely.
> >
> > Christian
> > --
> > Christian Balzer        Network/Systems Engineer
> > chibi@xxxxxxx           Global OnLine Japan/Rakuten Communications
> > http://www.gol.com/
> > _______________________________________________
> > ceph-users mailing list
> > ceph-users@xxxxxxxxxxxxxx
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
>
> --
> Christian Balzer        Network/Systems Engineer
> chibi@xxxxxxx           Global OnLine Japan/Rakuten Communications
> http://www.gol.com/


--
Christian Balzer        Network/Systems Engineer
chibi@xxxxxxx           Global OnLine Japan/Rakuten Communications
http://www.gol.com/
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
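
For anyone who wants to check the numbers quoted in this thread, below is a
minimal sketch of the arithmetic behind the "too many PGs per OSD" warning.
It assumes replicated pools with the default size of 3 (each PG is counted
once per replica), and the helper name pgs_per_osd is purely illustrative,
not part of Ceph:

# Back-of-the-envelope check of the "too many PGs per OSD" warning.
# Assumes replicated pools; pgs_per_osd() is an illustrative helper, not a Ceph API.

def pgs_per_osd(pools, num_osds):
    """pools: list of (pg_num, replica_size) tuples."""
    pg_instances = sum(pg_num * size for pg_num, size in pools)
    return pg_instances / float(num_osds)

# Chengwei's cluster: one pool of 4096 PGs, 40 OSDs, assumed size 3.
# Gives ~307 if rbd is the only pool, matching "307 > max 300".
print(pgs_per_osd([(4096, 3)], 40))

# Christian's simpler example: 4 equal pools of 256 PGs (1024 total),
# size 3, 20 OSDs -> ~154, i.e. "about 150 per OSD".
print(pgs_per_osd([(256, 3)] * 4, 20))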
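
And a rough sketch of the pgcalc-style sizing Christian describes for the
7-OSD cluster: pick a target PG count per OSD, split it by each pool's
expected share of the data, divide by the replica size, and round up to a
power of two. The function name suggested_pg_num and the target of 100 PGs
per OSD are assumptions for illustration; the real tool at
http://ceph.com/pgcalc/ has a few more refinements:

import math

def suggested_pg_num(num_osds, replica_size, data_share, target_per_osd=100):
    # (target PGs per OSD * number of OSDs * pool's share of data) / replica size,
    # rounded up to the next power of two.
    raw = target_per_osd * num_osds * data_share / float(replica_size)
    return 2 ** int(math.ceil(math.log(raw, 2)))

# zhu tong's cluster: 7 OSDs, 11 pools.  With equal data shares and size 3
# this lands on the "small pools get 32 PGs" figure from the thread.
print(suggested_pg_num(7, 3, 1.0 / 11))   # -> 32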