Re: ceph balancer: further optimizations?

I didn't ask how many PGs you have per OSD; I asked how large your PGs are in comparison to your OSDs.  For instance, the primary data pool in my home cluster holds 10914GB of data in 256 PGs, which means each PG accounts for roughly 42GB of data.  I'm using 5TB disks in this cluster, so each PG accounts for about 0.8% of an OSD's total capacity.  At that size per PG I can get a really good balance in that cluster: even if one OSD ends up with 5 more PGs than another, its usage only differs by about 4%.

Now let's change the number of PGs in my cluster.  The same amount of data and the same size of disks, but only 32 PGs in the pool.  Each PG in that pool takes up about 6.8% of an OSD's available space, so even if one OSD has just 2 more PGs than another, its usage is off by roughly 13.6%.
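
To make that concrete, here's a quick back-of-the-envelope check of the two scenarios above (just a sketch in Python; it treats a 5TB disk as a round 5000GB of usable space):

    # per-PG share of one OSD, using the example numbers above
    pool_gb = 10914.0   # data in the pool
    osd_gb = 5000.0     # usable capacity of one 5TB OSD (rounded)

    for pg_num in (256, 32):
        pg_gb = pool_gb / pg_num          # data held by each PG
        share = pg_gb / osd_gb * 100.0    # % of one OSD taken by one PG
        print("%3d PGs: %5.1f GB per PG = %.2f%% of an OSD" % (pg_num, pg_gb, share))

    # expected output:
    # 256 PGs:  42.6 GB per PG = 0.85% of an OSD
    #  32 PGs: 341.1 GB per PG = 6.82% of an OSD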

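On Sage's point below about the scoring being a hybrid of pg count, bytes, and object count: if bytes are all you care about and you're running the balancer in crush-compat mode, I believe the module has a crush_compat_metrics setting that can be narrowed to bytes only (the option name may vary by release, so please verify it on your version first), along the lines of:

# ceph config-key set mgr/balancer/crush_compat_metrics bytes

That should keep the pg and object counts from pulling the optimization in a different direction than the byte-based score you're after.
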
On Mon, Aug 20, 2018 at 4:29 PM Stefan Priebe - Profihost AG <s.priebe@xxxxxxxxxxxx> wrote:

On 20.08.2018 at 22:13, David Turner wrote:
> You might just have too much data per PG.  If a single PG can account
> for 4% of your OSD, then 9% difference in used space on your OSDs is
> caused by an OSD having only 2 more PGs than another OSD.  If you do
> have very large PGs, increasing your PG count in those pools should
> improve your data distribution.

4384 PGs
91 OSDs
3x replication

4384 * 3 / 91 => ~145 PGs per OSD, which already seems too high - the
docs recommend around 100 PGs per OSD.

>
> On Mon, Aug 20, 2018 at 3:59 PM Sage Weil <sage@xxxxxxxxxxxx> wrote:
>
>     On Mon, 20 Aug 2018, Stefan Priebe - Profihost AG wrote:
>     > Hello,
>     >
>     > since Loic seems to have left Ceph development and his wonderful crush
>     > optimization tool isn't working anymore, I'm trying to get a good
>     > distribution with the ceph balancer.
>     >
>     > Sadly it does not work as well as I'd like.
>     >
>     > # ceph osd df | sort -k8
>     >
>     > shows 75% to 83% usage, an 8% spread, which is too much for me.
>     > I'm optimizing by bytes.
>     >
>     > # ceph balancer eval
>     > current cluster score 0.005420 (lower is better)
>     >
>     > # ceph balancer eval $OPT_NAME
>     > plan spriebe_2018-08-20_19:36 final score 0.005456 (lower is better)
>     >
>     > I'm unable to optimize further ;-( Is there any chance to optimize
>     > further, even at the cost of more rebalancing?
>
>     The scoring that the balancer module is doing is currently a hybrid
>     of pg count, bytes, and object count.  Picking a single metric might
>     help a bit (as those 3 things are not always perfectly aligned).
>
>     s
>     _______________________________________________
>     ceph-users mailing list
>     ceph-users@xxxxxxxxxxxxxx
>     http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
