Re: balancer mgr module

Caspar Smit <casparsmit@xxxxxxxxxxx> · Fri, 16 Feb 2018 10:44:04 +0100

2018-02-16 10:16 GMT+01:00 Dan van der Ster <dan@xxxxxxxxxxxxxx>:
Hi Caspar,

I've been trying the mgr balancer for a couple weeks now and can share

some experience.

Currently there are two modes implemented: upmap and crush-compat.

Upmap requires all clients to be running luminous -- it uses this new

pg-upmap mechanism to precisely move PGs one by one to a more balanced

layout.

The upmap mode balances only by number of PGs, AFAICT, and on at least

one of our clusters it happens to be moving PGs in a pool with no data

-- useless. Checking the implementation, it should be upmapping PGs

from a random pool each iteration -- I have a tracker open for this:

http://tracker.ceph.com/issues/22431

Upmap is the future, but for now I'm trying to exercise the

crush-compat mode on some larger clusters. It's still early days, but

in general it seems to be working in the right direction.

crush-compat does two things: it creates a new "compat" crush

weight-set to give underutilized OSDs more crush weight; and second,

it phases out the osd reweights back to 1.0. So, if you have a cluster

that was previously balanced with ceph osd reweight-by-*, then

crush-compat will gently bring you to the new balancing strategy.

There have been a few issues spotted in 12.2.2... some of the balancer

config-key settings aren't cast properly to int/float so they can

break the balancer; and more importantly the mgr doesn't refresh

config-keys if they change. So if you do change the configuration, you

need to ceph mgr fail <theactiveone> to force the next mgr to reload

the config.
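
To illustrate the casting issue: config-key values are stored as strings, so anything that consumes them has to convert them before doing arithmetic or comparisons. The following is a minimal sketch of defensive casting, not the balancer's actual code; the helper name `read_float` and the fallback behaviour are assumptions made for illustration.

```python
# Hypothetical helper, not part of Ceph: read a config-key value as a float,
# falling back to a default when the key is missing or not numeric.
def read_float(dump: dict, key: str, default: float) -> float:
    raw = dump.get(key)
    if raw is None:
        return default
    try:
        return float(raw)
    except (TypeError, ValueError):
        return default


# Values arrive as strings, exactly as `ceph config-key dump` prints them.
example_dump = {
    "mgr/balancer/max_misplaced": "0.01",
    "mgr/balancer/mode": "crush-compat",
}

max_misplaced = read_float(example_dump, "mgr/balancer/max_misplaced", 0.05)
print(max_misplaced)
```

Without the cast, a comparison such as `misplaced > "0.01"` would compare a number against a string, which is the kind of mistake that can break a module at runtime.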

My current config is:

ceph config-key dump

{

    "mgr/balancer/active": "1",

    "mgr/balancer/begin_time": "0830",

    "mgr/balancer/end_time": "1600",

    "mgr/balancer/max_misplaced": "0.01",

    "mgr/balancer/mode": "crush-compat"

}

Note that the begin_time/end_time seem to be in UTC, not the local time zone.
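
Since the window appears to be evaluated in UTC, it can help to check what a local time corresponds to before picking begin_time/end_time. Here is a small sketch, assuming the values are zero-padded "HHMM" strings as in the dump above; the function names are made up for this example, and the wrap-past-midnight handling is an assumption rather than something confirmed about the module.

```python
from datetime import datetime, time, timedelta, timezone


def parse_hhmm(value: str) -> time:
    # "0830" -> 08:30
    return time(int(value[:2]), int(value[2:]))


def in_window(now: datetime, begin: str, end: str) -> bool:
    """Return True if `now` (timezone-aware) falls inside [begin, end) in UTC."""
    current = now.astimezone(timezone.utc).time()
    start, stop = parse_hhmm(begin), parse_hhmm(end)
    if start <= stop:
        return start <= current < stop
    # Assumed behaviour for a window that wraps past midnight, e.g. 2200-0600.
    return current >= start or current < stop


# 09:00 local time at GMT+01:00 is 08:00 UTC, so it is still before "0830".
local = datetime(2018, 2, 16, 9, 0, tzinfo=timezone(timedelta(hours=1)))
print(in_window(local, "0830", "1600"))
```

With the config above, a site at GMT+01:00 would see balancing between 09:30 and 17:00 local time, if the UTC reading is right.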

max_misplaced defaults to 0.05 (i.e. 5%), and this is used to limit the

fraction of PGs/objects to be rebalanced each iteration.

I have it enabled (ceph balancer on) which means it tries to balance

every 60s. It will skip an iteration if num misplaced is greater than

max_misplaced, or if any objects are degraded.
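
The skip rule described above can be written down as a simple predicate. This is only a sketch of the rule as stated in this mail, not the module's implementation; the function and parameter names are hypothetical, and the inputs are assumed to be the misplaced ratio and degraded object count taken from the cluster status.

```python
def should_skip_iteration(misplaced_ratio: float,
                          degraded_objects: int,
                          max_misplaced: float = 0.05) -> bool:
    """Skip when too much data is already misplaced or anything is degraded."""
    return misplaced_ratio > max_misplaced or degraded_objects > 0


# With max_misplaced set to 0.01, 2% misplaced is enough to skip a round.
print(should_skip_iteration(0.02, 0, max_misplaced=0.01))
```

The effect is that the balancer only adds new data movement once the previous round has mostly settled.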

When you're first trying the balancer you should do the following to test

a one-off balancing (rather than the always on mode that I use):

  - set debug_mgr=4/5 # then you can tail -f ceph-mgr.*.log | grep balancer to see what it's doing

  - ceph balancer mode crush-compat

  - ceph balancer eval # to check the current score

  - ceph balancer optimize myplan # create but do not execute a new plan

  - ceph balancer eval myplan # check what would be the new score after myplan. Is it getting closer to the optimal value 0?

  - ceph balancer show myplan # study what it's trying to do

  - ceph balancer execute myplan # execute the plan. data movement starts here!

  - ceph balancer reset # we do this because balancer rm is broken, and myplan isn't removed automatically after execution
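
The one-off sequence above can also be scripted. The sketch below only assembles the same ceph balancer commands listed in this mail and prints them by default; the wrapper functions and the dry_run switch are assumptions added for illustration, and nothing is sent to the cluster unless dry_run is turned off.

```python
import subprocess


def one_off_commands(plan: str = "myplan", mode: str = "crush-compat") -> list:
    """Build the command sequence for a single, manually reviewed balancing run."""
    return [
        ["ceph", "balancer", "mode", mode],
        ["ceph", "balancer", "eval"],            # current score
        ["ceph", "balancer", "optimize", plan],  # create the plan, do not execute
        ["ceph", "balancer", "eval", plan],      # score the plan would reach
        ["ceph", "balancer", "show", plan],      # inspect the proposed changes
        ["ceph", "balancer", "execute", plan],   # data movement starts here
        ["ceph", "balancer", "reset"],           # clear leftover plans
    ]


def run(commands: list, dry_run: bool = True) -> list:
    """Print each command; only call the ceph CLI when dry_run is False."""
    printed = []
    for cmd in commands:
        printed.append(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
    return printed


if __name__ == "__main__":
    for line in run(one_off_commands()):
        print(line)
```

In practice you would stop after the show step and read the plan before letting execute run, so treat a non-dry run of the whole list with care.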

v12.2.3 has quite a few balancer fixes, and also adds a pool-specific

balancing (which should hopefully fix my upmap issue).

Hope that helps!

It sure does Dan! Thank you very much for your detailed answer.

I will start testing the balancer module with our demo cluster.

Caspar

Dan

On Fri, Feb 16, 2018 at 9:22 AM, Caspar Smit <casparsmit@xxxxxxxxxxx> wrote:

> Hi,

>

> After watching Sage's talk at LinuxConfAU about making distributed storage

> easy he mentioned the Balancer Manager module. After enabling this module,

> pg's should get balanced automagically around the cluster.

>

> The module was added in Ceph Luminous v12.2.2

>

> Since I couldn't find much documentation about this module I was wondering

> if it is considered stable? (production ready) or still experimental/WIP.

>

> Here's the original mailinglist post describing the module:

>

> https://www.spinics.net/lists/ceph-devel/msg37730.html

>

> A few questions:

>

> What are the differences between the different optimization modes?

> Is the balancer run at certain intervals, if yes, what is the interval?

> Will this trigger continuous backfilling/recovering of pg's when a cluster

> is mostly under write load?

>

> Kind regards,

> Caspar

>

> _______________________________________________

> ceph-users mailing list

> ceph-users@xxxxxxxxxxxxxx

> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

>
