Thanks Dan,
After talking it through, we've decided to adopt your approach too and leave the tunables until after the upgrade.

On Mon, May 14, 2018 at 5:14 PM, Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:
Hi Adrian,
Is there a strict reason why you *must* upgrade the tunables?
It is normally OK to run with old (e.g. hammer) tunables on a luminous
cluster. The crush placement won't be state of the art, but that's not
a huge problem.
We have a lot of data in a jewel cluster with hammer tunables. We'll
upgrade that to luminous soon, but don't plan to set chooseleaf_stable
until there's a less disruptive procedure, e.g. [1].
Cheers, Dan
[1] One idea I had to make this much less disruptive would be to
script something that uses upmaps to lock all PGs into their current
placement, then set chooseleaf_stable, then gradually remove the
upmaps. There are some details to work out, and it requires all
clients to be running luminous, but I think something like the sketch
below could help...
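
An untested sketch of what that script could look like, assuming a luminous
ceph CLI (pg dump pgs_brief, osd pg-upmap-items, osd rm-pg-upmap-items); the
JSON parsing and the positional pairing of old/new OSDs are simplifications:

    #!/usr/bin/env python3
    # Untested sketch: pin every PG to its current 'up' set with upmaps so
    # that flipping chooseleaf_stable causes no immediate data movement.
    # Requires all clients to be luminous (upmap needs luminous features).
    import json
    import subprocess

    def ceph(*args):
        # Run a ceph CLI command and return its stdout.
        return subprocess.check_output(('ceph',) + args)

    def pg_up_sets():
        # Map pgid -> current 'up' OSD list. The JSON layout of
        # 'pg dump pgs_brief' differs between releases; adjust as needed.
        pgs = json.loads(ceph('pg', 'dump', 'pgs_brief', '-f', 'json'))
        if isinstance(pgs, dict):          # newer releases wrap the list
            pgs = pgs['pg_stats']
        return {pg['pgid']: pg['up'] for pg in pgs}

    before = pg_up_sets()
    input('Now set chooseleaf_stable = 1, then press Enter...')
    after = pg_up_sets()

    for pgid, new_up in after.items():
        old_up = before.get(pgid)
        if old_up and old_up != new_up:
            # pg-upmap-items takes (from_osd, to_osd) pairs; pairing the two
            # lists positionally is a simplification of the real set logic.
            pairs = []
            for frm, to in zip(new_up, old_up):
                if frm != to:
                    pairs += [str(frm), str(to)]
            if pairs:
                ceph('osd', 'pg-upmap-items', pgid, *pairs)

    # Later, drop the pins a few PGs at a time and let data move slowly:
    #   ceph osd rm-pg-upmap-items <pgid>

Removing the upmaps in small batches afterwards would let you pace the
rebalance at whatever rate the cluster can tolerate.
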
On Mon, May 14, 2018 at 9:01 AM, Adrian <aussieade@xxxxxxxxx> wrote:
> Hi all,
>
> We recently upgraded our old ceph cluster to jewel (5 mons, 21 storage hosts,
> each with 9 x 6 TB filestore OSDs and 3 SSDs carrying 3 journals apiece) -
> mostly used for OpenStack compute/cinder.
>
> To get there we had to go with chooseleaf_vary_r = 4 in order to minimize
> client impact and save time. We now need to get to luminous, and we're on a
> deadline with limited time.
>
> Current tunables are:
> {
> "choose_local_tries": 0,
> "choose_local_fallback_tries": 0,
> "choose_total_tries": 50,
> "chooseleaf_descend_once": 1,
> "chooseleaf_vary_r": 4,
> "chooseleaf_stable": 0,
> "straw_calc_version": 1,
> "allowed_bucket_algs": 22,
> "profile": "unknown",
> "optimal_tunables": 0,
> "legacy_tunables": 0,
> "minimum_required_version": "firefly",
> "require_feature_tunables": 1,
> "require_feature_tunables2": 1,
> "has_v2_rules": 0,
> "require_feature_tunables3": 1,
> "has_v3_rules": 0,
> "has_v4_buckets": 0,
> "require_feature_tunables5": 0,
> "has_v5_rules": 0
> }
>
> Setting chooseleaf_stable to 1, the crush compare tool says:
>
>     Replacing the crushmap specified with --origin with the crushmap
>     specified with --destination will move 8774 PGs (59.08417508417509%
>     of the total) from one item to another.
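>
> A rough, untested way we could cross-check that estimate offline with stock
> tools (the commands below exist, but the osdmaptool --test-map-pgs-dump
> output format and the in-place --import-crush behaviour are from memory, so
> treat the line diff as approximate):
>
>     #!/usr/bin/env python3
>     # Untested sketch: rebuild the osdmap with chooseleaf_stable = 1 and
>     # count how many per-PG dump lines change.
>     import shutil
>     import subprocess
>
>     def run(*cmd):
>         return subprocess.check_output(cmd).decode()
>
>     run('ceph', 'osd', 'getmap', '-o', 'osdmap')
>     run('osdmaptool', 'osdmap', '--export-crush', 'crushmap.bin')
>     run('crushtool', '-i', 'crushmap.bin',
>         '--set-chooseleaf-stable', '1', '-o', 'crushmap.stable')
>     shutil.copyfile('osdmap', 'osdmap.stable')
>     run('osdmaptool', 'osdmap.stable', '--import-crush', 'crushmap.stable')
>
>     before = run('osdmaptool', 'osdmap', '--test-map-pgs-dump').splitlines()
>     after = run('osdmaptool', 'osdmap.stable',
>                 '--test-map-pgs-dump').splitlines()
>
>     # Crude: counts every differing dump line, summary lines included.
>     moved = sum(1 for a, b in zip(before, after) if a != b)
>     print('%d of %d dumped lines differ' % (moved, len(before)))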
>
> Current tunings we have in ceph.conf are:
> #THROTTLING CEPH
> osd_max_backfills = 1
> osd_recovery_max_active = 1
> osd_recovery_op_priority = 1
> osd_client_op_priority = 63
>
> #PERFORMANCE TUNING
> osd_op_threads = 6
> filestore_op_threads = 10
> filestore_max_sync_interval = 30
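>
> In case it's useful, an untested sketch of how we'd expect to adjust those
> recovery throttles at runtime with injectargs while the rebalance runs (the
> tight values are just the ones above; the relaxed numbers are examples only,
> not recommendations):
>
>     #!/usr/bin/env python3
>     # Untested sketch: push recovery/backfill throttles to all OSDs at
>     # runtime, so they can be relaxed when clients are quiet.
>     import subprocess
>
>     def set_osd_args(**opts):
>         # e.g. set_osd_args(osd_max_backfills=1, osd_recovery_max_active=1)
>         args = ' '.join('--%s %s' % (k.replace('_', '-'), v)
>                         for k, v in sorted(opts.items()))
>         subprocess.check_call(['ceph', 'tell', 'osd.*', 'injectargs', args])
>
>     # Throttle hard while clients are busy...
>     set_osd_args(osd_max_backfills=1, osd_recovery_max_active=1,
>                  osd_recovery_op_priority=1)
>     # ...then open up off-peak (example values only).
>     set_osd_args(osd_max_backfills=2, osd_recovery_max_active=3)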
>
> I was wondering if anyone has advice on anything else we can do to balance
> client impact against recovery speed, or war stories about other things to
> consider.
>
> I'm also wondering about the interplay between chooseleaf_vary_r and
> chooseleaf_stable. Are we better off with:
> 1) sticking with chooseleaf_vary_r = 4, setting chooseleaf_stable = 1,
> upgrading, and then stepping chooseleaf_vary_r down to 1 when more time
> is available (sketch below), or
> 2) stepping chooseleaf_vary_r down first, then setting chooseleaf_stable,
> and finally upgrading?
>
> All this bearing in mind that we'd like to keep the time it takes us to get
> to luminous as short as possible ;-) (guesstimating that a 59% rebalance
> will take many days).
>
> Any advice/thoughts gratefully received.
>
> Regards,
> Adrian.
>
> --
> ---
> Adrian : aussieade@xxxxxxxxx
> If violence doesn't solve your problem, you're not using enough of it.
>
--
---
Adrian : aussieade@xxxxxxxxx
If violence doesn't solve your problem, you're not using enough of it.
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com