Hi Adrian,

Is there a strict reason why you *must* upgrade the tunables?

It is normally OK to run with old (e.g. hammer) tunables on a luminous
cluster. The crush placement won't be state of the art, but that's not
a huge problem.

We have a lot of data in a jewel cluster with hammer tunables. We'll
upgrade that to luminous soon, but don't plan to set chooseleaf_stable
until there's a less disruptive procedure, e.g. [1].

Cheers, Dan

[1] One idea I had to make this much less disruptive would be to script
something that uses upmaps to lock all PGs into their current placement,
then set chooseleaf_stable, then gradually remove the upmaps. There are
some details to work out, and it requires all clients to be running
luminous, but I think something like this could help... (a rough sketch
of what such a script could look like follows the quoted message below)

On Mon, May 14, 2018 at 9:01 AM, Adrian <aussieade@xxxxxxxxx> wrote:
> Hi all,
>
> We recently upgraded our old ceph cluster to jewel (5 x mon, 21 x storage
> hosts with 9 x 6TB filestore OSDs and 3 x SSDs with 3 journals on each) -
> mostly used for openstack compute/cinder.
>
> In order to get there we had to go with chooseleaf_vary_r = 4 to minimize
> client impact and save time. We now need to get to luminous (on a deadline,
> and time is limited).
>
> Current tunables are:
>
> {
>     "choose_local_tries": 0,
>     "choose_local_fallback_tries": 0,
>     "choose_total_tries": 50,
>     "chooseleaf_descend_once": 1,
>     "chooseleaf_vary_r": 4,
>     "chooseleaf_stable": 0,
>     "straw_calc_version": 1,
>     "allowed_bucket_algs": 22,
>     "profile": "unknown",
>     "optimal_tunables": 0,
>     "legacy_tunables": 0,
>     "minimum_required_version": "firefly",
>     "require_feature_tunables": 1,
>     "require_feature_tunables2": 1,
>     "has_v2_rules": 0,
>     "require_feature_tunables3": 1,
>     "has_v3_rules": 0,
>     "has_v4_buckets": 0,
>     "require_feature_tunables5": 0,
>     "has_v5_rules": 0
> }
>
> Setting chooseleaf_stable to 1, the crush compare tool says:
>
>     Replacing the crushmap specified with --origin with the crushmap
>     specified with --destination will move 8774 PGs (59.08417508417509%
>     of the total) from one item to another.
>
> Current tunings we have in ceph.conf are:
>
>     #THROTTLING CEPH
>     osd_max_backfills = 1
>     osd_recovery_max_active = 1
>     osd_recovery_op_priority = 1
>     osd_client_op_priority = 63
>
>     #PERFORMANCE TUNING
>     osd_op_threads = 6
>     filestore_op_threads = 10
>     filestore_max_sync_interval = 30
>
> I was wondering if anyone has any advice as to anything else we can do to
> balance client impact against speed of recovery, or war stories of other
> things to consider.
>
> I'm also wondering about the interplay between chooseleaf_vary_r and
> chooseleaf_stable. Are we better off:
>
> 1) sticking with chooseleaf_vary_r = 4, setting chooseleaf_stable = 1,
>    upgrading, and then stepping chooseleaf_vary_r down incrementally to 1
>    when more time is available, or
> 2) setting chooseleaf_vary_r incrementally first, then chooseleaf_stable,
>    and finally upgrading?
>
> All this bearing in mind we'd like to keep the time it takes us to get to
> luminous as short as possible ;-) (guesstimating a 59% rebalance to take
> many days).
>
> Any advice/thoughts gratefully received.
>
> Regards,
> Adrian.
>
> --
> ---
> Adrian : aussieade@xxxxxxxxx
> If violence doesn't solve your problem, you're not using enough of it.
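To make footnote [1] a little more concrete, below is a rough, untested
sketch of what such an upmap-freeze script could look like. It assumes the
cluster is already fully on luminous, that
"ceph osd set-require-min-compat-client luminous" has been set (pg-upmap
needs luminous-capable clients), and that jq is installed; the JSON layout
of "ceph pg ls" differs a bit between releases, so the field names may need
adjusting.

#!/bin/bash
# Rough sketch of the idea in [1]: flip chooseleaf_stable with rebalancing
# held off, then pin every remapped PG back onto the OSDs that currently
# hold its data, so nothing moves until the upmap exceptions are removed.
set -euo pipefail

# Hold off data movement while the tunable is flipped and upmaps are installed.
ceph osd set norebalance

# Flip chooseleaf_stable by round-tripping the crushmap through crushtool.
ceph osd getcrushmap -o crush.bin
crushtool -i crush.bin --set-chooseleaf-stable 1 -o crush.new
ceph osd setcrushmap -i crush.new

# Every PG whose newly computed 'up' set differs from its 'acting' set is now
# "remapped". For each of them, pair the newly chosen OSDs with the OSDs that
# currently hold the data and install an upmap exception. (Simple positional
# pairing - an illustration, not production-hardened logic.)
ceph pg ls remapped -f json \
  | jq -r '(.pg_stats? // .)[] | "\(.pgid) \(.up|join(",")) \(.acting|join(","))"' \
  | while read -r pgid up acting; do
      pairs=$(paste -d' ' <(tr ',' '\n' <<<"$up") <(tr ',' '\n' <<<"$acting") \
              | awk '$1 != $2 {printf "%s %s ", $1, $2}')
      if [ -n "$pairs" ]; then
        # Word-splitting of $pairs into "<from> <to> ..." arguments is intended.
        ceph osd pg-upmap-items "$pgid" $pairs
      fi
    done

ceph osd unset norebalance

# Later, drain the backlog at whatever pace the cluster tolerates, e.g. a
# handful of PGs per hour:
#   ceph osd rm-pg-upmap-items <pgid>

Removing the exceptions afterwards can be rate-limited however you like (a
cron job that drops a few upmaps per hour, say), so the ~59% of data that has
to move does so under the backfill/recovery throttles already in ceph.conf
rather than all at once.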