Hi all,
We recently upgraded our old Ceph cluster to Jewel (5x mon, 21x storage hosts, each with 9x 6TB FileStore OSDs and 3x SSDs holding 3 journals each), used mostly for OpenStack compute/Cinder. To get there we had to go with chooseleaf_vary_r = 4, to minimize client impact and save time. We now need to get to Luminous (we are on a deadline and time is limited).
Current tunables are:
{
"choose_local_tries": 0,
"choose_local_fallback_tries": 0,
"choose_total_tries": 50,
"chooseleaf_descend_once": 1,
"chooseleaf_vary_r": 4,
"chooseleaf_stable": 0,
"straw_calc_version": 1,
"allowed_bucket_algs": 22,
"profile": "unknown",
"optimal_tunables": 0,
"legacy_tunables": 0,
"minimum_required_version": "firefly",
"require_feature_tunables": 1,
"require_feature_tunables2": 1,
"has_v2_rules": 0,
"require_feature_tunables3": 1,
"has_v3_rules": 0,
"has_v4_buckets": 0,
"require_feature_tunables5": 0,
"has_v5_rules": 0
}
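For reference, the tunables that still differ from the Jewel profile can be listed with a short script. This is a sketch: the JEWEL_PROFILE values below are our understanding of what the jewel tunables profile sets (chooseleaf_vary_r 1, chooseleaf_stable 1, straw2 allowed), so check them against the documentation for your exact release before relying on them.

```python
import json

# Assumption: tunable values of the Jewel CRUSH profile as we understand them.
# Verify against the Ceph documentation for your release.
JEWEL_PROFILE = {
    "choose_local_tries": 0,
    "choose_local_fallback_tries": 0,
    "choose_total_tries": 50,
    "chooseleaf_descend_once": 1,
    "chooseleaf_vary_r": 1,
    "chooseleaf_stable": 1,
    "straw_calc_version": 1,
    "allowed_bucket_algs": 54,  # 22 (uniform, list, straw) plus straw2
}


def tunable_differences(current: dict, target: dict = JEWEL_PROFILE) -> dict:
    """Return {name: (current, target)} for every tunable that differs, sorted by name."""
    return {
        name: (current.get(name), wanted)
        for name, wanted in sorted(target.items())
        if current.get(name) != wanted
    }


if __name__ == "__main__":
    # Subset of the tunables posted above.
    current = json.loads("""{
        "choose_local_tries": 0, "choose_local_fallback_tries": 0,
        "choose_total_tries": 50, "chooseleaf_descend_once": 1,
        "chooseleaf_vary_r": 4, "chooseleaf_stable": 0,
        "straw_calc_version": 1, "allowed_bucket_algs": 22
    }""")
    for name, (have, want) in tunable_differences(current).items():
        print(f"{name}: {have} -> {want}")
```

Run against the values above, it reports allowed_bucket_algs, chooseleaf_stable and chooseleaf_vary_r; only the last two are in question here.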
With chooseleaf_stable set to 1, the crush compare tool says:
Replacing the crushmap specified with --origin with the crushmap
specified with --destination will move 8774 PGs (59.08417508417509% of the total)
from one item to another.
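The compare output also gives the PG total (8774 / 0.5908 ≈ 14850) and, with a couple of assumptions, a very rough idea of how long the rebalance could take. The sketch below assumes data is spread evenly across PGs; the 50% fill ratio and 500 MB/s aggregate recovery rate in it are placeholder numbers, not measurements from this cluster. Only the raw capacity (21 x 9 x 6TB) comes from the description above.

```python
# Rough rebalance estimate from the crush compare output quoted above.
# Assumes data is spread evenly across PGs, which is only approximately true.


def total_pgs(moved: int, percent: float) -> int:
    """Back out the total PG count from 'N PGs (P% of the total)'."""
    return round(moved / (percent / 100.0))


def rebalance_hours(raw_tb: float, fill_ratio: float, moved_fraction: float,
                    rate_mb_s: float) -> float:
    """Hours needed to move moved_fraction of the stored data at rate_mb_s (MB/s)."""
    data_mb = raw_tb * fill_ratio * moved_fraction * 1_000_000  # TB -> MB (decimal units)
    return data_mb / rate_mb_s / 3600.0


if __name__ == "__main__":
    pgs = total_pgs(8774, 59.08417508417509)
    raw_tb = 21 * 9 * 6  # 21 hosts x 9 OSDs x 6 TB = 1134 TB raw
    # Hypothetical inputs: cluster 50% full, 500 MB/s sustained recovery throughput.
    hours = rebalance_hours(raw_tb, 0.5, 8774 / pgs, 500)
    print(f"{pgs} PGs total, ~{hours:.0f} h (~{hours / 24:.1f} days)")
```

With those placeholders it works out to about 186 hours, a bit under 8 days; with osd_max_backfills = 1 the real throughput may well be lower, so plug in the recovery rate you actually observe.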
The current tuning settings in our ceph.conf are:
#THROTTLING CEPH
osd_max_backfills = 1
osd_recovery_max_active = 1
osd_recovery_op_priority = 1
osd_client_op_priority = 63
#PERFORMANCE TUNING
osd_op_threads = 6
filestore_op_threads = 10
filestore_max_sync_interval = 30
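Since ceph.conf values are only read when an OSD starts, one way to trade client impact against recovery speed is to change the throttles at runtime with `ceph tell osd.* injectargs`, e.g. loosening backfill limits overnight and restoring them during business hours. A minimal sketch that only builds the command line (run it by hand or via subprocess); the build_injectargs helper and the night-time values are hypothetical, so try them on a quiet period first.

```python
import shlex


def build_injectargs(settings: dict, target: str = "osd.*") -> list:
    """Build argv for `ceph tell <target> injectargs '--opt value ...'`.

    Hypothetical helper: option names use the ceph.conf spelling (underscores)
    and are converted to the --dashed form for injectargs.
    """
    args = " ".join(f"--{name.replace('_', '-')} {value}"
                    for name, value in settings.items())
    return ["ceph", "tell", target, "injectargs", args]


if __name__ == "__main__":
    # Placeholder night-time values; the daytime values match ceph.conf above.
    night = {"osd_max_backfills": 2, "osd_recovery_max_active": 3}
    day = {"osd_max_backfills": 1, "osd_recovery_max_active": 1}
    for profile in (night, day):
        print(shlex.join(build_injectargs(profile)))  # shell-quoted, Python 3.8+
```

Values injected this way are lost when an OSD restarts, which is convenient here: anything restarted during the upgrade falls back to the conservative ceph.conf settings.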
I was wondering if anyone has advice on anything else we can do to balance client impact against recovery speed, or war stories about other things to consider.
Are we better off
1) sticking with chooseleaf_vary_r = 4, setting chooseleaf_stable = 1, upgrading, and then stepping chooseleaf_vary_r down to 1 when more time is available
or
2) stepping chooseleaf_vary_r down to 1 first, then setting chooseleaf_stable, and finally upgrading?
All this bearing in mind that we'd like to keep the time it takes us to get to Luminous as short as possible ;-) (guesstimating that a 59% rebalance will take many days).
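The two orderings can be written down as a sequence of CRUSH map edits, one rebalance per step. The sketch below prints crushtool commands for each option; the file names and helper functions are hypothetical, and the `--set-chooseleaf-vary-r` / `--set-chooseleaf-stable` spellings are our reading of crushtool's options, so check `crushtool --help` on your version first. Each resulting map could be checked with the same crush compare run before injecting it with `ceph osd setcrushmap -i`.

```python
def vary_r_steps(current: int, target: int = 1) -> list:
    """Intermediate chooseleaf_vary_r values, e.g. 4 -> [3, 2, 1]."""
    return list(range(current - 1, target - 1, -1))


def plan(option: int, current_vary_r: int = 4) -> list:
    """Ordered (tunable, value) changes for option 1 or 2; 'upgrade' marks the upgrade."""
    vary = [("chooseleaf_vary_r", v) for v in vary_r_steps(current_vary_r)]
    stable = [("chooseleaf_stable", 1)]
    upgrade = [("upgrade", "luminous")]
    if option == 1:
        return stable + upgrade + vary
    if option == 2:
        return vary + stable + upgrade
    raise ValueError("option must be 1 or 2")


def crushtool_cmd(step: int, tunable: str, value) -> str:
    """Hypothetical file naming: crush.<n>.bin in, crush.<n+1>.bin out."""
    flag = "--set-" + tunable.replace("_", "-")
    return f"crushtool -i crush.{step}.bin {flag} {value} -o crush.{step + 1}.bin"


if __name__ == "__main__":
    # crush.0.bin is the current map, e.g. from `ceph osd getcrushmap -o crush.0.bin`.
    for option in (1, 2):
        print(f"# option {option}")
        n = 0
        for tunable, value in plan(option):
            if tunable == "upgrade":
                print("# ... upgrade to Luminous here ...")
                continue
            print(crushtool_cmd(n, tunable, value))
            n += 1
```

Both options involve the same four data movements; the difference is only whether the three vary_r steps happen before or after the upgrade.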
Any advice/thoughts gratefully received.
Regards,
Adrian.
--
Adrian : aussieade@xxxxxxxxx
If violence doesn't solve your problem, you're not using enough of it.
_______________________________________________ ceph-users mailing list ceph-users@xxxxxxxxxxxxxx http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com