On Thu, 20 Apr 2017, Aaron Bassett wrote:
> Ahh nm I got it:
>
> ceph osd test-reweight-by-utilization 110
> no change
> moved 56 / 278144 (0.0201335%)
> avg 259.948
> stddev 15.9527 -> 15.9079 (expected baseline 16.1154)
> min osd.512 with 217 -> 217 pgs (0.834783 -> 0.834783 * mean)
> max osd.870 with 314 -> 314 pgs (1.20794 -> 1.20794 * mean)
>
> oload 110
> max_change 0.05
> max_change_osds 4
> average 0.719019
> overload 0.790921
> osd.1038 weight 1.000000 -> 0.950012
> osd.10 weight 1.000000 -> 0.950012
> osd.481 weight 1.000000 -> 0.950012
> osd.613 weight 1.000000 -> 0.950012

You might try walking down from 120 to 110, and changing more than 4 osds
at a time.

> This is only changing the ephemeral weight? Is that going to be an issue
> if I need to apply an update and restart osds?

This is changing the confusingly-named 'osd reweight' value, which is
designed to do exactly this. It won't get clobbered by an osd restart.

sage
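For reference, a minimal sketch of the Jewel-era invocation discussed above. The positional arguments are oload, max_change and max_osds, matching the "oload / max_change / max_change_osds" lines in the output; the specific numbers below are only illustrative, not a recommendation:

    # dry run: report which 'osd reweight' values would change, using a
    # 110% overload cutoff, at most 0.05 of change per OSD, and up to
    # 16 OSDs per pass instead of the default 4
    ceph osd test-reweight-by-utilization 110 0.05 16

    # if the projected data movement (the "moved 56 / 278144" line above)
    # looks acceptable, apply the same change for real
    ceph osd reweight-by-utilization 110 0.05 16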
> Aaron
>
> On Apr 20, 2017, at 11:35 AM, Aaron Bassett <Aaron.Bassett@xxxxxxxxxxxxx> wrote:
>
> On Apr 20, 2017, at 11:27 AM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
>
> On Thu, 20 Apr 2017, Aaron Bassett wrote:
> Good morning,
> I have a large (1000) osd cluster running Jewel (10.2.6). It's an object
> store cluster, just using RGW with two EC pools of different redundancies.
> Tunables are optimal:
>
> ceph osd crush show-tunables
> {
>     "choose_local_tries": 0,
>     "choose_local_fallback_tries": 0,
>     "choose_total_tries": 50,
>     "chooseleaf_descend_once": 1,
>     "chooseleaf_vary_r": 1,
>     "chooseleaf_stable": 1,
>     "straw_calc_version": 1,
>     "allowed_bucket_algs": 54,
>     "profile": "jewel",
>     "optimal_tunables": 1,
>     "legacy_tunables": 0,
>     "minimum_required_version": "jewel",
>     "require_feature_tunables": 1,
>     "require_feature_tunables2": 1,
>     "has_v2_rules": 1,
>     "require_feature_tunables3": 1,
>     "has_v3_rules": 0,
>     "has_v4_buckets": 0,
>     "require_feature_tunables5": 1,
>     "has_v5_rules": 0
> }
>
> It's about 72% full and I'm starting to hit the dreaded "nearfull"
> warnings. My osd utilizations range from 59% to 85%. My current approach
> has been to use "ceph osd crush reweight" to knock a few points off the
> weight of any osds that are > 84% utilized. I realized I should also
> probably be bumping up the weights of some osds at the low end to help
> direct the data in the right direction, but I have not started doing
> that yet. It's getting a bit complicated as I'm having some I've
> already weighted down pop back up again, so it takes a lot of care to do
> it right and not screw up in a way that would move a lot of data
> unnecessarily, or get into a backfill_toofull situation.
>
> FWIW, in the past on an older cluster running Hammer I believe, I had
> used reweight_by_utilization in this situation. That ended poorly as it
> lowered some of the weights so low that crush was unable to place some
> pgs, leading me to a lengthy process of manually correcting. Also this
> cluster is much larger than that one was and I'm hesitant to try to
> shuffle so much data at once.
>
> That problem has been fixed; I'd try the new jewel version.
>
> This is the output of ceph osd test-reweight-by-utilization:
> no change
> moved 0 / 278144 (0%)
> avg 259.948
> stddev 15.9527 -> 15.9527 (expected baseline 16.1154)
> min osd.512 with 217 -> 217 pgs (0.834783 -> 0.834783 * mean)
> max osd.870 with 314 -> 314 pgs (1.20794 -> 1.20794 * mean)
>
> oload 120
> max_change 0.05
> max_change_osds 4
> average 0.719013
> overload 0.862816
>
> ...and I'm guessing that this isn't doing anything because the default
> oload value of 120 is too high for you. Try setting that to 110 and
> re-running test-reweight-by-utilization to see what it will do.
>
> Google is failing me on oload, are there docs you can point me at?
>
> So just wondering if anyone has any advice for me here, or if I should
> carry on as is. I would like to get overall utilization up to at least
> 80% before calling it full and moving on to another, as with a cluster
> this size, those last few percent represent quite a lot of space.
>
> Note that in luminous we have a few mechanisms in place that will let you
> get to an essentially perfect distribution (yay, finally!) so this is a
> short-term problem to get through... at least until you can get all
> clients for the cluster using luminous as well. Since this is an rgw
> cluster that shouldn't be a problem for you!
>
> That's great to hear, I'm hoping to do the next cluster on
> Luminous/Bluestore, but it's going to depend how long I can keep
> shoveling data into this one!
>
> sage
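On the 'oload' question in the quoted thread above: the output itself shows the relationship. oload is the overload threshold expressed as a percentage of the mean utilization, and OSDs above that cutoff are the ones that get their 'osd reweight' lowered. A short worked check against the numbers above, plus a sketch of the two weight commands being contrasted in this thread (osd.1038 and the weight values are illustrative only):

    # overload cutoff = average utilization * oload / 100
    #   oload 120: 0.719013 * 1.20 = 0.862816  -> no OSD above it, hence "no change"
    #   oload 110: 0.719019 * 1.10 = 0.790921  -> four OSDs above it get weighted down

    # 'osd reweight' is the 0..1 override that reweight-by-utilization adjusts;
    # per the reply above, it is not clobbered by an osd restart
    ceph osd reweight 1038 0.95

    # 'osd crush reweight' changes the CRUSH weight itself (typically the disk
    # size in TiB) and is what the manual tuning described earlier was using
    ceph osd crush reweight osd.1038 1.75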
_______________________________________________
Ceph-large mailing list
Ceph-large@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-large-ceph.com