On Thu, 20 Apr 2017, Aaron Bassett wrote:
> Good morning,
> I have a large (1000 osd) cluster running Jewel (10.2.6). It's an object
> store cluster, just using RGW with two EC pools of different redundancies.
> Tunables are optimal:
>
> ceph osd crush show-tunables
> {
>     "choose_local_tries": 0,
>     "choose_local_fallback_tries": 0,
>     "choose_total_tries": 50,
>     "chooseleaf_descend_once": 1,
>     "chooseleaf_vary_r": 1,
>     "chooseleaf_stable": 1,
>     "straw_calc_version": 1,
>     "allowed_bucket_algs": 54,
>     "profile": "jewel",
>     "optimal_tunables": 1,
>     "legacy_tunables": 0,
>     "minimum_required_version": "jewel",
>     "require_feature_tunables": 1,
>     "require_feature_tunables2": 1,
>     "has_v2_rules": 1,
>     "require_feature_tunables3": 1,
>     "has_v3_rules": 0,
>     "has_v4_buckets": 0,
>     "require_feature_tunables5": 1,
>     "has_v5_rules": 0
> }
>
> It's about 72% full and I'm starting to hit the dreaded "nearfull"
> warnings. My osd utilizations range from 59% to 85%. My current approach
> has been to use "ceph osd crush reweight" to knock a few points off the
> weight of any osds that are > 84% utilized. I realize I should also
> probably be bumping up the weights of some osds at the low end to help
> direct the data in the right direction, but I have not started doing
> that yet. It's getting a bit complicated, as some osds I've already
> weighted down pop back up again, so it takes a lot of care to do it
> right and not screw up in a way that would move a lot of data
> unnecessarily, or get into a backfill_toofull situation.
>
> FWIW, in the past on an older cluster running Hammer, I believe, I had
> used reweight_by_utilization in this situation. That ended poorly, as it
> lowered some of the weights so low that CRUSH was unable to place some
> pgs, leading me to a lengthy process of manual correction. Also, this
> cluster is much larger than that one was, and I'm hesitant to shuffle
> so much data at once.

That problem has been fixed; I'd try the new jewel version.

> This is the output of ceph osd test-reweight-by-utilization:
>
> no change
> moved 0 / 278144 (0%)
> avg 259.948
> stddev 15.9527 -> 15.9527 (expected baseline 16.1154)
> min osd.512 with 217 -> 217 pgs (0.834783 -> 0.834783 * mean)
> max osd.870 with 314 -> 314 pgs (1.20794 -> 1.20794 * mean)
>
> oload 120
> max_change 0.05
> max_change_osds 4
> average 0.719013
> overload 0.862816

...and I'm guessing that this isn't doing anything because the default
oload value of 120 is too high for you. Try setting that to 110 and
re-running test-reweight-by-utilization to see what it will do.

> So I'm just wondering if anyone has any advice for me here, or if I
> should carry on as is. I would like to get overall utilization up to at
> least 80% before calling it full and moving on to another cluster, as
> with a cluster this size, those last few percent represent quite a lot
> of space.

Note that in luminous we have a few mechanisms in place that will let you
get to an essentially perfect distribution (yay, finally!), so this is a
short-term problem to get through... at least until you can get all
clients for the cluster using luminous as well. Since this is an rgw
cluster, that shouldn't be a problem for you!

sage
_______________________________________________
Ceph-large mailing list
Ceph-large@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-large-ceph.com
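
For reference, the two approaches discussed above look roughly like the
sketch below. The osd ids and weight values are purely illustrative (not
taken from the cluster in the thread); the oload/max_change/max_osds
arguments mirror the defaults shown in the test-reweight-by-utilization
output, with oload lowered to 110 as suggested.

# Manual approach: shave a little CRUSH weight off an overfull osd and
# give a little back to an underfull one (weights here are made up).
ceph osd crush reweight osd.870 3.5
ceph osd crush reweight osd.512 3.7

# Jewel reweight-by-utilization: dry-run first with a lower overload
# threshold (110 instead of the default 120), capping the change per osd
# at 0.05 and touching at most 4 osds per run.
ceph osd test-reweight-by-utilization 110 0.05 4

# If the proposed moves look sane, apply them and check the new spread.
ceph osd reweight-by-utilization 110 0.05 4
ceph osd df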