Re: Rebalancing

On Apr 20, 2017, at 11:27 AM, Sage Weil <sage@xxxxxxxxxxxx> wrote:

On Thu, 20 Apr 2017, Aaron Bassett wrote:
Good morning,
I have a large (1000-OSD) cluster running Jewel (10.2.6). It's an object store cluster, just using RGW with two EC pools of different redundancies. Tunables are optimal:

ceph osd crush show-tunables
{
   "choose_local_tries": 0,
   "choose_local_fallback_tries": 0,
   "choose_total_tries": 50,
   "chooseleaf_descend_once": 1,
   "chooseleaf_vary_r": 1,
   "chooseleaf_stable": 1,
   "straw_calc_version": 1,
   "allowed_bucket_algs": 54,
   "profile": "jewel",
   "optimal_tunables": 1,
   "legacy_tunables": 0,
   "minimum_required_version": "jewel",
   "require_feature_tunables": 1,
   "require_feature_tunables2": 1,
   "has_v2_rules": 1,
   "require_feature_tunables3": 1,
   "has_v3_rules": 0,
   "has_v4_buckets": 0,
   "require_feature_tunables5": 1,
   "has_v5_rules": 0
}


It's about 72% full and I'm starting to hit the dreaded "nearfull" 
warnings. My OSD utilizations range from 59% to 85%. My current approach 
has been to use "ceph osd crush reweight" to knock a few points off the 
weight of any OSDs that are > 84% utilized. I realize I should probably 
also be bumping up the weights of some OSDs at the low end to help 
direct the data in the right direction, but I have not started doing 
that yet. It's getting a bit complicated, because some OSDs I've 
already weighted down are popping back up again, so it takes a lot of 
care to do it right without moving a lot of data unnecessarily or 
getting into a backfill_toofull situation.
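
The manual approach described above can be sketched as a small planning script. This is only an illustration: the function name `plan_crush_reweights`, the 3% step, and the sample utilizations and weights are all assumptions, not values from this cluster. It prints `ceph osd crush reweight` commands for review instead of running them.

```python
from typing import Dict


def plan_crush_reweights(
    utilization: Dict[str, float],
    crush_weight: Dict[str, float],
    high_water: float = 0.84,  # the "> 84% utilized" cut-off from the text
    step: float = 0.03,        # assumed size of "a few points"
) -> Dict[str, float]:
    """Return new CRUSH weights for OSDs whose utilization exceeds high_water.

    Each selected OSD has its current weight scaled down by `step`.
    """
    plan = {}
    for osd, used in sorted(utilization.items()):
        if used > high_water:
            plan[osd] = round(crush_weight[osd] * (1.0 - step), 5)
    return plan


# Hypothetical sample data, not taken from the cluster in this thread.
utilization = {"osd.1": 0.85, "osd.2": 0.72, "osd.3": 0.59}
crush_weight = {"osd.1": 5.45999, "osd.2": 5.45999, "osd.3": 5.45999}

for osd, weight in plan_crush_reweights(utilization, crush_weight).items():
    # Print the command for review rather than executing it.
    print(f"ceph osd crush reweight {osd} {weight}")
```

Generating the plan first and applying only a few changes at a time keeps the amount of data in flight small, which is the concern raised above.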

FWIW, in the past, on an older cluster running Hammer (I believe), I 
used reweight-by-utilization in this situation. That ended poorly: it 
lowered some of the weights so far that CRUSH was unable to place some 
PGs, which led to a lengthy process of manual correction. This cluster 
is also much larger than that one was, and I'm hesitant to try to 
shuffle so much data at once.

That problem has been fixed; I'd try the new jewel version.

This is the output of ceph osd test-reweight-by-utilization:
no change
moved 0 / 278144 (0%)
avg 259.948
stddev 15.9527 -> 15.9527 (expected baseline 16.1154)
min osd.512 with 217 -> 217 pgs (0.834783 -> 0.834783 * mean)
max osd.870 with 314 -> 314 pgs (1.20794 -> 1.20794 * mean)

oload 120
max_change 0.05
max_change_osds 4
average 0.719013
overload 0.862816

...and I'm guessing that this isn't doing anything because the default 
oload value of 120 is too high for you.  Try setting that to 110 and 
re-running test-reweight-by-utilization to see what it will do.
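
The figures in the output above suggest how oload is applied: the reported overload value (0.862816) matches the average utilization (0.719013) multiplied by oload / 100. Assuming that relationship holds, the short sketch below compares the thresholds for 120 and 110. The function name `overload_threshold` is made up for illustration.

```python
def overload_threshold(average_utilization: float, oload: int) -> float:
    """Utilization above which an OSD would count as overloaded.

    Assumed formula, inferred from the "average" and "overload" values
    printed by test-reweight-by-utilization in this thread.
    """
    return average_utilization * oload / 100.0


average = 0.719013  # "average" from the output above

for oload in (120, 110):
    threshold = overload_threshold(average, oload)
    print(f"oload {oload}: overload threshold {threshold:.6f}")
```

With oload at 120, the threshold sits just above the fullest OSD (about 85%), which is consistent with the "no change" result. At 110 the threshold drops to roughly 79%, so the most heavily used OSDs would become candidates for reweighting.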

Google is failing me on oload, are there docs you can point me at?


So just wondering if anyone has any advice for me here, or if I should 
carry on as is. I would like to get overall utilization up to at least 
80% before calling it full and moving on to another, as with a cluster 
this size, those last few percent represent quite a lot of space.

Note that in luminous we have a few mechanisms in place that will let you 
get to an essentially perfect distribution (yay, finally!) so this is a 
short-term problem to get through... at least until you can get all 
clients for the cluster using luminous as well.  Since this is an rgw 
cluster that shouldn't be a problem for you!

That's great to hear. I'm hoping to do the next cluster on Luminous/Bluestore, but it's going to depend on how long I can keep shoveling data into this one!




sage

_______________________________________________
Ceph-large mailing list
Ceph-large@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-large-ceph.com