On Thu, 16 Nov 2017, Rafał Wądołowski wrote:
> Sage,
>
> you write about 'automatic balancer module', what do you mean? Could you
> tell us more, or paste hyperlinks?

https://github.com/ceph/ceph/blob/master/src/pybind/mgr/balancer/module.py

There will be a blog post as soon as 12.2.2 is out. Basically you can do

    ceph balancer mode crush-compat
    ceph balancer on

and walk away.

sage

>
> BR,
>
> Rafał Wądołowski
>
>
> http://cloudferro.com/ <http://cloudferro.com/>
> On 16.11.2017 14:08, Sage Weil wrote:
> > On Thu, 16 Nov 2017, Pavan Rallabhandi wrote:
> > > Had to revive this old thread, had a couple of questions.
> > >
> > > Since `ceph osd reweight-by-utilization` is changing the weights of the
> > > OSDs but not the CRUSH weights, is it still not a problem (as the OSD
> > > weights would be reset to 1) if those reweighted OSDs go OUT of the
> > > cluster and later get marked IN?
> > >
> > > I thought since OSD weights are not persistent across OUT/IN cycles, it
> > > is not a lasting solution to use `ceph osd reweight` or `ceph osd
> > > reweight-by-utilization`.
> > This was fixed a while ago, and a superficial check of the jewel code
> > indicates that the in/out values are persistent now. Have you observed
> > them getting reset with jewel?
> >
> > > We are having balancing issues on one of our Jewel clusters and I wanted
> > > to understand the pros of using `ceph osd reweight-by-utilization` over
> > > `ceph osd crush reweight`.
> > Both will get the job done, but I would stick with reweight-by-utilization
> > as it keeps the real CRUSH weight matched to the device size. Once you
> > move to luminous, it will be an easier transition to the automatic
> > balancer module (which handles all of this for you).
> >
> > sage
> >
> >
> > > Thanks,
> > > -Pavan.
> > >
> > > From: Ceph-large <ceph-large-bounces@xxxxxxxxxxxxxx> on behalf of Dan Van
> > > Der Ster <daniel.vanderster@xxxxxxx>
> > > Date: Tuesday, 25 April 2017 at 11:36 PM
> > > To: Anthony D'Atri <aad@xxxxxxxxxxxxxx>
> > > Cc: "ceph-large@xxxxxxxxxxxxxx" <ceph-large@xxxxxxxxxxxxxx>
> > > Subject: EXT: Re: Rebalancing
> > >
> > > We run this continuously -- in a cron every 2 hours -- on all of our
> > > clusters:
> > > https://github.com/cernceph/ceph-scripts/blob/master/tools/crush-reweight-by-utilization.py
> > > It's a misnomer, yes -- because my original plan was indeed to modify
> > > CRUSH weights, but for some reason which I do not recall, I switched it
> > > to modify the reweights. It should be super easy to change the crush
> > > weight instead.
> > > We run it with params to change weights of only 4 OSDs by 0.01 at a time.
> > > This ever so gradually flattens the PG distribution, and is totally
> > > transparent latency-wise.
> > > BTW, it supports reweighting only below certain CRUSH buckets, which is
> > > essential if you have a non-uniform OSD tree.
> > >
> > > For adding in new hardware, we use this script:
> > > https://github.com/cernceph/ceph-scripts/blob/master/tools/ceph-gentle-reweight
> > > New OSDs start with crush weight 0, then we gradually increase the weights
> > > 0.01 at a time, all the while watching the number of backfills and cluster
> > > latency.
> > > The same script is used to gradually drain OSDs down to CRUSH weight 0.
> > > We've used that second script to completely replace several petabytes of
> > > hardware.
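The "only 4 OSDs by 0.01 at a time" idea described above can be sketched roughly as follows. This is a minimal illustration and not the logic of the CERN script: the function names, the 5% overfull threshold, and the input dictionaries are all assumptions made for the example. It only builds `ceph osd reweight` command strings and does not talk to a cluster.

```python
from typing import Dict, List, Tuple


def plan_reweights(
    utilization: Dict[int, float],   # hypothetical input: osd id -> fraction used
    reweights: Dict[int, float],     # hypothetical input: osd id -> current override weight
    max_osds: int = 4,               # touch at most this many OSDs per run
    step: float = 0.01,              # lower the override weight by this much
    threshold: float = 1.05,         # assumed: "overfull" means 5% above the mean
) -> List[Tuple[int, float]]:
    """Return (osd_id, new_override_weight) for the most overfull OSDs."""
    if not utilization:
        return []
    mean = sum(utilization.values()) / len(utilization)
    overfull = [osd for osd, used in utilization.items() if used > mean * threshold]
    # Fullest OSDs first, so a small max_osds hits the worst offenders.
    overfull.sort(key=lambda osd: utilization[osd], reverse=True)
    plan = []
    for osd in overfull[:max_osds]:
        current = reweights.get(osd, 1.0)          # an unset override counts as 1.0
        new_weight = round(max(0.0, current - step), 4)
        plan.append((osd, new_weight))
    return plan


def to_commands(plan: List[Tuple[int, float]]) -> List[str]:
    """Render the plan as `ceph osd reweight <id> <weight>` command lines."""
    return [f"ceph osd reweight {osd} {weight:g}" for osd, weight in plan]
```

Run from cron, a loop like this moves only a little data per pass, which is presumably why the approach stays transparent latency-wise. Restricting the candidate OSDs to one CRUSH bucket would be an extra filter on `utilization` before calling `plan_reweights`.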
> > >
> > > Cheers, Dan
> > >
> > >
> > > On 25 Apr 2017, at 08:22, Anthony D'Atri
> > > <aad@xxxxxxxxxxxxxx<mailto:aad@xxxxxxxxxxxxxx>> wrote:
> > >
> > > I read this thread with interest because I’ve been squeezing the OSD
> > > distribution on several clusters myself while expansion gear is in the
> > > pipeline, ending up with an ugly mix of both types of reweight as well as
> > > temporarily raising the full and backfill full ratios.
> > >
> > > I’d been contemplating tweaking Dan@CERN’s reweighting script to use CRUSH
> > > reweighting instead, and to squeeze from both ends, though I fear it might
> > > not be as simple as it sounds prima facie.
> > >
> > >
> > > Aaron wrote:
> > >
> > >
> > > Should I be expecting it to decide to increase some underutilized osds?
> > >
> > >
> > > The osd reweight mechanism only accommodates an override weight between 0
> > > and 1, thus it can decrease but not increase a given OSD’s fullness. To
> > > directly fill up underfull OSDs it would seem to need an override
> > > weight > 1, which isn’t possible.
> > >
> > > I haven’t personally experienced it (yet), but from what I read, if
> > > override reweighted OSDs get marked out and back in again, their override
> > > will revert to 1. In a case where a cluster is running close to the full
> > > ratio, this would *seem* as though a network glitch etc. might result in
> > > some OSDs filling up and hitting the full threshold, which would be bad.
> > >
> > > Using CRUSH reweight instead would seem to address both of these
> > > shortcomings, though it does perturb the arbitrary but useful way that
> > > initial CRUSH weights by default reflect the capacity of each OSD.
> > > Various references also indicate that the override reweight does not
> > > change the weight of buckets above the OSD, but that CRUSH reweight does.
> > > I haven’t found any discussion of the ramifications of this, but my initial
> > > stab at it would be that when one does the 0-1 override reweight, the
> > > “extra” data is redistributed to OSDs on the same node. CRUSH
> > > reweighting would then seem to pull / push the wad of data being adjusted
> > > from / to *other* OSD nodes. Or it could be that I’m out of my Vulcan
> > > mind.
> > >
> > > Thus adjusting the weight of a given OSD affects the fullness of other
> > > OSDs, in ways that would seem to differ depending on which method is
> > > used. As I think you implied in one of your messages, sometimes this can
> > > result in the fullness of one or more OSDs climbing relatively sharply,
> > > even to a point distinctly above where the previous most-full OSDs were.
> > >
> > > I lurked in the recent developer’s meeting where strategies for A Better
> > > Way in Luminous were discussed. While the plans are exciting and hold
> > > promise for uniform and thus greater safe utilization of a cluster’s raw
> > > space, I suspect though that between dev/test time and the attrition
> > > needed to update running clients, those of us running existing RBD
> > > clusters won’t be able to take advantage of them for some time.
> > >
> > > -- Anthony
> > >
> > >
> > > _______________________________________________
> > > Ceph-large mailing list
> > > Ceph-large@xxxxxxxxxxxxxx<mailto:Ceph-large@xxxxxxxxxxxxxx>
> > > http://lists.ceph.com/listinfo.cgi/ceph-large-ceph.com
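The distinction drawn in the quoted text between the 0-1 override weight and the CRUSH weight can be illustrated with a toy model. This is only a sketch under stated assumptions: `Osd`, `Host`, `set_override`, and `set_crush_weight` are hypothetical names, not Ceph APIs, and the model captures just two points from the thread, that the override is limited to the range 0 to 1, and that a bucket's weight follows the CRUSH weights of its OSDs rather than their overrides.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Osd:
    crush_weight: float      # by default reflects the device capacity
    override: float = 1.0    # the `ceph osd reweight` value, limited to [0, 1]


@dataclass
class Host:
    """A toy CRUSH bucket holding OSDs keyed by id."""
    osds: Dict[int, Osd] = field(default_factory=dict)

    def bucket_weight(self) -> float:
        # The bucket weight is the sum of CRUSH weights; overrides are ignored.
        return sum(osd.crush_weight for osd in self.osds.values())


def set_override(osd: Osd, value: float) -> None:
    """Model of an override reweight: values outside [0, 1] are refused."""
    if not 0.0 <= value <= 1.0:
        raise ValueError("override weight must be between 0 and 1")
    osd.override = value


def set_crush_weight(osd: Osd, value: float) -> None:
    """Model of a CRUSH reweight: changes what the parent bucket sums up."""
    if value < 0.0:
        raise ValueError("CRUSH weight must not be negative")
    osd.crush_weight = value
```

In this model, lowering an override leaves `bucket_weight()` unchanged, so the host as a whole keeps attracting the same share of data, while lowering a CRUSH weight shrinks the host's share. That is consistent with the guess above that override reweighting shifts data within a node and CRUSH reweighting shifts it between nodes.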