Re: Rebalancing

As a follow-up, David has been very helpfully showing me how to use his script, and I've installed the CRUSH map it produced, which shifted about 20% of the data. It's been recovering for about four days now and is nearly halfway done, even after losing an OSD along the way.

Also, I came across this library this morning: http://libcrush.org/main/python-crush. I wonder whether it could be used to build some tooling for situations like this.
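At first glance it looks like you can load a crush map and simulate placements offline. Something along these lines might work (a rough sketch going off the package's docs; the Crush.parse()/map() calls, the crushmap dict layout, and the 16.16 fixed-point weights are assumptions on my part, not something I've actually run):

from crush import Crush

# Toy two-host map; weights appear to be 16.16 fixed point, so 65536 == 1.0 (assumption).
crushmap = {
    "trees": [{
        "type": "root", "name": "default", "id": -1,
        "children": [
            {"type": "host", "name": "host0", "id": -2,
             "children": [{"id": 0, "name": "device0", "weight": 65536},
                          {"id": 1, "name": "device1", "weight": 65536}]},
            {"type": "host", "name": "host1", "id": -3,
             "children": [{"id": 2, "name": "device2", "weight": 65536},
                          {"id": 3, "name": "device3", "weight": 65536}]},
        ],
    }],
    "rules": {
        "data": [["take", "default"],
                 ["chooseleaf", "firstn", 0, "type", "host"],
                 ["emit"]],
    },
}

c = Crush()
c.parse(crushmap)

# Map a range of fake PG ids and count how often each device is chosen,
# to compare the distribution before and after tweaking a weight.
counts = {}
for pg in range(1024):
    for dev in c.map(rule="data", value=pg, replication_count=2):
        counts[dev] = counts.get(dev, 0) + 1
print(counts)

If a real cluster's map can be converted into that format, it might let us test a reweight plan without touching the cluster.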

Aaron 

On Apr 25, 2017, at 2:22 AM, Anthony D'Atri <aad@xxxxxxxxxxxxxx> wrote:

I read this thread with interest because I’ve been squeezing the OSD distribution on several clusters myself while expansion gear is in the pipeline, ending up with an ugly mix of both types of reweight as well as temporarily raised full and backfill full ratios.

I’d been contemplating tweaking Dan@CERN’s reweighting script to use CRUSH reweighting instead, and to squeeze from both ends, though I fear it might not be as simple as it sounds prima facie.


Aaron wrote:

Should I be expecting it to decide to increase some underutilized OSDs?


The osd reweight mechanism only accommodates an override weight between 0 and 1, so it can decrease but not increase a given OSD’s fullness. To directly fill up underfull OSDs it would seem to need an override weight > 1, which isn’t possible.
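For reference, the two knobs look like this (illustrative numbers):

    ceph osd reweight 12 0.85            # override weight, must be within [0, 1]
    ceph osd crush reweight osd.12 1.6   # CRUSH weight, any value, conventionally the OSD’s capacity in TB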

I haven’t personally experienced it (yet), but from what I read, if override-reweighted OSDs get marked out and back in again, their override will revert to 1. In a cluster running close to the full ratio, it would *seem* that a network glitch or the like could result in some OSDs filling up and hitting the full threshold, which would be bad.

Using CRUSH reweight instead would seem to address both of these shortcomings, though it does perturb the arbitrary but useful way that initial CRUSH weights by default reflect the capacity of each OSD. Various references also indicate that the override reweight does not change the weight of buckets above the OSD, but that CRUSH reweight does. I haven’t found any discussion of the ramifications of this, but my initial stab at it would be that with the 0-1 override reweight, the “extra” data is redistributed to OSDs on the same node, whereas CRUSH reweighting would pull / push the wad of data being adjusted from / to *other* OSD nodes. Or it could be that I’m out of my Vulcan mind.

Thus adjusting the weight of a given OSD affects the fullness of other OSDs, in ways that seem to differ depending on which method is used. As I think you implied in one of your messages, this can sometimes cause the fullness of one or more OSDs to climb relatively sharply, even to a point distinctly above where the previously most-full OSDs were.

I lurked in the recent developers’ meeting where strategies for A Better Way in Luminous were discussed. While the plans are exciting and hold promise for more uniform, and thus greater, safe utilization of a cluster’s raw space, I suspect that between dev/test time and the attrition needed to update running clients, those of us running existing RBD clusters won’t be able to take advantage of them for some time.

— Anthony


_______________________________________________
Ceph-large mailing list
Ceph-large@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-large-ceph.com
