We have the following in our ceph.conf to bring in new OSDs with a weight of 0:

[osd]
osd_crush_initial_weight = 0

We then set 'nobackfill' and bring in each OSD at full weight one at a time (letting things settle down before bringing in the next OSD). Once all the OSDs are brought in, we unset 'nobackfill' and let Ceph take care of the rest. This seems to work pretty well for us.

Bryan

On 8/31/15, 4:08 PM, "ceph-users on behalf of Wang, Warren" <ceph-users-bounces@xxxxxxxxxxxxxx on behalf of Warren_Wang@xxxxxxxxxxxxxxxxx> wrote:

>When we know we need to take a node out, we weight it down over time. Depending
>on your cluster, you may need to do this over days or hours.
>
>In theory, you could do the same when putting OSDs in, by setting noin,
>then setting the weight to something very low and going up over time. I
>haven't tried this, though.
>
>--
>Warren Wang
>Comcast Cloud (OpenStack)
>
>
>
>On 8/31/15, 2:57 AM, "ceph-users on behalf of Udo Lembke"
><ceph-users-bounces@xxxxxxxxxxxxxx on behalf of ulembke@xxxxxxxxxxxx>
>wrote:
>
>>Hi Christian,
>>for my setup "b" takes too long - too much data movement and stress on
>>all nodes.
>>I simply (with replica 3) set noout, reinstall one node (with a new
>>filesystem on the OSDs, but leaving them in the crushmap) and start all
>>OSDs (on a Friday night) - the rebuild takes less than a day (11*4TB, 1*8TB).
>>This also stresses the other nodes, but less than weighting them to zero.
>>
>>Udo
>>
>>On 31.08.2015 06:07, Christian Balzer wrote:
>>>
>>> Hello,
>>>
>>> I'm about to add another storage node to a small firefly cluster here and
>>> refurbish 2 existing nodes (more RAM, different OSD disks).
>>>
>>> Insert rant about not going to start using ceph-deploy, as I would have to
>>> set the cluster to noin since "prepare" also activates things due to the
>>> udev magic...
>>>
>>> This cluster is quite at the limits of its IOPS capacity (the HW was
>>> requested ages ago, but the mills here grind slowly and not particularly
>>> fine either), so the plan is to:
>>>
>>> a) phase in the new node (let's call it C), one OSD at a time (in the dead
>>> of night)
>>> b) empty out old node A (weight 0), one OSD at a time. When done,
>>> refurbish it and bring it back in, like above.
>>> c) repeat with the 2nd old node, B.
>>>
>>> Looking at this, it's obvious where the big optimization in this procedure
>>> would be: having the ability to "freeze" the OSDs on node B,
>>> that is, making them ineligible for any new PGs while preserving their
>>> current status.
>>> That way data moves from A to C (which is significantly faster than A or B)
>>> and then back to A when it is refurbished, avoiding any heavy lifting by B.
>>>
>>> Does that sound like something other people might find useful as well, and
>>> is it feasible w/o upsetting the CRUSH applecart?
>>>
>>> Christian
>>>
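For reference, a minimal shell sketch of the intake procedure described at the top of this thread, assuming the osd_crush_initial_weight = 0 setting shown above is already in place. The OSD id (osd.12) and the target weight (3.64) are placeholders for illustration, not values from the original posts:

    # pause backfill while the new OSDs are weighted in
    ceph osd set nobackfill

    # bring each new OSD to full weight, one at a time, letting peering
    # settle before moving on to the next one
    # (osd.12 and the weight 3.64 are placeholders)
    ceph osd crush reweight osd.12 3.64

    # once all new OSDs are at full weight, let recovery run
    ceph osd unset nobackfill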
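Similarly, a sketch of the gradual approach Warren suggests (which he notes is untested); the OSD id, the step sizes, and the use of CRUSH weights rather than reweight overrides are illustrative assumptions:

    # keep freshly started OSDs from being marked "in" automatically
    ceph osd set noin

    # once the OSD is created and running, mark it in at a very low weight
    ceph osd in osd.12
    ceph osd crush reweight osd.12 0.1

    # raise the weight in steps over hours or days, letting the cluster
    # settle after each step (the step values are arbitrary examples)
    ceph osd crush reweight osd.12 0.5
    ceph osd crush reweight osd.12 1.5
    ceph osd crush reweight osd.12 3.64

    ceph osd unset noin

For the node-removal case Warren also mentions, the same reweight command can be run in decreasing steps down to 0 before taking the OSDs out.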