Re: Storage node refurbishing, a "freeze" OSD feature would be nice

Hello,

On Mon, 31 Aug 2015 22:44:05 +0000 Stillwell, Bryan wrote:

> We have the following in our ceph.conf to bring in new OSDs with a weight
> of 0:
> 
> [osd]
> osd_crush_initial_weight = 0
> 
> 
> We then set 'nobackfill' and bring in each OSD at full weight one at a
> time (letting things settle down before bringing in the next OSD).  Once all
> the OSDs are brought in we unset 'nobackfill' and let ceph take care of
> the rest.  This seems to work pretty well for us.
> 
That looks interesting, will give it a spin on my test cluster.
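
For my own notes, the sequence as I read it would be roughly the following;
untested on my end, and osd.12 plus the target weight of 1.0 are just
placeholders for whatever the real IDs and sizes are:

  # new OSDs join the CRUSH map with weight 0 (osd_crush_initial_weight = 0)
  ceph osd set nobackfill

  # bring one OSD up to its full weight, let things settle,
  # then repeat for the next one
  ceph osd crush reweight osd.12 1.0

  # once all new OSDs are weighted in, let the backfill run
  ceph osd unset nobackfill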

One thing the "letting things settle down" part reminded me of is that
adding OSDs, and especially a whole new node, will cause (potentially
significant) data movement resulting from the CRUSH map changes, which is
something to keep in mind when scheduling even those "harmless" first steps.
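
If one wants a rough idea of how much data a planned map change will
shuffle around before actually doing it, comparing the PG mappings of the
current and the edited CRUSH map offline with crushtool should work; the
rule number 0, the 3 replicas and the file names below are just assumptions
for a setup like mine, and I haven't run this exact sequence:

  # grab and decompile the current CRUSH map
  ceph osd getcrushmap -o crush.current
  crushtool -d crush.current -o crush.txt
  # edit crush.txt to reflect the planned change (new host/OSDs), recompile
  crushtool -c crush.txt -o crush.new

  # dump simulated PG-to-OSD mappings for both maps and count the differences
  crushtool -i crush.current --test --show-mappings --rule 0 --num-rep 3 > before
  crushtool -i crush.new     --test --show-mappings --rule 0 --num-rep 3 > after
  diff before after | wc -l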

Christian

> Bryan
> 
> On 8/31/15, 4:08 PM, "ceph-users on behalf of Wang, Warren"
> <ceph-users-bounces@xxxxxxxxxxxxxx on behalf of
> Warren_Wang@xxxxxxxxxxxxxxxxx> wrote:
> 
> >When we know we need to take a node out, we weight it down over time.
> >Depending on your cluster, you may need to do this over days or hours.
> >
> >In theory, you could do the same when putting OSDs in, by setting noin,
> >and then setting weight to something very low, and going up over time. I
> >haven't tried this though.
> >
> >--
> >Warren Wang
> >Comcast Cloud (OpenStack)
> >
> >
> >
> >On 8/31/15, 2:57 AM, "ceph-users on behalf of Udo Lembke"
> ><ceph-users-bounces@xxxxxxxxxxxxxx on behalf of ulembke@xxxxxxxxxxxx>
> >wrote:
> >
> >>Hi Christian,
> >>for my setup option "b" takes too long - too much data movement and
> >>stress on all the nodes.
> >>With replica 3, I simply "set noout", reinstall one node (with a new
> >>filesystem on the OSDs, but leaving them in the crush map) and start
> >>all of its OSDs again (on a Friday night) - the rebuild takes approx.
> >>one day or less (11*4TB + 1*8TB).
> >>This also stresses the other nodes, but less than weighting them to
> >>zero would.
> >>
> >>Udo
> >>
> >>On 31.08.2015 06:07, Christian Balzer wrote:
> >>>
> >>> Hello,
> >>>
> >>> I'm about to add another storage node to a small firefly cluster here
> >>> and refurbish 2 existing nodes (more RAM, different OSD disks).
> >>>
> >>> Insert rant about how I'm not going to start using ceph-deploy here,
> >>> as I would have to set the cluster to noin since "prepare" also
> >>> activates things due to the udev magic...
> >>>
> >>> This cluster is quite at the limits of its IOPS capacity (the HW was
> >>> requested ages ago, but the mills here grind slowly and not
> >>> particularly fine either), so the plan is to:
> >>>
> >>> a) phase in the new node (let's call it C), one OSD at a time (in
> >>>    the dead of night)
> >>> b) empty out old node A (weight 0), one OSD at a time. When done,
> >>>    refurbish and bring it back in, like above.
> >>> c) repeat with the 2nd old node, B.
> >>>
> >>> Looking at this, it's obvious where the big optimization in this
> >>> procedure would be: having the ability to "freeze" the OSDs on node
> >>> B, that is, making them ineligible for any new PGs while preserving
> >>> their current status.
> >>> That way data would move from A to C (which is significantly faster
> >>> than A or B) and then back to A when it is refurbished, avoiding any
> >>> heavy lifting by B.
> >>>
> >>> Does that sound like something other people might find useful as
> >>> well, and is it feasible w/o upsetting the CRUSH applecart?
> >>>
> >>> Christian
> >>>
> >>


-- 
Christian Balzer        Network/Systems Engineer                
chibi@xxxxxxx   	Global OnLine Japan/Fusion Communications
http://www.gol.com/
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



