On Mon, Aug 31, 2015 at 5:07 AM, Christian Balzer <chibi@xxxxxxx> wrote:
>
> Hello,
>
> I'm about to add another storage node to a small firefly cluster here and
> refurbish 2 existing nodes (more RAM, different OSD disks).
>
> Insert rant about not going to start using ceph-deploy, as I would have to
> set the cluster to noin since "prepare" also activates things due to the
> udev magic...
>
> This cluster is quite at the limits of its IOPS capacity (the HW was
> requested ages ago, but the mills here grind slowly and not particularly
> fine either), so the plan is to:
>
> a) phase in the new node (let's call it C), one OSD at a time (in the dead
> of night)
> b) empty out old node A (weight 0), one OSD at a time. When done,
> refurbish it and bring it back in, like above.
> c) repeat with the 2nd old node B.
>
> Looking at this it's obvious where the big optimization in this procedure
> would be: having the ability to "freeze" the OSDs on node B.
> That is, making them ineligible for any new PGs while preserving their
> current status.
> So that data moves from A to C (which is significantly faster than A or B)
> and then back to A when it is refurbished, avoiding any heavy lifting by B.
>
> Does that sound like something other people might find useful as well, and
> is it feasible w/o upsetting the CRUSH applecart?

That's the rub, isn't it? Freezing an OSD implicitly means switching from
calculating locations to enumerating them. I can think of the start of a
few hacks around that (mostly around our existing temp pg mappings), but I
don't think it's possible to scale them. :/
-Greg
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
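
[Editor's note: for readers following the a)/b)/c) plan above, the phase-in
and drain steps correspond to ordinary CRUSH reweight operations. A minimal
sketch, assuming hypothetical OSD ids and weights (osd.12 on the new node C,
osd.0 on old node A; real weights would normally reflect disk capacity):

  # Phase the new node's OSD in gradually: start at a low CRUSH weight and
  # step it up once each round of backfill has settled.
  ceph osd crush reweight osd.12 0.2
  ceph -s                              # watch backfill; wait for HEALTH_OK
  ceph osd crush reweight osd.12 1.0   # repeat in steps up to the final weight

  # Drain an OSD on old node A: CRUSH weight 0 causes its PGs to be remapped
  # elsewhere while the OSD stays up and keeps serving data during backfill.
  ceph osd crush reweight osd.0 0

  # Once it is empty, mark it out and stop it for the refurbishment.
  ceph osd out 0

Note that none of this addresses the "freeze node B" idea discussed in the
thread; as Greg points out, CRUSH has no such mechanism, so B still
participates in the intermediate data movement.]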