Re: maintenance questions

On Fri, Oct 7, 2016 at 1:21 PM, Jeff Applewhite <japplewh@xxxxxxxxxx> wrote:
> Hi All
>
> I have a few questions pertaining to management of MONs and OSDs. This is in
> a Ceph 2.x context only.

You mean Jewel? ;)

> -----------------------------------------------
> 1) Can MONs be placed in something resembling maintenance mode (for firmware
> updates, patch reboots, etc.). If so how? If not how addressed?
>
> 2) Can OSDs be placed in something resembling maintenance mode (for firmware
> updates, patch reboots, etc.). If so how? If not how addressed?

In both of these cases, you just turn the daemon off. Preferably
politely (i.e. a clean software shutdown) so that it can tell the
cluster it won't be available. But it's the same as any other failure
case from Ceph's perspective: the node is unavailable for service.

See http://docs.ceph.com/docs/master/install/upgrading-ceph, which is
a little old now but illustrates the basic ideas.
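
For an OSD that looks roughly like the following (a sketch only,
assuming a systemd-based Jewel install and OSD id 3 as a stand-in):

    ceph osd set noout          # don't mark anything out / rebalance while it's down
    systemctl stop ceph-osd@3   # clean shutdown; the cluster sees it go down
    # ...firmware update, patching, reboot...
    systemctl start ceph-osd@3
    ceph osd unset noout        # back to normal failure handling

Monitors are the same idea (stop ceph-mon@<host>, do the work, start
it again); just make sure the remaining monitors can still form a
quorum while that one is gone.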

>
> 3) Can MONs be "replaced/migrated" efficiently in a hardware upgrade
> scenario? If so how? If not how addressed?

Monitors can be moved if they can keep the same IP; otherwise you need
to go through some shenanigans:
http://docs.ceph.com/docs/master/rados/operations/add-or-rm-mons/#changing-a-monitor-s-ip-address
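
Very roughly, the messy path in that doc comes down to editing the
monmap by hand (names and addresses below are made up):

    ceph mon getmap -o /tmp/monmap                  # export the current monmap
    monmaptool --rm a /tmp/monmap                   # drop mon.a's old address
    monmaptool --add a 10.0.1.5:6789 /tmp/monmap    # re-add it at the new IP
    # stop all the monitors, then on each one inject the edited map:
    ceph-mon -i <id> --inject-monmap /tmp/monmap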

Or you can add the new location and remove the old location (hopefully
in that order, to maintain your durability, but you could do it the
other way around if reeeeally necessary):
http://docs.ceph.com/docs/master/rados/operations/add-or-rm-mons/
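
As a sketch of the add-then-remove route (hostname/IPs are
placeholders; the doc above has the bootstrap steps for the new
monitor's data directory and keyring):

    # on the new host, once its data dir is prepared:
    ceph mon add newmon 10.0.1.6:6789
    ceph-mon -i newmon --public-addr 10.0.1.6:6789   # start it so it joins the quorum
    # after it's in quorum and the cluster is healthy:
    ceph mon remove oldmon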

>
> 4) Can OSDs be "replaced/migrated" efficiently in a hardware upgrade
> scenario? If so how? If not how addressed?

You can move an OSD around as long as you either flush its journal
first or keep the journal device colocated with (or moved along with)
the data disk. But by default the OSD will then update to its new
CRUSH location on startup and all the data will reshuffle anyway.
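
A rough sketch of the journal handling (assuming FileStore with a
separate journal device, and OSD id 3 as a stand-in):

    systemctl stop ceph-osd@3
    ceph-osd -i 3 --flush-journal    # journal is now empty; safe to split it from the data disk
    # physically move the data disk, then on the new host:
    ceph-osd -i 3 --mkjournal        # build a fresh journal on the new journal device
    systemctl start ceph-osd@3

If you don't want the automatic CRUSH move, "osd crush update on
start = false" in ceph.conf keeps the OSD at whatever location you
set by hand.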

You can also mark an OSD out while keeping it up, and the cluster
will then backfill all its data to the correct new locations without
ever reducing redundancy.
(http://docs.ceph.com/docs/master/rados/operations/add-or-rm-osds/)
This gets into the typical concerns about changing CRUSH weights and
migrating data unnecessarily if you aren't removing the whole
host/rack/whatever, but it sounds like you are only interested in
wholesale replacement.
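
For example, something like this (OSD id 3 again a placeholder; the
add-or-rm-osds doc above has the full removal procedure):

    ceph osd out 3          # data drains to other OSDs while this one stays up
    # wait for backfill to finish (watch "ceph -w"), then retire the OSD:
    systemctl stop ceph-osd@3
    ceph osd crush remove osd.3
    ceph auth del osd.3
    ceph osd rm 3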

It's also possible to clone the drive or whatever and just stick it in
place, in which case Ceph doesn't really notice (maybe modulo some of
the install or admin stuff on the local node, but the cluster doesn't
care).

I've included some directly relevant links throughout, but there's
plenty of other info in the docs, and you'll probably want to spend
some time reading them carefully if you're planning to build a
management tool. :)
-Greg