Dave,

Just to be certain of the terminology:

-----
Step before Step 4: Quiesce client systems using Ceph
Step 4: Turn off everything that's not a MGR, MON, or OSD
Step 5: Turn off OSDs
Step 6: Turn off MONs
Step 7: Turn off MGRs

If any of the above are running on the same nodes (i.e. mixed nodes), use
OS capabilities (systemd) to stop and disable them so nothing auto-starts
when the hardware is powered back on (rough command sketch below).
-----

Regarding my cluster: currently 3 nodes with 10 Gb front and back networks
and 8 x 12 TB HDDs per node, with Samsung 1.6 TB PCIe NVMe cards. The NVMe
was provisioned to allow adding 4 more HDDs per node, but the RocksDB
partitions are proving to be a bit too small. We will shortly increase to
6 OSD nodes plus 3 separate nodes for MGRs, MONs, MDSs, RGWs, etc. We will
also add enterprise M.2 drives to the original nodes to allow us to
increase the size of the caches.
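For the record, here is the rough command sequence I have in mind. It is
only a sketch: it assumes the default systemd target names from a package
(non-cephadm) install (ceph-osd.target, ceph-mon.target, ceph-mgr.target,
ceph-mds.target) plus a few of the maintenance flags from the linked guide
(noout, norebalance, norecover). Please correct me if any of it is wrong:

  # Before stopping anything (from a node with an admin keyring), keep
  # the cluster from reacting to the OSDs going away:
  ceph osd set noout
  ceph osd set norebalance
  ceph osd set norecover

  # Step 4, on each node: stop and disable everything that isn't a
  # MGR, MON, or OSD (for us that's the MDSs):
  systemctl stop ceph-mds.target && systemctl disable ceph-mds.target

  # Steps 5-7, on each node, in order:
  systemctl stop ceph-osd.target && systemctl disable ceph-osd.target
  systemctl stop ceph-mon.target && systemctl disable ceph-mon.target
  systemctl stop ceph-mgr.target && systemctl disable ceph-mgr.target

  # After power-on, re-enable and start in the reverse order:
  systemctl enable --now ceph-mgr.target
  systemctl enable --now ceph-mon.target
  systemctl enable --now ceph-osd.target
  systemctl enable --now ceph-mds.target

  # Once 'ceph -s' looks healthy again, clear the flags:
  ceph osd unset noout
  ceph osd unset norebalance
  ceph osd unset norecover

My understanding is that disabling the .target units should keep the
individual ceph-osd@<id> / ceph-mon@<hostname> instances from being pulled
in at boot, which is the "nothing auto-starts" behavior we want on the
mixed nodes.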
-Dave

--
Dave Hall
Binghamton University
kdhall@xxxxxxxxxxxxxx


On Tue, Mar 2, 2021 at 4:06 AM David Caro <dcaro@xxxxxxxxxxxxx> wrote:

> On 03/01 21:41, Dave Hall wrote:
> > Hello,
> >
> > I've had a look at the instructions for clean shutdown given at
> > https://ceph.io/planet/how-to-do-a-ceph-cluster-maintenance-shutdown/,
> > but I'm not clear about some things in the steps about shutting down
> > the various Ceph components.
> >
> > For my current 3-node cluster I have MONs, MDSs, MGRs, and OSDs all
> > running on the same nodes. Also, this is a non-container installation.
> >
> > Since I don't have separate dedicated nodes, as described in the
> > referenced web page, I think the instructions mean that I need to issue
> > systemd commands to stop the corresponding services/targets on each
> > node for the Ceph components mentioned in each step.
>
> Yep, the systemd units are usually named 'ceph-<daemon>@<id>', for example
> 'ceph-osd@45' would be the systemd unit for osd.45.
>
> > Since we want to bring services up in the right order, I should also
> > use systemd commands to disable these services/targets so they don't
> > automatically restart when I power the nodes back on. After power-on,
> > I would then re-enable and manually start services/targets in the
> > order described.
>
> Also yes, and if you use some configuration management or similar that
> might bring them up automatically, you might want to disable it
> temporarily too.
>
> > One other specific question: For step 4 it says to shut down my service
> > nodes. Does this mean my MDSs? (I'm not running any Object Gateways or
> > NFS, but I think these would go in this step as well?)
>
> Yes, that is correct. Monitor would be the MONs, and admin the MGRs.
>
> > Please let me know if I've got this right. The cluster contains 200 TB
> > of a researcher's data that has taken a year to collect, so caution is
> > needed.
>
> Can you share a bit more about your setup? Are you using replicas? How
> many? Erasure coding? (A 'ceph osd pool ls detail', 'ceph osd status' or
> similar can help too.)
>
> I would recommend trying to get the hang of the process in a test
> environment first.
>
> Cheers!
>
> > Thanks.
> >
> > -Dave
> >
> > --
> > Dave Hall
> > Binghamton University
> > kdhall@xxxxxxxxxxxxxx
> > _______________________________________________
> > ceph-users mailing list -- ceph-users@xxxxxxx
> > To unsubscribe send an email to ceph-users-leave@xxxxxxx
>
> --
> David Caro
> SRE - Cloud Services
> Wikimedia Foundation <https://wikimediafoundation.org/>
> PGP Signature: 7180 83A2 AC8B 314F B4CE 1171 4071 C7E1 D262 69C3
>
> "Imagine a world in which every single human being can freely share in
> the sum of all knowledge. That's our commitment."
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx