Dave,

Just to be certain of the terminology:

-----
Step before Step 4: Quiesce client systems using Ceph
Step 4: Turn off everything that's not a MGR, MON, or OSD
Step 5: Turn off OSDs
Step 6: Turn off MONs
Step 7: Turn off MGRs

If any of the above are running on the same nodes (i.e. mixed nodes), use
OS capabilities (systemd) to stop and disable them so nothing auto-starts
when the hardware is powered back on (rough command sketch below).
-----

Regarding my cluster: currently 3 nodes with 10 Gb front and back networks
and 8 x 12 TB HDDs per node, with Samsung 1.6 TB PCIe NVMe cards. The NVMe
was provisioned to allow adding 4 more HDDs per node, but the RocksDB
partitions are proving to be a bit too small. We will shortly increase to
6 OSD nodes plus 3 separate nodes for MGRs, MONs, MDSs, RGWs, etc. We will
also add enterprise M.2 drives to the original nodes to allow us to
increase the size of the caches.
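For the record, here is the rough command sequence I have in mind. It is
only a sketch: it assumes the default systemd target names from a package
(non-cephadm) install (ceph-osd.target, ceph-mon.target, ceph-mgr.target,
ceph-mds.target) plus a few of the maintenance flags from the linked guide
(noout, norebalance, norecover). Please correct me if any of it is wrong:

  # Before stopping anything (from a node with an admin keyring), keep
  # the cluster from reacting to the OSDs going away:
  ceph osd set noout
  ceph osd set norebalance
  ceph osd set norecover

  # Step 4, on each node: stop and disable everything that isn't a
  # MGR, MON, or OSD (for us that's the MDSs):
  systemctl stop ceph-mds.target && systemctl disable ceph-mds.target

  # Steps 5-7, on each node, in order:
  systemctl stop ceph-osd.target && systemctl disable ceph-osd.target
  systemctl stop ceph-mon.target && systemctl disable ceph-mon.target
  systemctl stop ceph-mgr.target && systemctl disable ceph-mgr.target

  # After power-on, re-enable and start in the reverse order:
  systemctl enable --now ceph-mgr.target
  systemctl enable --now ceph-mon.target
  systemctl enable --now ceph-osd.target
  systemctl enable --now ceph-mds.target

  # Once 'ceph -s' looks healthy again, clear the flags:
  ceph osd unset noout
  ceph osd unset norebalance
  ceph osd unset norecover

My understanding is that disabling the .target units should keep the
individual ceph-osd@<id> / ceph-mon@<hostname> instances from being pulled
in at boot, which is the "nothing auto-starts" behavior we want on the
mixed nodes.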
-Dave

--
Dave Hall
Binghamton University
kdhall@xxxxxxxxxxxxxx


On Tue, Mar 2, 2021 at 4:06 AM David Caro <dcaro@xxxxxxxxxxxxx> wrote:

> On 03/01 21:41, Dave Hall wrote:
> > Hello,
> >
> > I've had a look at the instructions for clean shutdown given at
> > https://ceph.io/planet/how-to-do-a-ceph-cluster-maintenance-shutdown/,
> > but I'm not clear about some things in the steps about shutting down
> > the various Ceph components.
> >
> > For my current 3-node cluster I have MONs, MDSs, MGRs, and OSDs all
> > running on the same nodes. Also, this is a non-container installation.
> >
> > Since I don't have separate dedicated nodes, as described in the
> > referenced web page, I think the instructions mean that I need to issue
> > systemd commands to stop the corresponding services/targets on each
> > node for the Ceph components mentioned in each step.
>
> Yep, the systemd units are usually named 'ceph-<daemon>@<id>', for example
> 'ceph-osd@45' would be the systemd unit for osd.45.
>
> > Since we want to bring services up in the right order, I should also
> > use systemd commands to disable these services/targets so they don't
> > automatically restart when I power the nodes back on. After power-on,
> > I would then re-enable and manually start services/targets in the
> > order described.
>
> Also yes, and if you use some configuration management or similar that
> might bring them up automatically, you might want to disable it
> temporarily too.
>
> > One other specific question: For step 4 it says to shut down my service
> > nodes. Does this mean my MDSs? (I'm not running any Object Gateways or
> > NFS, but I think these would go in this step as well?)
>
> Yes, that is correct. Monitor would be the MONs, and admin the MGRs.
>
> > Please let me know if I've got this right. The cluster contains 200 TB
> > of a researcher's data that has taken a year to collect, so caution is
> > needed.
>
> Can you share a bit more about your setup? Are you using replicas? How
> many? Erasure coding? (A 'ceph osd pool ls detail', 'ceph osd status' or
> similar can help too.)
>
> I would recommend trying to get the hang of the process in a test
> environment first.
>
> Cheers!
>
> > Thanks.
> >
> > -Dave
> >
> > --
> > Dave Hall
> > Binghamton University
> > kdhall@xxxxxxxxxxxxxx
> > _______________________________________________
> > ceph-users mailing list -- ceph-users@xxxxxxx
> > To unsubscribe send an email to ceph-users-leave@xxxxxxx
>
> --
> David Caro
> SRE - Cloud Services
> Wikimedia Foundation <https://wikimediafoundation.org/>
> PGP Signature: 7180 83A2 AC8B 314F B4CE 1171 4071 C7E1 D262 69C3
>
> "Imagine a world in which every single human being can freely share in
> the sum of all knowledge. That's our commitment."
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx