Re: Safely Upgrading OS on a live Ceph Cluster


 



On 02/27/17 18:01, Heller, Chris wrote:
> First I bring down the Ceph FS via `ceph mds cluster_down`.
> Second, to prevent OSDs from trying to repair data, I run `ceph osd set noout`
> Finally I stop the ceph processes in the following order: ceph-mds, ceph-mon, ceph-osd

This is the wrong procedure. In practice it will likely just cost more CPU and memory on startup rather than cause broken behavior (unless you run out of RAM). After all, Ceph has to be able to recover from power outages, so any order ought to work; some orders are just better than others.

I am unsure about the CephFS part, but I think you have that right, except that I wouldn't run `ceph mds cluster_down` (though I don't know for certain that it's wrong); maybe try without it. I have only ever used that command when I wanted to remove all MDS nodes and destroy all the CephFS data, and I couldn't find any docs on what it really does, except that it won't let you remove all your MDSs and destroy the CephFS without it.
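If you do want to quiesce CephFS without `cluster_down`, a minimal sketch would be to stop the MDS daemon first and confirm the cluster no longer reports it, before touching the OSDs. This assumes the classic sysvinit-style `service ceph` tooling used elsewhere in this thread; the `CEPH`/`SERVICE` override variables are my own addition so the commands can be dry-run, not something from the thread:

```shell
# Hedged sketch: stop the MDS daemon and check its state before proceeding.
# CEPH and SERVICE are overridable (e.g. for a dry run with stub commands);
# they default to the real binaries.
CEPH="${CEPH:-ceph}"
SERVICE="${SERVICE:-service}"

stop_mds() {
  $SERVICE ceph stop mds || return 1
  # Show what the cluster now thinks of the MDS; you want it reported down
  # before shutting down OSDs and MONs.
  $CEPH mds stat
}
```

Then eyeball the `mds stat` output before moving on to the OSD shutdown step.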

The correct procedure as far as I know is:

## 1. Make sure the cluster is healthy, then set noout, norecover, norebalance, nobackfill
ceph -s
for s in noout norecover norebalance nobackfill; do ceph osd set $s; done

## 2. Shut down all OSDs and then all MONs - never MONs before OSDs
# all nodes
service ceph stop osd

# see that all osds are down
ceph osd tree

# all nodes again
ceph -s
service ceph stop

## 3. Start MONs before OSDs.
# Each node already does this on boot, but that isn't enforced cluster-wide. With the flags set, it likely doesn't matter anyway, and on a small cluster it seems unnecessary.

## 4. unset the flags
# see that all osds are up
ceph -s
ceph osd tree
for s in noout norecover norebalance nobackfill; do ceph osd unset $s; done
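Before unsetting the flags, it can help to wait until the cluster actually reports healthy rather than eyeballing `ceph -s`. A minimal sketch (the 60-try limit and the `CEPH` override variable are my own arbitrary choices, not from the thread):

```shell
# Hedged sketch: poll `ceph health` until it reports HEALTH_OK.
# CEPH is overridable so the loop can be dry-run; retry count is arbitrary.
CEPH="${CEPH:-ceph}"

wait_healthy() {
  tries=0
  while [ "$tries" -lt 60 ]; do
    if $CEPH health 2>/dev/null | grep -q HEALTH_OK; then
      return 0
    fi
    tries=$((tries + 1))
    sleep 1
  done
  return 1
}
```

Usage would be something like `wait_healthy && for s in noout norecover norebalance nobackfill; do ceph osd unset $s; done` - though note that with norecover/norebalance set the cluster may legitimately sit in HEALTH_WARN, so checking that all OSDs are up (as above) is the part that matters.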


Note: my cluster has 1 MDS, 1 MON, and 7 OSDs.

> I then install the new OS and then bring the cluster back up by walking the steps in reverse:
>
> First I start the ceph processes in the following order: ceph-osd, ceph-mon, ceph-mds
> Second I restore OSD functionality with `ceph osd unset noout`
> Finally I bring up the Ceph FS via `ceph mds cluster_up`

Adjust those steps too: MONs start first, then OSDs, then the MDS.
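The corrected bring-up, sketched with the same sysvinit-style `service` commands as above (the `SERVICE`/`CEPH` override variables are my own assumption for dry-running; adjust for your init system):

```shell
# Hedged sketch: bring the cluster back up in the corrected order.
SERVICE="${SERVICE:-service}"
CEPH="${CEPH:-ceph}"

start_cluster() {
  $SERVICE ceph start mon      # 1. MONs first
  $SERVICE ceph start osd      # 2. then OSDs (flags still set)
  for s in noout norecover norebalance nobackfill; do
    $CEPH osd unset "$s"       # 3. unset the flags once OSDs are up
  done
  $SERVICE ceph start mds      # 4. finally the MDS / CephFS
}
```

Between steps 2 and 3 you would still check `ceph osd tree` and `ceph -s` as described in the shutdown procedure.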

> Everything works smoothly except the Ceph FS bring up.[...snip...]

> How can I safely stop a Ceph cluster, so that it will cleanly start back up again?

I don't know about the CephFS problem; all I can say is to try the correct general procedure and see whether the result changes.

(I'd love to cite a source on why that's the right procedure and yours isn't, but I don't know what to cite. For example, http://docs.ceph.com/docs/jewel/rados/operations/operating/#id8 says to use `-a` in the arguments, but doesn't say whether that applies to systemd, or what it does exactly. I have only seen this discussed in a few places, like the mailing list and IRC.)
-Chris



_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


