A couple of weeks ago, we had some involuntary maintenance come up that required us to briefly turn off one node of a three-node Ceph cluster. To our surprise, this resulted in write failures on the VMs on that Ceph cluster, even though we set noout before the maintenance (the procedure we followed is sketched at the end of this mail). This cluster is for bulk storage; the pools are set to one replica (2 copies total) and the hosts use very large SATA drives. The OSD tree looks like this:

# id    weight  type name       up/down reweight
-1      127.1   root default
-2      18.16           host f16
0       4.54                    osd.0   up      1
1       4.54                    osd.1   up      1
2       4.54                    osd.2   up      1
3       4.54                    osd.3   up      1
-3      54.48           host f17
4       4.54                    osd.4   up      1
5       4.54                    osd.5   up      1
6       4.54                    osd.6   up      1
7       4.54                    osd.7   up      1
8       4.54                    osd.8   up      1
9       4.54                    osd.9   up      1
10      4.54                    osd.10  up      1
11      4.54                    osd.11  up      1
12      4.54                    osd.12  up      1
13      4.54                    osd.13  up      1
14      4.54                    osd.14  up      1
15      4.54                    osd.15  up      1
-4      54.48           host f18
16      4.54                    osd.16  up      1
17      4.54                    osd.17  up      1
18      4.54                    osd.18  up      1
19      4.54                    osd.19  up      1
20      4.54                    osd.20  up      1
21      4.54                    osd.21  up      1
22      4.54                    osd.22  up      1
23      4.54                    osd.23  up      1
24      4.54                    osd.24  up      1
25      4.54                    osd.25  up      1
26      4.54                    osd.26  up      1
27      4.54                    osd.27  up      1

The host that was turned off was f18. f16 does have a handful of OSDs, but it is mostly there to provide an odd number of monitors. The cluster is very lightly used; here is the current status:

    cluster e9c32e63-f3eb-4c25-b172-4815ed566ec7
     health HEALTH_OK
     monmap e3: 3 mons at {f16=192.168.19.216:6789/0,f17=192.168.19.217:6789/0,f18=192.168.19.218:6789/0}, election epoch 28, quorum 0,1,2 f16,f17,f18
     osdmap e1674: 28 osds: 28 up, 28 in
      pgmap v12965109: 1152 pgs, 3 pools, 11139 GB data, 2784 kobjects
            22314 GB used, 105 TB / 127 TB avail
                1152 active+clean
  client io 38162 B/s wr, 9 op/s

Where did we go wrong last time? How can we do the same maintenance on f17 (taking it offline for about 15-30 minutes) without repeating our mistake? As it stands, it seems like we have inadvertently created a cluster with three single points of failure rather than none. That has not been our experience with our other clusters, so we're really confused at present.

Thanks for any advice!
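
P.S. For completeness, here is roughly the procedure we followed around the f18 maintenance, and how the replication settings mentioned above can be confirmed. This is only a sketch: "bulk" stands in for the actual pool name (ours is named differently), the rest are standard ceph CLI commands.

# before shutting down f18 we flagged the cluster so it would not
# mark OSDs out and start rebalancing during the short outage
ceph osd set noout

# ... power off the host, do the maintenance, power it back on ...

# once all OSDs on the host were back up and in, we removed the flag
ceph osd unset noout

# the replica count and the minimum number of copies a PG needs
# in order to accept I/O can be read per pool like this:
ceph osd pool get bulk size
ceph osd pool get bulk min_size

# and the overall state checked before and after the window:
ceph status
ceph osd tree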