Rebuilding Cluster from complete MON failure with existing OSDs

Hi, I have a situation where I moved the interfaces that carry my ceph public network (only the interfaces, not the IPs, etc.) in order to increase available bandwidth, but it backfired catastrophically. My monitors all failed and somehow became corrupted, and I was unable to repair them, so I rebuilt the monitors in the hope that I could add the existing OSDs back in and recover the cluster.
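
In case it matters, the monitor rebuild went roughly like this on each host (reconstructed from my shell history, so the keyring path may not be exact), reusing the cluster fsid visible in the ceph -s output below:

dgeist# monmaptool --create --clobber --fsid ac486394-802a-49d3-a92c-a103268ea189 \
            --add hypd01 10.100.100.11:6789 --add hypd02 10.100.100.12:6789 \
            --add hypd03 10.100.100.13:6789 /tmp/monmap
dgeist# ceph-mon --mkfs -i hypd01 --monmap /tmp/monmap --keyring /tmp/ceph.mon.keyring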

There are three hosts, each with a monitor and 6 OSDs. Each OSD is a spinning-disk partition, with its journal on an SSD partition on the same host. From what I can tell, all the data on the OSD disks is intact, but even after (what I think was) adding all the OSDs back into the crushmap, etc., the cluster still doesn't seem to "see" them, and I'm at a loss for how to troubleshoot it further.
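
The per-OSD re-add was approximately the following, shown for osd.0 on hypd01 and repeated for all 18 (again from memory, so the exact weight and keyring path are approximate):

dgeist# ceph osd create
dgeist# ceph auth add osd.0 osd 'allow *' mon 'allow rwx' -i /var/lib/ceph/osd/ceph-0/keyring
dgeist# ceph osd crush add osd.0 1.0 host=hypd01
dgeist# start ceph-osd id=0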

Hosts are all Ubuntu Trusty running the 0.80.7 ceph packages.

dgeist# ceph -s
    cluster ac486394-802a-49d3-a92c-a103268ea189
     health HEALTH_WARN 4288 pgs stuck inactive; 4288 pgs stuck unclean; 18/18 in osds are down
     monmap e1: 3 mons at {hypd01=10.100.100.11:6789/0,hypd02=10.100.100.12:6789/0,hypd03=10.100.100.13:6789/0}, election epoch 40, quorum 0,1,2 hypd01,hypd02,hypd03
     osdmap e65: 18 osds: 0 up, 18 in
      pgmap v66: 4288 pgs, 4 pools, 0 bytes data, 0 objects
            0 kB used, 0 kB / 0 kB avail
                4288 creating

dgeist# ceph osd tree
# id	weight	type name	up/down	reweight
-1	18	root default
-2	6		host hypd01
0	1			osd.0	down	1	
1	1			osd.1	down	1	
2	1			osd.2	down	1	
3	1			osd.3	down	1	
4	1			osd.4	down	1	
5	1			osd.5	down	1	
-3	6		host hypd02
6	1			osd.6	down	1	
7	1			osd.7	down	1	
8	1			osd.8	down	1	
9	1			osd.9	down	1	
10	1			osd.10	down	1	
11	1			osd.11	down	1	
-4	6		host hypd03
12	1			osd.12	down	1	
13	1			osd.13	down	1	
14	1			osd.14	down	1	
15	1			osd.15	down	1	
16	1			osd.16	down	1	
17	1			osd.17	down	1
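
Is there a way to confirm the OSD daemons are even attempting to contact the new monitors? So far I've been comparing the on-disk cluster fsid against the monitors' and watching the OSD logs, e.g. for osd.0 (paths assume the default Ubuntu package layout):

dgeist# ceph fsid
dgeist# cat /var/lib/ceph/osd/ceph-0/ceph_fsid
dgeist# tail -f /var/log/ceph/ceph-osd.0.log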


Thanks in advance for any thoughts on how to recover this.

Dan

Dan Geist dan(@)polter.net
(33.942973, -84.312472)
http://www.polter.net


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


