I have a cluster running 0.80.9 on Ubuntu 14.04. A couple of nights ago I lost two disks from a pool with size=2. :(
I replaced the two failed OSDs, and I now have two PGs marked incomplete in an otherwise healthy cluster. Following this page (https://ceph.com/community/incomplete-pgs-oh-my/) I set up another node running Giant 0.87.1, mounted one of the failed OSD drives, and successfully exported the two PGs. I then created another OSD on the new node, weighted it to zero, and imported the two PGs.
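For reference, the export and import were roughly along these lines (the data/journal paths below are just placeholders for my actual mount points; on Giant the tool is named ceph_objectstore_tool):

  # on the new node, with the failed OSD's drive mounted: export one incomplete PG
  ceph_objectstore_tool --op export --pgid 3.102 \
      --data-path /mnt/old-osd --journal-path /mnt/old-osd/journal \
      --file /tmp/3.102.export

  # on the new OSD (osd.30, stopped): import it
  ceph_objectstore_tool --op import \
      --data-path /var/lib/ceph/osd/ceph-30 --journal-path /var/lib/ceph/osd/ceph-30/journal \
      --file /tmp/3.102.export

and the same again for 3.c7.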
I'm still stuck, though. It seems as though the new OSD just doesn't want to share with the other OSDs. Is there any way for me to ask an OSD which PGs it has (rather than asking the MON which OSDs a PG maps to) so I can verify that my import was good? Help!
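For example, with osd.30 stopped, is something like this the right way to list the PGs it actually holds, or is there a live command I'm missing? (Paths here are from memory, not exact.)

  ceph_objectstore_tool --op list-pgs \
      --data-path /var/lib/ceph/osd/ceph-30 --journal-path /var/lib/ceph/osd/ceph-30/journal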
osd.0 and osd.15 were the OSDs I lost; osd.30 is the new one. The pool currently has size = 2, min_size = 1.
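(For anyone double-checking, those settings can be read back with the following, where <poolname> stands in for the actual pool name:

  ceph osd pool get <poolname> size
  ceph osd pool get <poolname> min_size
)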
root@storage1:~# ceph pg dump | grep incomplete | column -t
dumped all in format plain
3.102 0 0 0 0 0 0 0 incomplete 2015-04-02 20:49:32.529594 0'0 15730:21 [0,15] 0 [0,15] 0 13985'53107 2015-03-29 21:17:15.568125 13985'49195 2015-03-24 18:38:08.244769
3.c7 0 0 0 0 0 0 0 incomplete 2015-04-02 20:49:32.968841 0'0 15730:17 [15,0] 15 [15,0] 15 13985'54076 2015-03-31 19:14:22.721695 13985'54076 2015-03-31 19:14:22.721695
root@storage1:~# ceph health detail
HEALTH_WARN 2 pgs incomplete; 2 pgs stuck inactive; 2 pgs stuck unclean; 1 requests are blocked > 32 sec; 1 osds have slow requests
pg 3.c7 is stuck inactive since forever, current state incomplete, last acting [15,0]
pg 3.102 is stuck inactive since forever, current state incomplete, last acting [0,15]
pg 3.c7 is stuck unclean since forever, current state incomplete, last acting [15,0]
pg 3.102 is stuck unclean since forever, current state incomplete, last acting [0,15]
pg 3.102 is incomplete, acting [0,15]
pg 3.c7 is incomplete, acting [15,0]
1 ops are blocked > 8388.61 sec
1 ops are blocked > 8388.61 sec on osd.15
1 osds have slow requests
root@storage1:~# ceph osd tree
# id weight type name up/down reweight
-1 81.65 root default
-2 81.65 host storage1
-3 13.63 journal storage1-journal1
1 2.72 osd.1 up 1
4 2.72 osd.4 up 1
2 2.73 osd.2 up 1
3 2.73 osd.3 up 1
0 2.73 osd.0 up 1
-4 13.61 journal storage1-journal2
5 2.72 osd.5 up 1
6 2.72 osd.6 up 1
8 2.72 osd.8 up 1
9 2.72 osd.9 up 1
7 2.73 osd.7 up 1
-5 13.6 journal storage1-journal3
11 2.72 osd.11 up 1
12 2.72 osd.12 up 1
13 2.72 osd.13 up 1
14 2.72 osd.14 up 1
10 2.72 osd.10 up 1
-6 13.61 journal storage1-journal4
16 2.72 osd.16 up 1
17 2.72 osd.17 up 1
18 2.72 osd.18 up 1
19 2.72 osd.19 up 1
15 2.73 osd.15 up 1
-7 13.6 journal storage1-journal5
20 2.72 osd.20 up 1
21 2.72 osd.21 up 1
22 2.72 osd.22 up 1
23 2.72 osd.23 up 1
24 2.72 osd.24 up 1
-8 13.6 journal storage1-journal6
25 2.72 osd.25 up 1
26 2.72 osd.26 up 1
27 2.72 osd.27 up 1
28 2.72 osd.28 up 1
29 2.72 osd.29 up 1
-9 0 host ithome
30 0 osd.30 up 1