Hi all,
I had 12 OSDs in my cluster across 2 OSD nodes. One of the OSDs was in the down state, so I removed that OSD from the cluster by removing the CRUSH rule for it.
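For reference, the removal was done roughly along these lines (I am reconstructing this from memory, so the exact commands and order may have differed slightly):

    sudo ceph osd crush remove osd.7
    sudo ceph auth del osd.7
    sudo ceph osd rm 7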
The cluster, now with 11 OSDs, started rebalancing. After some time, the cluster status was:
ems@rack6-client-5:~$ sudo ceph -s
     cluster eb5452f4-5ce9-4b97-9bfd-2a34716855f1
      health HEALTH_WARN 1 pgs down; 252 pgs incomplete; 10 pgs peering; 73 pgs stale; 262 pgs stuck inactive; 73 pgs stuck stale; 262 pgs stuck unclean; clock skew detected on mon.rack6-client-5, mon.rack6-client-6
      monmap e1: 3 mons at {rack6-client-4=10.242.43.105:6789/0,rack6-client-5=10.242.43.106:6789/0,rack6-client-6=10.242.43.107:6789/0}, election epoch 12, quorum 0,1,2 rack6-client-4,rack6-client-5,rack6-client-6
      osdmap e2648: 11 osds: 11 up, 11 in
       pgmap v554251: 846 pgs, 3 pools, 4383 GB data, 1095 kobjects
             11668 GB used, 26048 GB / 37717 GB avail
                   63 stale+active+clean
                    1 down+incomplete
                  521 active+clean
                  251 incomplete
                   10 stale+peering
ems@rack6-client-5:~$
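In case it is useful, the stuck PGs can also be enumerated directly with, for example:

    sudo ceph health detail
    sudo ceph pg dump_stuck inactive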
To fix this, I cannot run "ceph osd lost <osd.id>" to clear the PG that is stuck in the down state, because that OSD has already been removed from the cluster.
ems@rack6-client-4:~$ sudo ceph pg dump all | grep down
dumped all in format plain
1.38 1548 0 0 0 0 6492782592 3001 3001 down+incomplete 2014-12-18 15:58:29.681708 1118'508438 2648:1073892 [6,3,1] 6 [6,3,1] 6 76'437184 2014-12-16 12:38:35.322835 76'437184 2014-12-16 12:38:35.322835
ems@rack6-client-4:~$
ems@rack6-client-4:~$ sudo ceph pg 1.38 query
.............
"recovery_state": [
{ "name": "Started\/Primary\/Peering",
"enter_time": "2014-12-18 15:58:29.681666",
"past_intervals": [
{ "first": 1109,
"last": 1118,
"maybe_went_rw": 1,
...................
...................
"down_osds_we_would_probe": [
7],
"peering_blocked_by": []},
...................
...................
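(The full query output is long; for anyone reproducing this, the peering-related fields can be pulled out with something like the following, where the grep pattern is just an example:

    sudo ceph pg 1.38 query | grep -A1 -E 'down_osds_we_would_probe|peering_blocked_by'
)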
ems@rack6-client-4:~$ sudo ceph osd tree
# id    weight  type name       up/down reweight
-1      36.85   root default
-2      20.1            host rack2-storage-1
0       3.35                    osd.0   up      1
1       3.35                    osd.1   up      1
2       3.35                    osd.2   up      1
3       3.35                    osd.3   up      1
4       3.35                    osd.4   up      1
5       3.35                    osd.5   up      1
-3      16.75           host rack2-storage-5
6       3.35                    osd.6   up      1
8       3.35                    osd.8   up      1
9       3.35                    osd.9   up      1
10      3.35                    osd.10  up      1
11      3.35                    osd.11  up      1
ems@rack6-client-4:~$ sudo ceph osd lost 7 --yes-i-really-mean-it
osd.7 is not down or doesn't exist
ems@rack6-client-4:~$
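(Whether osd.7 still exists anywhere in the osdmap can be double-checked with something like:

    sudo ceph osd dump | grep 'osd\.7'
)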
Can somebody suggest any other recovery steps to get out of this?
-Thanks & Regards,
Mallikarjun Biradar