I'm still pretty new at troubleshooting Ceph, and since no one has responded yet I'll take a stab at it.
What is the size of your pool?
'ceph osd pool get <pool name> size'
It seems, based on the number of incomplete PGs, that it was set to '1'. I understand that if you were able to bring osd.7 back in, it would clear up. I'm just not seeing a secondary OSD for that PG.
Disclaimer: I could be totally wrong.
Robert LeBlanc
On Thu, Dec 18, 2014 at 11:41 PM, Mallikarjun Biradar <mallikarjuna.biradar@xxxxxxxxx> wrote:
Hi all,

I had 12 OSDs in my cluster across 2 OSD nodes. One of the OSDs was in the down state, and I removed that OSD from the cluster by removing its crush rule. The cluster, now with 11 OSDs, started rebalancing. After some time, the cluster status was:

ems@rack6-client-5:~$ sudo ceph -s
    cluster eb5452f4-5ce9-4b97-9bfd-2a34716855f1
     health HEALTH_WARN 1 pgs down; 252 pgs incomplete; 10 pgs peering; 73 pgs stale; 262 pgs stuck inactive; 73 pgs stuck stale; 262 pgs stuck unclean; clock skew detected on mon.rack6-client-5, mon.rack6-client-6
     monmap e1: 3 mons at {rack6-client-4=10.242.43.105:6789/0,rack6-client-5=10.242.43.106:6789/0,rack6-client-6=10.242.43.107:6789/0}, election epoch 12, quorum 0,1,2 rack6-client-4,rack6-client-5,rack6-client-6
     osdmap e2648: 11 osds: 11 up, 11 in
      pgmap v554251: 846 pgs, 3 pools, 4383 GB data, 1095 kobjects
            11668 GB used, 26048 GB / 37717 GB avail
                  63 stale+active+clean
                   1 down+incomplete
                 521 active+clean
                 251 incomplete
                  10 stale+peering
ems@rack6-client-5:~$

To fix this, I can't run "ceph osd lost <osd.id>" to remove the PG which is in the down state, as the OSD is already removed from the cluster.

ems@rack6-client-4:~$ sudo ceph pg dump all | grep down
dumped all in format plain
1.38 1548 0 0 0 0 6492782592 3001 3001 down+incomplete 2014-12-18 15:58:29.681708 1118'508438 2648:1073892 [6,3,1] 6 [6,3,1] 6 76'437184 2014-12-16 12:38:35.322835 76'437184 2014-12-16 12:38:35.322835
ems@rack6-client-4:~$

ems@rack6-client-4:~$ sudo ceph pg 1.38 query
.............
"recovery_state": [
    { "name": "Started\/Primary\/Peering",
      "enter_time": "2014-12-18 15:58:29.681666",
      "past_intervals": [
          { "first": 1109,
            "last": 1118,
            "maybe_went_rw": 1,
......................................
      "down_osds_we_would_probe": [7],
      "peering_blocked_by": []},
......................................

ems@rack6-client-4:~$ sudo ceph osd tree
# id    weight  type name       up/down reweight
-1      36.85   root default
-2      20.1    host rack2-storage-1
0       3.35            osd.0   up      1
1       3.35            osd.1   up      1
2       3.35            osd.2   up      1
3       3.35            osd.3   up      1
4       3.35            osd.4   up      1
5       3.35            osd.5   up      1
-3      16.75   host rack2-storage-5
6       3.35            osd.6   up      1
8       3.35            osd.8   up      1
9       3.35            osd.9   up      1
10      3.35            osd.10  up      1
11      3.35            osd.11  up      1

ems@rack6-client-4:~$ sudo ceph osd lost 7 --yes-i-really-mean-it
osd.7 is not down or doesn't exist
ems@rack6-client-4:~$

Can somebody suggest any other recovery steps to come out of this?

Thanks & Regards,
Mallikarjun Biradar
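(Aside, for anyone trying to script this kind of triage:) the plain-format `ceph pg dump` row quoted above can be parsed to pull out each PG's state and its up/acting OSD sets — which is how you'd spot that pg 1.38 only probes osd.7. A minimal sketch; the field positions are an assumption based on the exact output shown in this thread and may differ between Ceph releases:

```python
# Sketch: parse one row of plain-format `ceph pg dump` output to extract
# the pgid, state, and up/acting OSD sets. Column indices below are
# assumptions taken from the dump line quoted in this thread.
SAMPLE = ("1.38 1548 0 0 0 0 6492782592 3001 3001 down+incomplete "
          "2014-12-18 15:58:29.681708 1118'508438 2648:1073892 "
          "[6,3,1] 6 [6,3,1] 6 76'437184 2014-12-16 12:38:35.322835 "
          "76'437184 2014-12-16 12:38:35.322835")

def parse_pg_line(line):
    """Return pgid, state, and up/acting OSD lists from one dump row."""
    f = line.split()
    osds = lambda s: [int(x) for x in s.strip('[]').split(',')]
    return {
        'pgid': f[0],        # e.g. '1.38'
        'state': f[9],       # e.g. 'down+incomplete'
        'up': osds(f[14]),   # up set, e.g. [6, 3, 1]
        'acting': osds(f[16]),  # acting set
    }

info = parse_pg_line(SAMPLE)
print(info['pgid'], info['state'], info['acting'])
```

Feeding `ceph pg dump all` through something like this makes it quick to list every down/incomplete PG alongside the OSDs it expects, before reaching for `ceph osd lost`.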
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com