Hi Sage & hzwulibin,

Many thanks for your reply :)

Ceph osd tree : http://pastebin.com/3cC8brcF
Crushmap : http://pastebin.com/K2BNSHys

Since I removed the 12 osds from the crush map, there are no more stuck PGs.

Version : Hammer 0.94.5. Do you think it can just be a reporting issue?

The osds that went down had around 300 PGs mapped to them, out of which
they were primary for 0. We have two IDCs; all the primaries are in one
IDC only for now.

Sage, I did not get your first point. What is not matching? In our
cluster there were 2758 running osds before the 12-osd crash. After that
event, 2746 osds were left.

PG stat for your reference :
v3733136: 75680 pgs: 75680 active+clean; 409 GB data, 16745 GB used,
14299 TB / 14315 TB avail; 443 B/s wr, 0 op/s

Also, I have seen this issue 3 times with our cluster. Even when 1 osd
goes down in IDC1 or IDC2, some PGs always remain undersized. I am sure
this issue will come up once again, so can you tell me what I should
look at the next time an OSD goes down?

When I manually stop an osd daemon in IDC1 or IDC2 by running
/etc/init.d/ceph stop osd.x, the cluster recovers nicely. That is what
makes the issue so complex.

It is a big cluster and we have just started. We will need the
community's help to make it run smoothly and expand further :)

Thanks
Gaurav

On Thu, May 19, 2016 at 1:17 PM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
> On Thu, 19 May 2016, Gaurav Bafna wrote:
>> Hi Cephers,
>>
>> In our production cluster at Reliance Jio, when an osd goes corrupt
>> and crashes, the cluster remains unhealthy even after 4 hours.
>>
>>     cluster fac04d85-db48-4564-b821-deebda046261
>>      health HEALTH_WARN
>>             658 pgs degraded
>>             658 pgs stuck degraded
>>             688 pgs stuck unclean
>>             658 pgs stuck undersized
>>             658 pgs undersized
>
> ^^^ this...
>
>>             recovery 3064/1981308 objects degraded (0.155%)
>>             recovery 124/1981308 objects misplaced (0.006%)
>>      monmap e11: 11 mons at
>> {dssmon2=10.140.208.224:6789/0,dssmon3=10.140.208.225:6789/0,dssmon31=10.135.38.141:6789/0,dssmon32=10.135.38.142:6789/0,dssmon33=10.135.38.143:6789/0,dssmon34=10.135.38.144:6789/0,dssmon35=10.135.38.145:6789/0,dssmon4=10.140.208.226:6789/0,dssmon5=10.140.208.227:6789/0,dssmon6=10.140.208.228:6789/0,dssmonleader1=10.140.208.223:6789/0}
>>             election epoch 792, quorum 0,1,2,3,4,5,6,7,8,9,10
>> dssmon31,dssmon32,dssmon33,dssmon34,dssmon35,dssmonleader1,dssmon2,dssmon3,dssmon4,dssmon5,dssmon6
>>      osdmap e8778: 2774 osds: 2746 up, 2746 in; 30 remapped pgs
>
> doesn't match this ^^
>
> which makes it look like a problem with OSDs reporting PG state to the
> mon. The fact that an OSD restarts supports that theory.
>
> What version is this? A bunch of the osd -> mon pg reporting code was
> recently rewritten (between infernalis and jewel), so the new code is
> hopefully more robust. (OTOH, it is also new, so we may have missed
> something.)
>
> Nice big cluster!
>
> sage

--
Gaurav Bafna
9540631400
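
[For the "what should I look at next time an OSD goes down" question above, a minimal
diagnostic sketch using standard Ceph CLI commands; the PG id 11.2f4 and the
/var/log/ceph log path are placeholders/assumptions, not values from this cluster:]

    # Which PGs are stuck, and in which state
    ceph health detail
    ceph pg dump_stuck unclean

    # For one stuck PG: its up/acting sets and why peering has not completed
    ceph pg map 11.2f4
    ceph pg 11.2f4 query

    # Confirm which OSDs the mon considers down, then check the PG's primary OSD log
    ceph osd tree | grep -w down
    less /var/log/ceph/ceph-osd.<id>.log

Comparing the "up" and "acting" sets from "ceph pg query" against what the mon reports
should show whether the PGs are genuinely undersized or only being reported as such.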