On Thu, May 19, 2016 at 3:34 PM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
> On Thu, 19 May 2016, Gaurav Bafna wrote:
>> Hi Sage & hzwulibin
>>
>> Many thanks for your reply :)
>>
>> Ceph osd tree : http://pastebin.com/3cC8brcF
>>
>> Crushmap : http://pastebin.com/K2BNSHys
>
> Can you paste the output from
>
>     ceph osd crush show-tunables

{
    "choose_local_tries": 0,
    "choose_local_fallback_tries": 0,
    "choose_total_tries": 50,
    "chooseleaf_descend_once": 1,
    "chooseleaf_vary_r": 0,
    "straw_calc_version": 1,
    "allowed_bucket_algs": 22,
    "profile": "unknown",
    "optimal_tunables": 0,
    "legacy_tunables": 0,
    "require_feature_tunables": 1,
    "require_feature_tunables2": 1,
    "require_feature_tunables3": 0,
    "has_v2_rules": 0,
    "has_v3_rules": 0,
    "has_v4_buckets": 0
}

>
>> Since I removed the 12 osds from the crush map, there are no more stuck PGs.
>
> Hmm...
>
>> Version : Hammer 0.94.5. Do you think it can just be a reporting
>> issue ? The osds that went down had around 300 PGs mapped to them, of
>> which they were primary for 0. We have two IDCs. All the primaries are
>> in one IDC only for now.
>>
>> Sage, I did not get your first point. What is not matching ? In our
>> cluster there were 2758 running osds before the 12-osd crash. After
>> that event, there were 2746 osds left.
>>
>> PG stat for your reference : v3733136: 75680 pgs: 75680
>> active+clean; 409 GB data, 16745 GB used, 14299 TB / 14315 TB avail;
>> 443 B/s wr, 0 op/s
>>
>> Also, I have seen this issue 3 times with our cluster. Even when 1 osd
>> goes down in IDC1 or IDC2, some PGs always remain undersized. I am sure
>> this issue will come up again, so can you tell me what I should look at
>> when an OSD goes down next time ?
>>
>> When I manually stop an osd daemon in IDC1 or IDC2 by running
>> /etc/init.d/ceph stop osd.x , the cluster recovers nicely. That is
>> what makes the issue so complex.
>
> Here you say restarting an OSD was enough to clear the problem, but
> above you say you also removed the down OSDs from the CRUSH map. Can you
> be clear about when the PGs stopped being undersized?

I didn't restart the OSD, as the disk had gone bad. When I removed it from
the crush map, the PGs stopped being undersized.

>
> I can't tell from this information whether it is a reporting issue or
> whether CRUSH is failing to do a proper mapping on this cluster.
>
> If you can reproduce this, there are a few things to do:
>
> 1) Grab a copy of the OSD map:
>
>     ceph osd getmap -o osdmap
>
> 2) Get a list of the undersized pgs:
>
>     ceph pg ls undersized > pgs.txt
>
> 3) Query one of the undersized pgs:
>
>     tail pgs.txt
>     ceph tell <one of the pgids> query > query.txt
>
> 4) Share the result with us:
>
>     ceph-post-file osdmap pgs.txt query.txt
>
> (or attach it to an email).

Sure, I will do that the next time it occurs.
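In fact, I will probably wrap those steps in a small script so the data gets
captured as soon as the warning appears again. Something along these lines, I
think (untested; it assumes the node has the client.admin keyring, and that the
pgid is the first column of the ceph pg ls output):

    #!/bin/sh
    # rough collection sketch for the next undersized-PG event (untested)
    ceph osd getmap -o osdmap                    # 1) grab the current osdmap
    ceph pg ls undersized > pgs.txt              # 2) list the undersized pgs
    pgid=$(tail -1 pgs.txt | awk '{print $1}')   #    pick one pgid (first column assumed)
    ceph tell "$pgid" query > query.txt          # 3) query that pg
    ceph-post-file osdmap pgs.txt query.txt      # 4) upload the three files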
Very grateful for your help,
Gaurav

>
> Thanks!
> sage
>
>
>
>>
>> It is a big cluster and we have just started. We will need the
>> community's help to make it run smoothly and expand further :)
>>
>> Thanks
>> Gaurav
>>
>> On Thu, May 19, 2016 at 1:17 PM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
>> > On Thu, 19 May 2016, Gaurav Bafna wrote:
>> >> Hi Cephers,
>> >>
>> >> In our production cluster at Reliance Jio, when an osd goes corrupt
>> >> and crashes, the cluster remains unhealthy even after 4 hours.
>> >>
>> >>     cluster fac04d85-db48-4564-b821-deebda046261
>> >>      health HEALTH_WARN
>> >>             658 pgs degraded
>> >>             658 pgs stuck degraded
>> >>             688 pgs stuck unclean
>> >>             658 pgs stuck undersized
>> >>             658 pgs undersized
>> > ^^^ this...
>> >
>> >>             recovery 3064/1981308 objects degraded (0.155%)
>> >>             recovery 124/1981308 objects misplaced (0.006%)
>> >>      monmap e11: 11 mons at
>> >> {dssmon2=10.140.208.224:6789/0,dssmon3=10.140.208.225:6789/0,dssmon31=10.135.38.141:6789/0,dssmon32=10.135.38.142:6789/0,dssmon33=10.135.38.143:6789/0,dssmon34=10.135.38.144:6789/0,dssmon35=10.135.38.145:6789/0,dssmon4=10.140.208.226:6789/0,dssmon5=10.140.208.227:6789/0,dssmon6=10.140.208.228:6789/0,dssmonleader1=10.140.208.223:6789/0}
>> >>             election epoch 792, quorum 0,1,2,3,4,5,6,7,8,9,10
>> >> dssmon31,dssmon32,dssmon33,dssmon34,dssmon35,dssmonleader1,dssmon2,dssmon3,dssmon4,dssmon5,dssmon6
>> >>      osdmap e8778: 2774 osds: 2746 up, 2746 in; 30 remapped pgs
>> > doesn't match this ^^
>> >
>> > which makes it look like a problem with OSDs reporting PG state to the
>> > mon. The fact that restarting an OSD clears it supports that theory.
>> >
>> > What version is this? A bunch of the osd -> mon pg reporting code was
>> > recently rewritten (between infernalis and jewel), so the new code is
>> > hopefully more robust. (OTOH, it is also new, so we may have missed
>> > something.)
>> >
>> > Nice big cluster!
>> >
>> > sage
>>
>>
>>
>> --
>> Gaurav Bafna
>> 9540631400

--
Gaurav Bafna
9540631400
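P.S. Once I have the osdmap from step 1, I am also thinking of replaying the
mapping offline to see whether this is purely a reporting problem or whether
CRUSH really cannot produce a full-sized set for those PGs. Something along
these lines, I believe (untested; 1.2f3 is just a made-up pgid standing in for
one of the entries in pgs.txt):

    # map one undersized pg against the captured osdmap, offline;
    # if the computed up/acting set has the full replica count, CRUSH can
    # place the pg, which would point back at a reporting problem
    osdmaptool osdmap --test-map-pg 1.2f3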