My $0.02: you really don't need to wait for HEALTH_OK between your recovery steps; just go ahead. Every time a new map is generated and broadcast, the old map and any in-progress recovery are cancelled and recovery is re-planned against the new map. A couple of rough command sketches inline below.

Sent from my iPhone

On 2013-6-2, at 11:30, "Nigel Williams" <nigel.d.williams@xxxxxxxxx> wrote:

> Could I have a critique of this approach please, as to how I could have done it better, or whether what I experienced simply reflects work still to be done.
>
> This is with Ceph 0.61.2 on a quite slow test cluster (logs shared with OSDs, no separate journals, using CephFS).
>
> I knocked the power cord out from a storage node, taking down 4 of the hosted OSDs; all but one came back ok. This is one OSD out of a total of 12, so 1/12 of the storage.
>
> Losing an OSD put the cluster into recovery, so all good. The next task was to get the missing (downed) OSD back online.
>
> The OSD was xfs-based, so I had to throw away the xfs log to get it to mount. Having done this and got it re-mounted, Ceph then started throwing issue #4855 (I added dmesg and logs to that issue if it helps - I am wondering if throwing away the xfs log caused an internal OSD inconsistency, and whether that is what triggers issue #4855). Given that I could not "recover" this OSD as far as Ceph is concerned, I decided to delete and rebuild it.
>
> Several hours later, the cluster was back to HEALTH_OK. I proceeded to remove and re-add the bad OSD, following the doc suggestions to do this.
>
> The problem is that each change caused a slight change to the crush map, putting the cluster back into recovery and adding several hours of waiting per change. I chose to wait until the cluster was back to HEALTH_OK before doing the next step. Overall it has taken a few days to finally get a single OSD back into the cluster.
>
> At one point during recovery the full threshold was triggered on a single OSD, causing the recovery to stop; doing "ceph pg set_full_ratio 0.98" did not help. I was not planning to add data to the cluster while doing recovery operations, and did not understand the suggestion that PGs could be deleted to make space on a "full" OSD, so I expect raising the threshold was the best option, but it had no (immediate) effect.
>
> I am now back to having all 12 OSDs in and the hopefully final recovery under way while it re-balances the OSDs. Although I note I am still getting the full OSD warning, I expect this to disappear soon now that the 12th OSD is back online.
>
> During this recovery the percentage degraded has been a little confusing. While the 12th OSD was offline the percentages were around 15-20% IIRC. But now I see the percentage is 35% and slowly dropping; I am not sure I understand the ratios and why they are so high with a single missing OSD.
>
> A few documentation errors caused confusion too.
>
> This page still contains errors in the steps to create a new OSD (manually):
>
> http://eu.ceph.com/docs/wip-3060/cluster-ops/add-or-rm-osds/#adding-an-osd-manual
>
> "ceph osd create {osd-num}" should be "ceph osd create"
>
> and this:
>
> http://eu.ceph.com/docs/wip-3060/cluster-ops/crush-map/#addosd
>
> I had to put host= to get the command accepted.
>
> Suggestions and questions:
>
> 1. Is there a way to get documentation pages fixed? Or at least health-warnings on them: "This page badly needs updating since it is wrong/misleading".
>
> 2. We need a small set of definitive, succinct recipes that provide steps to recover from common failures, with a narrative around what to expect at each step (your cluster will be in recovery here...).
>
> 3. Some commands are throwing spurious errors that are actually benign: "ceph-osd -i 10 --mkfs --mkkey" complains about failures that are expected because the OSD is initially empty.
>
> 4. An easier way to capture the state of the cluster for analysis. I don't feel confident that when asked for "logs" I am giving the most useful snippets or the complete story. It seems we need a tool that can gather all this in a neat bundle for later dissection or forensics.
>
> 5. Is there a more straightforward (faster) way of getting an OSD back online? It almost seems like it is worth having a standby OSD ready to step in and assume duties (a hot spare?).
>
> 6. Is there a way to make the crush map less sensitive to changes during recovery operations? I would have liked to stall/slow recovery while I replaced the OSD, then let it run at full speed.
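
On 5 and 6: you can keep the cluster from reacting to a planned OSD replacement, and throttle recovery while it runs. A rough sketch of what I would try - the option names are from memory and the values are only examples, so please double-check them against the docs for your 0.61.x release:

  # Before taking the OSD down: don't let down OSDs be marked "out",
  # so CRUSH does not start re-replicating their data while you work.
  ceph osd set noout

  # Optionally slow recovery/backfill so client I/O is not starved.
  ceph tell osd.\* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'

  # ... rebuild the OSD and start its daemon again ...

  # Let the cluster go back to normal behaviour.
  ceph osd unset noout

With noout set a down OSD stays "in", so its data is not shuffled onto the other OSDs while you replace it; you only pay for one round of backfill when the rebuilt OSD comes back.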
>
> Excuses:
>
> I'd be happy to action suggestions, but my current level of Ceph understanding is still so limited that effort on my part would be unproductive; I am prodding the community to see if there is consensus on the need.
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
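
On 2 and 5: for what it's worth, the remove/re-add sequence condensed from the add-or-rm-osds page you linked looks roughly like the below. Treat it as a sketch rather than a verified recipe: the osd id (12), weight and hostname are placeholders, and the exact "ceph osd crush set" arguments differ between releases (as you found, some versions want host= / root= spelled out).

  # Remove the dead OSD.
  ceph osd out 12
  service ceph stop osd.12            # or /etc/init.d/ceph stop osd.12
  ceph osd crush remove osd.12
  ceph auth del osd.12
  ceph osd rm 12

  # Recreate it; with no argument, "ceph osd create" prints the new id.
  ceph osd create

  # After creating and mounting a fresh filesystem at /var/lib/ceph/osd/ceph-12:
  ceph-osd -i 12 --mkfs --mkkey
  ceph auth add osd.12 osd 'allow *' mon 'allow rwx' -i /var/lib/ceph/osd/ceph-12/keyring
  ceph osd crush set osd.12 1.0 root=default host=<storage-node>
  service ceph start osd.12

Each step that touches the crush map (out, crush remove, crush set) can trigger data movement on its own, which is the repeated multi-hour recovery you saw; running the whole sequence back-to-back, without waiting for HEALTH_OK in between, lets the cluster settle just once at the end.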