My $0.02: you really don't need to wait for HEALTH_OK between your recovery steps; just go ahead. Every time a new map is generated and broadcast, the old map and any in-progress recovery are cancelled and recovery is re-planned against the new map. A couple of rough command sketches inline below.

Sent from my iPhone

On 2013-6-2, at 11:30, "Nigel Williams" <nigel.d.williams@xxxxxxxxx> wrote:

> Could I have a critique of this approach please, as to how I could have done it better, or whether what I experienced simply reflects work still to be done.
>
> This is with Ceph 0.61.2 on a quite slow test cluster (logs shared with OSDs, no separate journals, using CephFS).
>
> I knocked the power cord out from a storage node, taking down 4 of the hosted OSDs; all but one came back ok. This is one OSD out of a total of 12, so 1/12 of the storage.
>
> Losing an OSD put the cluster into recovery, so all good. The next task was to get the missing (downed) OSD back online.
>
> The OSD was xfs-based, so I had to throw away the xfs log to get it to mount. Having done this and got it re-mounted, Ceph then started throwing issue #4855 (I added dmesg and logs to that issue if it helps - I am wondering if throwing away the xfs log caused an internal OSD inconsistency, and whether that is what triggers issue #4855). Given that I could not "recover" this OSD as far as Ceph is concerned, I decided to delete and rebuild it.
>
> Several hours later, the cluster was back to HEALTH_OK. I proceeded to remove and re-add the bad OSD, following the doc suggestions to do this.
>
> The problem is that each change caused a slight change to the crush map, putting the cluster back into recovery and adding several hours of waiting per change. I chose to wait until the cluster was back to HEALTH_OK before doing the next step. Overall it has taken a few days to finally get a single OSD back into the cluster.
>
> At one point during recovery the full threshold was triggered on a single OSD, causing the recovery to stop; doing "ceph pg set_full_ratio 0.98" did not help. I was not planning to add data to the cluster while doing recovery operations, and did not understand the suggestion that PGs could be deleted to make space on a "full" OSD, so I expect raising the threshold was the best option, but it had no (immediate) effect.
>
> I am now back to having all 12 OSDs in and the hopefully final recovery under way while it re-balances the OSDs. Although I note I am still getting the full OSD warning, I expect this to disappear soon now that the 12th OSD is back online.
>
> During this recovery the percentage degraded has been a little confusing. While the 12th OSD was offline the percentages were around 15-20% IIRC. But now I see the percentage is 35% and slowly dropping; I am not sure I understand the ratios and why they are so high with a single missing OSD.
>
> A few documentation errors caused confusion too.
>
> This page still contains errors in the steps to create a new OSD (manually):
>
> http://eu.ceph.com/docs/wip-3060/cluster-ops/add-or-rm-osds/#adding-an-osd-manual
>
> "ceph osd create {osd-num}" should be "ceph osd create"
>
> and this:
>
> http://eu.ceph.com/docs/wip-3060/cluster-ops/crush-map/#addosd
>
> I had to put host= to get the command accepted.
>
> Suggestions and questions:
>
> 1. Is there a way to get documentation pages fixed? Or at least health-warnings on them: "This page badly needs updating since it is wrong/misleading".
>
> 2. We need a small set of definitive, succinct recipes that provide steps to recover from common failures, with a narrative around what to expect at each step (your cluster will be in recovery here...).
>
> 3. Some commands are throwing spurious errors that are actually benign: "ceph-osd -i 10 --mkfs --mkkey" complains about failures that are expected because the OSD is initially empty.
>
> 4. An easier way to capture the state of the cluster for analysis. I don't feel confident that when asked for "logs" I am giving the most useful snippets or the complete story. It seems we need a tool that can gather all this in a neat bundle for later dissection or forensics.
>
> 5. Is there a more straightforward (faster) way of getting an OSD back online? It almost seems like it is worth having a standby OSD ready to step in and assume duties (a hot spare?).
>
> 6. Is there a way to make the crush map less sensitive to changes during recovery operations? I would have liked to stall/slow recovery while I replaced the OSD, then let it run at full speed.
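
On 5 and 6: you can keep the cluster from reacting to a planned OSD replacement, and throttle recovery while it runs. A rough sketch of what I would try - the option names are from memory and the values are only examples, so please double-check them against the docs for your 0.61.x release:

  # Before taking the OSD down: don't let down OSDs be marked "out",
  # so CRUSH does not start re-replicating their data while you work.
  ceph osd set noout

  # Optionally slow recovery/backfill so client I/O is not starved.
  ceph tell osd.\* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'

  # ... rebuild the OSD and start its daemon again ...

  # Let the cluster go back to normal behaviour.
  ceph osd unset noout

With noout set a down OSD stays "in", so its data is not shuffled onto the other OSDs while you replace it; you only pay for one round of backfill when the rebuilt OSD comes back.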
>
> Excuses:
>
> I'd be happy to action suggestions, but my current level of Ceph understanding is still so limited that effort on my part would be unproductive; I am prodding the community to see if there is consensus on the need.
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
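
On 2 and 5: for what it's worth, the remove/re-add sequence condensed from the add-or-rm-osds page you linked looks roughly like the below. Treat it as a sketch rather than a verified recipe: the osd id (12), weight and hostname are placeholders, and the exact "ceph osd crush set" arguments differ between releases (as you found, some versions want host= / root= spelled out).

  # Remove the dead OSD.
  ceph osd out 12
  service ceph stop osd.12            # or /etc/init.d/ceph stop osd.12
  ceph osd crush remove osd.12
  ceph auth del osd.12
  ceph osd rm 12

  # Recreate it; with no argument, "ceph osd create" prints the new id.
  ceph osd create

  # After creating and mounting a fresh filesystem at /var/lib/ceph/osd/ceph-12:
  ceph-osd -i 12 --mkfs --mkkey
  ceph auth add osd.12 osd 'allow *' mon 'allow rwx' -i /var/lib/ceph/osd/ceph-12/keyring
  ceph osd crush set osd.12 1.0 root=default host=<storage-node>
  service ceph start osd.12

Each step that touches the crush map (out, crush remove, crush set) can trigger data movement on its own, which is the repeated multi-hour recovery you saw; running the whole sequence back-to-back, without waiting for HEALTH_OK in between, lets the cluster settle just once at the end.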