replacing an OSD or crush map sensitivity


 



Could I have a critique of this approach, please: how I could have done it better, or whether what I experienced simply reflects work still to be done?

This is with Ceph 0.61.2 on a quite slow test cluster (logs shared with OSDs, no separate journals, using CephFS).

I knocked the power cord out of a storage node, taking down the 4 OSDs hosted there; all but one came back OK. That is one OSD out of a total of 12, so 1/12 of the storage.

Losing an OSD put the cluster into recovery, so all good. The next question was how to get the missing (downed) OSD back online.

The OSD was xfs-based, so I had to throw away the xfs log to get it to mount. Having done this and re-mounted it, Ceph then started throwing issue #4855 (I added dmesg output and logs to that issue in case it helps; I wonder if throwing away the xfs log caused an internal OSD inconsistency, and whether that in turn triggers issue #4855). Given that I could not "recover" this OSD as far as Ceph is concerned, I decided to delete and rebuild it.
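For anyone following along, what I did was roughly the following (device and mount point are illustrative, and the exact invocation may have differed slightly):

    umount /var/lib/ceph/osd/ceph-3      # in case it was half-mounted
    xfs_repair -L /dev/sdb1              # -L zeroes the xfs log, discarding in-flight metadata updates
    mount /dev/sdb1 /var/lib/ceph/osd/ceph-3

Zeroing the log discards whatever metadata updates were still in flight, which is why I suspect it may have left the OSD's on-disk state inconsistent.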

Several hours later, the cluster was back to HEALTH_OK. I proceeded to remove and re-add the bad OSD, following the doc suggestions to do this.
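The removal side was roughly the documented sequence below (using osd.3 as a stand-in for the bad OSD; the service command depends on your init system):

    ceph osd out 3                 # mark it out so data re-maps away from it
    service ceph stop osd.3        # stop the daemon (or /etc/init.d/ceph stop osd.3)
    ceph osd crush remove osd.3    # remove it from the crush map
    ceph auth del osd.3            # delete its key
    ceph osd rm 3                  # remove the OSD id itself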

The problem is that each step caused a slight change in the crush map, sending the cluster back into recovery and adding several hours of waiting per step. I chose to wait until the cluster was back to HEALTH_OK before doing the next step. Overall it has taken a few days to finally get a single OSD back into the cluster.

At one point during recovery the full threshold was triggered on a single OSD, causing recovery to stop; running "ceph pg set_full_ratio 0.98" did not help. I was not planning to add data to the cluster while doing recovery operations, and I did not understand the suggestion that PGs could be deleted to make space on a "full" OSD, so I expected raising the threshold to be the best option, but it had no (immediate) effect.
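For the record, this is the kind of thing I tried or considered (the ratios and the OSD id are only examples, and I have not verified that reweighting mid-recovery would have been the right call):

    ceph health detail               # shows which OSD(s) tripped the near-full/full threshold
    ceph pg set_nearfull_ratio 0.92  # raise the warning threshold
    ceph pg set_full_ratio 0.98      # what I actually ran; no immediate effect
    ceph osd reweight 7 0.8          # alternative: push some data off the full OSD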

I am now back to having all 12 OSDs in, with the hopefully final recovery under way while the data re-balances across the OSDs. I note I am still getting the full OSD warning, but I expect it to disappear soon now that the 12th OSD is back online.

During this recovery the degraded percentage has been a little confusing. While the 12th OSD was offline the percentage was around 15-20% IIRC, but now it is at 35% and slowly dropping. I am not sure I understand the ratios, or why the figure is so high with only a single OSD missing.

A few documentation errors caused confusion too.

This page still contains errors in the steps to create a new OSD (manually):

http://eu.ceph.com/docs/wip-3060/cluster-ops/add-or-rm-osds/#adding-an-osd-manual

"ceph osd create {osd-num}" should be "ceph osd create"


and this:

http://eu.ceph.com/docs/wip-3060/cluster-ops/crush-map/#addosd

I had to add host= to the crush command to get it accepted.
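Something along these lines was eventually accepted (the osd id, weight, and host name are placeholders; root=default may also be needed, and the exact argument form seems to vary a little between releases):

    ceph osd crush add osd.11 1.0 host=storagenode2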

Suggestions and questions:

1. Is there a way to get documentation pages fixed, or at least to put health warnings on them: "This page badly needs updating since it is wrong/misleading"?

2. We need a small set of definitive succinct recipes that provide steps to recover from common failures with a narrative around what to expect at each step (your cluster will be in recovery here...).

3. Some commands throw erroneous errors that are actually benign: "ceph-osd -i 10 --mkfs --mkkey" complains about failures that are expected because the OSD is initially empty.

4. An easier way to capture the state of the cluster for analysis. I don't feel confident, when asked for "logs", that I am giving the most useful snippets or the complete story. It seems we need a tool that can gather all of this in a neat bundle for later dissection or forensics (a rough sketch of what I mean follows this list).

5. Is there a more straightforward (faster) way of getting an OSD back online? It almost seems worth having a standby OSD ready to step in and assume duties (a hot spare?).

6. Is there a way to make the crush map less sensitive to changes during recovery operations? I would have liked to stall/slow recovery while I replaced the OSD and then let it run at full speed (some flags that might do this are sketched after this list).
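On point 6, something like the following is what I was hoping for, if these flags behave the way I think they do (the nobackfill/norecover flags may be newer than 0.61):

    ceph osd set noout         # don't mark downed OSDs out, so no re-mapping starts
    ceph osd set nobackfill    # pause backfill while swapping the OSD
    ceph osd set norecover     # pause recovery too
    # ... replace/rebuild the OSD ...
    ceph osd unset norecover
    ceph osd unset nobackfill
    ceph osd unset noout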
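On point 4, this is the kind of gatherer I have in mind, as a rough sketch only (paths are my guesses, not an official tool):

    #!/bin/sh
    # collect cluster-wide state plus this node's logs into one tarball
    OUT=/tmp/ceph-state-$(date +%Y%m%d-%H%M%S)
    mkdir -p "$OUT"
    ceph -s            > "$OUT/status.txt"
    ceph health detail > "$OUT/health-detail.txt"
    ceph osd tree      > "$OUT/osd-tree.txt"
    ceph osd dump      > "$OUT/osd-dump.txt"
    ceph pg dump       > "$OUT/pg-dump.txt"
    cp /var/log/ceph/*.log "$OUT/" 2>/dev/null   # only the logs local to this node
    tar czf "$OUT.tar.gz" -C /tmp "$(basename "$OUT")"
    echo "bundle: $OUT.tar.gz"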

Excuses:

I'd be happy to act on suggestions, but my current level of Ceph understanding is still limited enough that effort on my part would be unproductive; I am prodding the community to see if there is consensus on the need.







