On Wed, Dec 21, 2011 at 7:47 PM, <Eric_YH_Chen@xxxxxxxxxxx> wrote:
> Hi, all
>
> When I type 'ceph health' to get the status of the cluster, it shows
> some information.
>
> Would you please explain the terms?
>
> Ex: HEALTH_WARN 3/54 degraded (5.556%)
>
> What does "degraded" mean? Is it a serious error, and how do I
> fix it?
>
> Ex: HEALTH_WARN 264 pgs degraded, 6/60 degraded (10.000%); 3/27
> unfound (11.111%)

There are two meanings of "degraded" here. Degraded PGs are those
that don't yet have as many active OSDs as they should (i.e., the PG
wants 3 OSDs holding it but only 2 are). The degraded object count is
the number of missing replicas of objects. The difference is that an
OSD can be an active member of a PG without yet holding all of its
objects: the general sequence is that you lose an OSD, so a bunch of
PGs go degraded; then the OSDs peer and bring in a new replica, so
the PGs are no longer degraded, but most of their objects still are
until they get copied over.

Unfound objects are those which the cluster believes should exist but
can't find anywhere, either because the only copy is on a down OSD or
because a bug caused the cluster to believe in non-existent objects.

Are you using the RADOS gateway? If you are, that's probably where
your unfound objects came from; there was a long-standing accounting
bug which had a fix merged earlier this week.

> What does "unfound" mean? Could we recover the data?
> Would it cause all the data in the rbd image to be corrupted and
> never accessible?

Nope; an unfound object will only block access to that specific
object. I'll have to look into whether rbd could trigger the same bug
that RGW did or not.

> When I type 'ceph pg dump', it shows output like this. Would you
> please explain what "hb in" and "hb out" are?

Those are the lists of OSDs which are heartbeating the given OSD, in
and out: the first group is the OSDs which the one in question is
keeping track of; the second is the OSDs which the one in question
should be reporting to.

> And from the latest document, I know we can take a cluster snapshot
> with "ceph osd cluster_snap <name>".
> Does that mean we can roll back the data from the snapshot? Do you
> have any related document showing how to operate it?

That's the intention, but it's not a well-tested or complete solution
at this time. You shouldn't use it yet.
-Greg
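
P.S. If you want to poke at the unfound objects yourself, the usual
sequence looks something like the following. This is just a sketch:
"2.5" is a placeholder pgid (substitute one from your own output),
and the exact subcommand names can vary between Ceph versions.

    # show exactly which PGs are unhealthy and why
    $ ceph health detail

    # ask a specific PG for its full state, including peering info
    $ ceph pg 2.5 query

    # list the objects that PG knows about but cannot find
    $ ceph pg 2.5 list_missing

    # last resort, only once you've accepted those objects are gone:
    # give up on the unfound objects so IO to the PG can proceed
    $ ceph pg 2.5 mark_unfound_lost revert

The last command permanently abandons the unfound objects, so treat
it as a recovery of the cluster's health, not of the data.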