Re: Questions about OSD recovery

Henry C Chang <henry.cy.chang@xxxxxxxxxxxxx> · Fri, 10 Feb 2012 16:26:22 +0800

On February 9, 2012 at 11:14 AM, Josh Durgin <josh.durgin@xxxxxxxxxxxxx> wrote:
> Detecting missing objects on startup is possible by looking at
> the pg log and comparing it to the objects on disk, but this can
> be a pretty expensive operation. The osd might also be out of

Yeah. It can be pretty expensive, but we only do it once, on startup.
Also, since the osd has not yet joined the cluster at that point, it
shouldn't affect the cluster's performance.

> date, so its log might be useless (for example it could have
> divergent history that was not acked). It can't know how many
> current objects that should be there aren't until it goes through
> peering (to get an up to date and authoritative log) and
> recovery (to get missing data the logs say should be there). This
> is why scrub skips pgs that aren't active+clean. More details of
> peering can be found at http://ceph.newdream.net/docs/latest/dev/peering/.

Since peering only compares logs, I was thinking the osd should at least
check that the objects the log claims to have actually exist on disk.
Then we would have a chance to recover a missing object before the pg
goes active.
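
To make the idea concrete, here is a rough sketch in Python. It is not
Ceph code: the (op, name) log format and the function names are made up
for illustration. It replays the log to work out which objects should
exist, then reports the ones not found on disk. It only covers objects
that the log mentions.

```python
def expected_objects(log_entries):
    """Replay hypothetical (op, name) log entries in order and return
    the names whose most recent operation was not a delete."""
    live = set()
    for op, name in log_entries:
        if op == "delete":
            live.discard(name)
        else:
            # Any non-delete op (e.g. "modify") means the object should exist.
            live.add(name)
    return live


def find_missing(log_entries, on_disk_names):
    """Return, sorted, the objects the log says should exist but that
    are absent from the on-disk listing."""
    return sorted(expected_objects(log_entries) - set(on_disk_names))


if __name__ == "__main__":
    log = [("modify", "a"), ("modify", "b"), ("delete", "b"), ("modify", "c")]
    print(find_missing(log, ["a"]))
```

The objects returned would be the candidates to mark as missing, so
that recovery can fetch them before the pg goes active.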

Also, I like the idea of storing a crc/hash alongside each object, as Tv
suggested. With that, we could even prevent clients from reading corrupt
data by verifying the crc/hash on each read. (Read performance would
surely degrade, though.)
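
A minimal sketch of the check-on-read idea, again in Python and again
only an illustration: the dict-based store and the names write_object,
read_object and CorruptObjectError are hypothetical, and CRC-32 from
zlib stands in for whatever crc/hash would actually be chosen.

```python
import zlib


class CorruptObjectError(Exception):
    """Raised when stored data no longer matches its recorded checksum."""


def write_object(store, name, data):
    # Keep the checksum next to the data (here: in the same dict entry).
    store[name] = (data, zlib.crc32(data))


def read_object(store, name):
    data, stored_crc = store[name]
    # Recompute on every read; this is the extra cost paid per read.
    if zlib.crc32(data) != stored_crc:
        raise CorruptObjectError(name)
    return data


if __name__ == "__main__":
    store = {}
    write_object(store, "obj1", b"hello")
    print(read_object(store, "obj1"))
```

A failed check could also be a trigger for recovering the object from
another replica instead of just returning an error.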