Re: some thoughts about scrub

Cláudio Martins <ctpm@xxxxxxxxxx> · Tue, 1 Feb 2011 16:29:37 +0000

On Tue, 1 Feb 2011 17:20:34 +0800 Henry Chang <henry.cy.chang@xxxxxxxxx> wrote:
> 
> Yeah. I expect scrub to both detect disk errors and check data
> integrity (based on the checksum) in the background. For disk errors,
> I would like Ceph to mark the OSD down/failed and notify the
> sysadmin immediately. For data errors, I expect Ceph to repair
> them automatically (by fetching a correct copy from another replica).
> 

 I suppose the best approach would be to make this configurable per
OSD, with something like an io_error_threshold config variable. I would
set it to something like 50 or 100, but you could set it to 1. The OSD
would then mark itself down or out once that many I/O errors had
propagated up to the OSD daemon. Even if the OSD becomes unresponsive
for a while, that shouldn't cause much trouble: Ceph will mark it down
and it should recover later, or else the OSD will soon take itself out
once it hits the error threshold.
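 To make the idea concrete, here is a rough C++ sketch of the counting
logic, assuming a per-OSD io_error_threshold value as proposed above.
The class and method names are hypothetical; this is not existing Ceph
code, and the actual "mark down/out" action is left to the caller.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical sketch, not a real Ceph API. io_error_threshold mirrors
// the per-OSD config variable proposed in this mail; 0 disables it.
class IoErrorTracker {
 public:
  explicit IoErrorTracker(uint32_t io_error_threshold)
      : threshold_(io_error_threshold) {}

  // Call once for every I/O error that propagates up to the OSD daemon.
  // Returns true exactly once: on the error that reaches the threshold,
  // which is when the OSD would mark itself down or out.
  bool record_error() {
    uint32_t n = ++errors_;
    return threshold_ != 0 && n == threshold_;
  }

  // True once the threshold has been reached (and the check is enabled).
  bool threshold_reached() const {
    return threshold_ != 0 && errors_.load() >= threshold_;
  }

  uint32_t errors() const { return errors_.load(); }

 private:
  const uint32_t threshold_;
  std::atomic<uint32_t> errors_{0};  // safe to bump from several I/O threads
};
```

 With a threshold of 1 the first propagated error trips it; with 50 or
100 the OSD tolerates a few transient errors before taking itself out.
The atomic counter is there because errors may be reported from several
I/O threads at once.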

 What do you think?

Cheers

Cláudio

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html