Re: RAID5 recoverability - Was: Interesting article

Matti Aarnio <matti.aarnio@xxxxxxxxxxx> · Thu, 15 Jan 2009 13:51:08 +0200

On Wed, Jan 14, 2009 at 01:02:38PM -0700, Maurice Hilarius wrote:
> 
> I read this today:
> http://blogs.zdnet.com/storage/?p=162
> 
> Would anyone who knows enough about this care to comment?
> 
> Thanks in advance for any thoughts..

The simplistic recovery strategy is indeed to fault the entire disk upon a
read failure, and then resync everything onto it from the other disks.  Linux
does this at least on RAID-1; my RAID-5 systems use controllers with their
own internal software to handle the recovery.

A smarter approach is to use the RAID5 (or 1) redundancy to recover the given
block from the other disks, _and_ to write the recovered data back over the
failed block immediately.  It is surprising how often this makes the problem
go away!

The disk would hard-fault and drop out of the array only when that fixup
write, or the subsequent verifying read, fails.
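
To make the sequence above concrete, here is a minimal sketch in Python.  It
is a toy in-memory simulation, not Linux MD code: SimDisk, ReadError,
xor_blocks and read_with_fixup are all hypothetical names made up for this
illustration.  It assumes a RAID5-style stripe in which any one block equals
the XOR of the blocks at the same offset on the other disks.

```python
from functools import reduce


class ReadError(Exception):
    """Raised by a simulated disk when a block cannot be read."""


class SimDisk:
    """Toy in-memory disk (hypothetical, for illustration only)."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.bad = set()      # blocks unreadable until they are rewritten
        self.stuck = set()    # blocks unreadable even after a rewrite
        self.failed = False   # True once the disk is hard-faulted

    def read(self, n):
        if n in self.bad or n in self.stuck:
            raise ReadError(n)
        return self.blocks[n]

    def write(self, n, data):
        self.blocks[n] = data
        self.bad.discard(n)   # models the rewrite curing a marginal sector


def xor_blocks(chunks):
    """Bytewise XOR of equally sized byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*chunks))


def read_with_fixup(disks, idx, n):
    """Read block n of disks[idx]; on failure rebuild it from the peers."""
    try:
        return disks[idx].read(n)
    except ReadError:
        pass
    # Recover the block from the other disks in the stripe.
    peers = [d for i, d in enumerate(disks) if i != idx]
    data = xor_blocks([d.read(n) for d in peers])
    try:
        disks[idx].write(n, data)          # fixup write
        if disks[idx].read(n) != data:     # verifying read
            raise ReadError(n)
    except ReadError:
        disks[idx].failed = True           # only now hard-fault the disk
    return data
```

The caller gets its data back in either case; the disk is marked failed only
when the fixup itself does not stick.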

Even such a soft-fault should raise an alarm -- "Disk 5 has soft-faulted 200
times in the past hour."
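
A rough sketch of such an alarm, again in Python and again hypothetical
(SoftFaultMonitor is not an existing MD or mdadm facility): it keeps the
timestamps of recent soft-faults per disk in a sliding window and returns a
warning string once the count reaches a threshold.  The defaults of 200
faults and one hour are taken from the example message above; everything
else is an assumption.

```python
from collections import defaultdict, deque


class SoftFaultMonitor:
    """Counts soft-faults per disk inside a sliding time window."""

    def __init__(self, threshold=200, window=3600.0):
        self.threshold = threshold          # faults needed to raise an alarm
        self.window = window                # window length in seconds
        self.events = defaultdict(deque)    # disk id -> fault timestamps

    def record(self, disk, now):
        """Note one soft-fault at time 'now'; return an alarm text or None."""
        q = self.events[disk]
        q.append(now)
        # Forget faults that have fallen out of the window.
        while q and q[0] <= now - self.window:
            q.popleft()
        if len(q) >= self.threshold:
            return ("Disk %s has soft-faulted %d times in the past %d seconds."
                    % (disk, len(q), self.window))
        return None
```

The fixup path would call record() each time it had to rewrite a block, and
pass any returned text on to whatever alerting is in use.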

(Somebody has patented all of this, no doubt...)

Enhancing Linux MD to do things like I outlined above would be beneficial.

Adding a periodic, self-activated, low-IO-priority read-scanner to the array
logic would also do a world of good for array reliability.  But then some
people want to keep their arrays sleeping, and start them only occasionally.
Perhaps it would be better to have an explicit  mdadm  option to do such a
scan, and recommend adding it to crontab.
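
As a sketch of what such an explicitly triggered scan could look like: this
does not use an mdadm option.  It assumes instead a kernel whose md driver
exposes the sysfs file /sys/block/<array>/md/sync_action and accepts the word
"check" there to start a read scan of the whole array.  The function name
request_scan and the sysfs_root parameter are made up for this example, and
writing to the real file needs root.

```python
import os


def request_scan(md_name, sysfs_root="/sys/block"):
    """Ask the md driver to start a read scan ("check") of one array.

    md_name is the array's block device name, e.g. "md0".  sysfs_root is
    a parameter only so the function can be tried against a scratch
    directory instead of the live system.
    """
    path = os.path.join(sysfs_root, md_name, "md", "sync_action")
    with open(path, "w") as f:
        f.write("check\n")
    return path
```

A small script calling request_scan("md0") could then be run from crontab at
whatever interval suits the array, which leaves sleeping arrays alone unless
their owner schedules the scan.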

> -- 
> With our best regards,
> 
> //Maurice W. Hilarius         Telephone: 01-780-456-9771/
> /Hard Data Ltd.               FAX:       01-780-456-9772/
> /11060 - 166 Avenue           email:maurice@xxxxxxxxxxxx/
> /Edmonton, AB, Canada/
> /     T5X 1Y3/

  /Matti Aarnio